TCP and RDMA are Killing Inference Throughput; Homa can Fix It

Keynote · Keynote track · Confirmed

Day: Day 4 — Session Day 3
Time: 9:50am-10:10am
Room: Main Stage
Track: Software Factories

Accessible with the Engineering pass and above.

About this session

Modern AI inference is shifting from monolithic requests to complex agentic workflows and disaggregated KV stores. As a result, AI network traffic no longer consists only of very large transfers; tiny metadata requests are increasingly common, and their latency has a critical impact on throughput. Unfortunately, legacy transport protocols such as TCP and RDMA handle these workloads poorly because of weak congestion control and head-of-line blocking. This talk will discuss the problems with TCP and RDMA and give a brief introduction to the Homa transport protocol. Homa uses receiver-driven flow control and takes advantage of the priority queues in network switches to reduce short-message latency by 10x for workloads like those in AI datacenters.
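To make the head-of-line blocking point concrete, here is a toy single-link simulation (a sketch, not Homa itself). It compares first-come-first-served delivery, where a short message waits behind a long one already in progress, with shortest-remaining-processing-time (SRPT) scheduling, the policy Homa's receiver-driven priorities are designed to approximate. The names `Message`, `simulate`, `fifo`, and `srpt` are hypothetical; the model assumes one packet per tick and ignores grants, switch priority levels, and real network effects.

```python
from dataclasses import dataclass


@dataclass
class Message:
    name: str     # unique message identifier
    arrival: int  # tick at which the message becomes available to send
    packets: int  # message length in packets (must be >= 1)


def fifo(ready, remaining, arrival, order):
    """First come, first served: a long message blocks everything behind it."""
    return min(ready, key=lambda n: (arrival[n], order[n]))


def srpt(ready, remaining, arrival, order):
    """Shortest remaining first: a newly arrived short message preempts a long one."""
    return min(ready, key=lambda n: (remaining[n], arrival[n], order[n]))


def simulate(messages, policy):
    """Return {name: completion_tick} for a link that sends one packet per tick."""
    assert all(m.packets >= 1 for m in messages), "empty messages never complete"
    remaining = {m.name: m.packets for m in messages}
    arrival = {m.name: m.arrival for m in messages}
    order = {m.name: i for i, m in enumerate(messages)}
    done = {}
    tick = 0
    while len(done) < len(messages):
        # Messages that have arrived and still have packets left to send.
        ready = [n for n in remaining if arrival[n] <= tick and n not in done]
        if ready:
            chosen = policy(ready, remaining, arrival, order)
            remaining[chosen] -= 1  # transmit one packet of the chosen message
            if remaining[chosen] == 0:
                done[chosen] = tick + 1  # finished at the end of this tick
        tick += 1
    return done
```

With one 100-packet transfer followed by two tiny requests, FIFO makes the tiny requests wait roughly 100 ticks, while SRPT finishes them within a few ticks and delays the large transfer only slightly:

```python
msgs = [Message("big", 0, 100), Message("small1", 1, 1), Message("small2", 2, 2)]
print(simulate(msgs, fifo))  # {'big': 100, 'small1': 101, 'small2': 103}
print(simulate(msgs, srpt))  # {'small1': 2, 'small2': 4, 'big': 103}
```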

Topics

LLM Production Infra

Speaker

John Ousterhout

Bosack Lerner Professor of Computer Science / Professor Emeritus · Stanford University