TCP and RDMA are Killing Inference Throughput; Homa can Fix It

Keynote (Keynote track, confirmed)


Day: Day 4 (Session Day 3)
Time: 9:50am-10:10am
Room: Main Stage
Track: Software Factories

Accessible with the Engineering pass and above.

About this session

Modern AI inference is shifting from monolithic requests to complex agentic workflows and disaggregated KV stores. As a result, AI network traffic no longer consists only of very large transfers: tiny metadata requests are increasingly common, and their latency has a critical impact on throughput. Unfortunately, legacy transport protocols such as TCP and RDMA handle these workloads badly because of weak congestion control and head-of-line blocking, which forces short messages to wait behind long ones. This talk will discuss the problems with TCP and RDMA and give a brief introduction to the Homa transport protocol. Homa uses receiver-driven flow control and exploits the priority queues in network switches to reduce short-message latency by 10x for workloads like those in AI datacenters.
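To make head-of-line blocking and receiver-driven scheduling concrete, here is a toy simulation of a single receiver's downlink. It is a sketch under simplifying assumptions, not a model of real TCP, RDMA, or Homa: "fifo" stands in for a single in-order stream where a short message waits behind an earlier long one, and "srpt" stands in for a receiver that, at every step, grants the next packet to the message with the fewest packets remaining (Homa's published design approximates this shortest-remaining-processing-time policy, enforced in practice through grants and switch priority queues, which this sketch abstracts away). The names `Message`, `simulate`, and `latencies`, and the workload sizes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    name: str
    arrival: int  # time step at which the message becomes available to send
    size: int     # length in packets (>= 1); the link delivers one packet per step


def simulate(messages, policy):
    """Return {name: completion_time} for one receiver downlink.

    policy="fifo": messages are delivered strictly in arrival order, so a short
    message waits behind any earlier long one (head-of-line blocking).
    policy="srpt": at every step the receiver grants one packet to the message
    with the fewest packets remaining, preempting longer messages.
    """
    remaining = {m.name: m.size for m in messages}
    position = {m.name: i for i, m in enumerate(messages)}  # stable tie-break
    done = {}
    t = 0
    while len(done) < len(messages):
        ready = [m for m in messages if m.arrival <= t and m.name not in done]
        if not ready:
            t += 1  # link idle until the next arrival
            continue
        if policy == "fifo":
            pick = min(ready, key=lambda m: (m.arrival, position[m.name]))
        elif policy == "srpt":
            pick = min(ready, key=lambda m: (remaining[m.name], m.arrival, position[m.name]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        remaining[pick.name] -= 1  # one packet crosses the link this step
        t += 1
        if remaining[pick.name] == 0:
            done[pick.name] = t
    return done


def latencies(messages, policy):
    """Completion time minus arrival time for each message."""
    finish = simulate(messages, policy)
    return {m.name: finish[m.name] - m.arrival for m in messages}


if __name__ == "__main__":
    # Hypothetical workload: one 100-packet KV-cache transfer, then three
    # 1-packet metadata requests that arrive while it is in flight.
    workload = [
        Message("kv_transfer", arrival=0, size=100),
        Message("meta_a", arrival=5, size=1),
        Message("meta_b", arrival=5, size=1),
        Message("meta_c", arrival=5, size=1),
    ]
    for policy in ("fifo", "srpt"):
        print(policy, latencies(workload, policy))
```

With these made-up sizes, the three metadata requests finish 1 to 3 steps after arriving under SRPT instead of 96 to 98 steps under FIFO, while the long transfer is delayed by only 3 steps. The numbers illustrate the mechanism only; they are not measurements of TCP, RDMA, or Homa, and they are unrelated to the 10x figure above.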

Topics

LLM Production Infra

Speaker