All the Things We Have to Do to Satisfy Your Insatiable Need for Tokens

Session, Leadership track, confirmed

Day
Day 4 — Session Day 3
Time
11:40am-12:00pm
Room
Leadership 1
Track
Inference

Accessible with the Leadership (All-Access) pass and above.

About this session

Every time the industry figures out how to serve tokens faster and cheaper, the appetite grows to match. Models get bigger, contexts get longer, agents start chaining thousands of calls together. The finish line keeps moving. This talk is a technical tour through everything the industry has done to keep up, led by two experts in high-performance inference. We'll start with the optimizations that made hardware work harder without changing the underlying architecture. Then we'll go up a level with techniques that work smarter across requests and across the model itself. Finally, we'll peek into the future with heterogeneous disaggregated inference, the architectural shift that splits prefill and decode across specialized hardware, and with even more advanced forms of hardware specialization coming your way soon. Token demand is about to get a lot harder to satisfy. Let's see what the future has in store for us!

Topics

LLM Production Infra
AI Architects

Speakers