All the Things We Have to Do to Satisfy Your Insatiable Need for Tokens

Session · Leadership track · Confirmed

Day: Day 4 — Session Day 3
Time: 11:40am-12:00pm
Room: Leadership 1
Track: Inference

Accessible with the Leadership (All-Access) pass and above.

About this session

Every time the industry figures out how to serve tokens faster and cheaper, the appetite grows to match. Models get bigger, contexts get longer, agents start chaining thousands of calls together. The finish line keeps moving. This talk is a technical tour through everything the industry has done to keep up, led by two experts in high-performance inference. We'll start with the optimizations that made hardware work harder without changing the underlying architecture. Then we'll go up a level with techniques that work smarter across requests and across the model itself. Finally, we'll take a peek into the future: heterogeneous disaggregated inference, the architectural shift that splits prefill and decode across specialized hardware, and the even more advanced forms of hardware specialization coming your way soon. Token demand is about to get a lot more insatiable. Let's see what the future has in store for us!

Topics

LLM Production Infra · AI Architects

Speakers

Daniel Kim

Head of Growth · Cerebras Systems

Natalie Serrino

Gimlet Labs