The Frontier AI Inference Cloud for Agents

Session · Engineering track · Confirmed

Day: Day 4 (Session Day 3)
Time: 3:45pm-4:05pm
Room: Track 9
Track: Inference

Accessible with the Engineering pass and above.

About this session

Agents have changed the economics of AI inference. A chatbot’s cost scales roughly linearly with the number of requests; an agent’s scales multiplicatively. A single task can fan out into hundreds of model calls, each carrying a repeated context prefix and adding latency that compounds across tool calls and reasoning steps. As open-weight models keep improving and agentic workloads grow, this shift exposes the limits of traditional request-level optimization. Inference infrastructure becomes a first-class concern, one that often shapes performance and cost as much as the model itself. In this talk, we explore what changes when you optimize for the whole task rather than the individual request, and how FriendliAI is rethinking the inference cloud for the era of agentic AI.
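
To make the scaling argument above concrete, here is a rough back-of-the-envelope sketch. It is not FriendliAI's cost model; the function names, the token counts, and the idealized "prefix cache" behaviour are all assumptions chosen for illustration. It compares the input tokens a server must process for independent chatbot requests against one agent task whose every step re-sends a growing context.

```python
# Illustrative only: all names and numbers below are hypothetical assumptions.

def chatbot_input_tokens(num_requests: int, prompt_tokens: int) -> int:
    """Independent requests: processed input tokens grow linearly with request count."""
    return num_requests * prompt_tokens


def agent_input_tokens(num_steps: int, prefix_tokens: int,
                       tokens_added_per_step: int,
                       reuse_prefix: bool = False) -> int:
    """One agent task with num_steps model calls.

    Each call carries the shared prefix plus everything appended so far.
    With reuse_prefix=True we assume an idealized cache in which each call
    only processes the tokens that are new since the previous call.
    """
    if reuse_prefix:
        # The prefix is processed once; later steps process only new tokens.
        return prefix_tokens + (num_steps - 1) * tokens_added_per_step
    total = 0
    for step in range(num_steps):
        history_tokens = step * tokens_added_per_step
        total += prefix_tokens + history_tokens  # full context re-processed
    return total


if __name__ == "__main__":
    steps, prefix, added = 100, 2_000, 300  # assumed workload shape
    print("chatbot, 100 requests:   ", chatbot_input_tokens(steps, prefix))
    print("agent, no prefix reuse:  ", agent_input_tokens(steps, prefix, added))
    print("agent, ideal prefix reuse:", agent_input_tokens(steps, prefix, added, reuse_prefix=True))
```

Under these assumed numbers, the agent task without prefix reuse processes several times more input tokens than the same number of independent chatbot requests, and the gap widens as the number of steps grows. That gap is the kind of effect task-level optimization targets, as opposed to tuning each request in isolation.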

Speaker