What's New in Inference Engineering

Session · Engineering track · Confirmed

Day: Day 4 — Session Day 3
Time: 1:30pm-1:50pm
Room: Track 9
Track: Inference

Accessible with the Engineering pass and above.

About this session

More than 30,000 engineers have learned the fundamentals of inference since Inference Engineering was published. But the field keeps accelerating, so it's time for the first public addendum to the book. The past four months have seen a renewed focus on training-dependent inference optimization across the "big three" performance techniques of speculation, caching, and quantization. This talk provides structured guidance for training DFlash and EAGLE 3 draft models to accelerate LLM decode, introduces the concept of KV compaction, and explains the hype behind TurboQuant.

Topics

Inference (vLLM, SGLang, etc.)

Speaker

Philip Kiely

Developer Relations · Baseten