Voice Agents Can Just Do Things

Session · Engineering track · Confirmed

Day: Day 2 — Session Day 1
Time: 11:40am-12:00pm
Room: Track 6
Track: Voice & Realtime AI

Accessible with the Engineering pass and above.

About this session

Too many voice AI integrations still treat speech as fancier chat: audio in, audio out. But speech can now serve as a control plane for software, and most developers are unaware that voice has become a capability overhang. Current realtime models can understand intent, call tools, speak while work is underway, recover from corrections, and decide what the user actually needs to hear. As a result, three practical patterns are emerging: voice-to-action, systems-to-voice, and voice-to-voice. We’ll show how each pattern changes the architecture, where Realtime 2’s reasoning and tool-calling matter, and why chained STT/LLM/TTS systems start to break down as interaction patterns become richer.

Topics

Voice

Speaker

Charlie Guo

Developer Experience · OpenAI