"My name is... my name is...": A Linguistic Framework for Debugging Voice AI Failures



Day: Day 2 — Session Day 1
Time: 3:20pm-3:40pm
Room: Track 6
Track: Voice & Realtime AI

Accessible with the Engineering pass and above.

About this session

Every voice AI engineer has heard it: a caller repeating their name three times, getting more frustrated with each attempt. The logs look clean. Confidence scores look fine. Linguistics can help solve the mystery.

By the end of this talk, you'll have a diagnostic framework for the failures that slip past standard metrics: a way to turn "the agent just didn't get it" into concrete, debuggable failure modes. The framework maps three levels of linguistic structure (sounds, words, and interactions) against the two dimensions every voice agent engineer already works in: what we hear (speech recognition) and what we speak (speech synthesis). That 3×2 grid surfaces problems your current tooling can't see, including:

1. Why a user can't get your system to understand their name
2. Why a single well-intentioned vocabulary hint can cause catastrophic recognition drops in a non-English language
3. Why a transcript that's "cumulatively correct" can still ruin the user experience

Drawing on examples from production multilingual voice AI work, I'll show where linguistic expertise connects to the engineering decisions you're already making, and where it reveals failure modes that confidence scores will never warn you about.

Who this is for: Voice AI engineers, ML practitioners working on voice AI pipelines, and anyone who's watched clean logs while their agent quietly fails real users.
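As a rough illustration only, the 3×2 grid (three linguistic levels crossed with the hearing and speaking dimensions) could be kept as a triage checklist, so that every logged failure is filed under one of the six cells and empty cells show where nobody has looked yet. This is a minimal sketch under stated assumptions: the names `Level`, `Dimension`, `empty_triage`, `file_failure`, and `unexamined_cells` are hypothetical and not part of the speaker's framework or any library, and the cell chosen for the example failure is a hypothetical placement.

```python
from enum import Enum
from itertools import product


class Level(Enum):
    # The three levels of linguistic structure named in the abstract.
    SOUNDS = "sounds"
    WORDS = "words"
    INTERACTIONS = "interactions"


class Dimension(Enum):
    # The two dimensions: what we hear (ASR) and what we speak (TTS).
    HEAR = "speech recognition"
    SPEAK = "speech synthesis"


# All six (level, dimension) cells of the 3x2 grid.
GRID = list(product(Level, Dimension))


def empty_triage():
    """Hypothetical helper: map every grid cell to an empty list of failure notes."""
    return {cell: [] for cell in GRID}


def file_failure(triage, level, dimension, note):
    """Hypothetical helper: record a failure note under one grid cell."""
    triage[(level, dimension)].append(note)


def unexamined_cells(triage):
    """Hypothetical helper: return the cells that have no failure notes yet."""
    return [cell for cell, notes in triage.items() if not notes]


if __name__ == "__main__":
    triage = empty_triage()
    # Hypothetical placement: the repeated-name failure filed under sounds x hearing.
    file_failure(triage, Level.SOUNDS, Dimension.HEAR,
                 "caller repeated their name three times; transcript kept missing it")
    print(len(GRID))                     # 6 cells in total
    print(len(unexamined_cells(triage)))  # 5 cells still unexamined
```

Keeping the grid as an explicit enumeration, rather than free-text tags, makes gaps visible: a failure review that only ever fills recognition cells is a hint that the synthesis side has not been examined.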

Topics

VoiceAI, Platform Engineering

Speaker

Midam Kim

Senior Linguist and ML Engineer · ServiceNow