Accessible with the Engineering pass and above.

A healthcare voice agent can be right on the benchmark and still fail in production. Real patients hesitate, interrupt, misremember medications, code-switch mid-sentence, and disclose risk indirectly. After **200M+ patient-agent interactions**, the lesson is clear: in clinical voice AI, interaction is a safety variable.

This talk breaks down what Hippocratic AI had to rebuild beyond the generic voice stack: not just ASR, VAD, an LLM, TTS, and turn-taking heuristics, but a real-time safety system that treats silence, clarification, escalation, multilingual continuity, and medication-specific recognition as first-class engineering problems.

We’ll walk through the production architecture behind Hippocratic AI’s voice agents: a **30+ model supervisor constellation**, including the **4.1T-parameter AI Front Door system**, designed to catch failures a single primary model misses. The talk covers how specialized models monitor medication identification, overdose risk, labs and vitals, escalation criteria, workflow confirmation, and other clinical safety surfaces while the patient conversation is still happening.

We’ll focus on four production lessons:

- **Benchmarks are not enough:** MedQA and USMLE-style accuracy do not capture the failure modes that appear in a 12-minute, multi-turn patient call.
- **Interaction signals become training data:** pauses, interruptions, hesitation, clarification requests, and escalation markers are mined from production calls and turned into structured eval and training signals.
- **One LLM is not a safety architecture:** supervisor models can overrule, block, or escalate when the primary model sounds plausible but misses a clinical risk.
- **Voice infrastructure has clinical failure modes:** domain ASR, medication vocabulary, code-switching, latency, and turn-taking all affect whether the system makes the right next move.
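
To make the supervisor idea concrete, here is a minimal Python sketch of how several supervisors might review a primary model's draft reply, with the most conservative verdict winning. It is an illustration under stated assumptions, not Hippocratic AI's implementation: the names `Action`, `Verdict`, `overdose_supervisor`, `confirmation_supervisor`, and `arbitrate` are all hypothetical, and the keyword rules are toy stand-ins for specialized models.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Optional


class Action(IntEnum):
    # Ordered by severity so that max() picks the most conservative action.
    ALLOW = 0
    OVERRULE = 1   # replace the draft reply with a safer one
    BLOCK = 2      # suppress the draft reply entirely
    ESCALATE = 3   # hand the call to a human clinician


@dataclass
class Verdict:
    supervisor: str
    action: Action
    replacement: Optional[str] = None


# A supervisor inspects the patient utterance and the primary model's draft.
Supervisor = Callable[[str, str], Verdict]


def overdose_supervisor(patient_utterance: str, draft_reply: str) -> Verdict:
    # Hypothetical keyword rule standing in for a specialized risk model.
    text = patient_utterance.lower()
    if "double dose" in text or "took extra" in text:
        return Verdict("overdose", Action.ESCALATE)
    return Verdict("overdose", Action.ALLOW)


def confirmation_supervisor(patient_utterance: str, draft_reply: str) -> Verdict:
    # Hypothetical rule: a reply that schedules something must ask to confirm.
    reply = draft_reply.lower()
    if "scheduled" in reply and "is that correct" not in reply:
        return Verdict("confirmation", Action.OVERRULE,
                       replacement=draft_reply + " Is that correct?")
    return Verdict("confirmation", Action.ALLOW)


def arbitrate(patient_utterance: str, draft_reply: str,
              supervisors: List[Supervisor]) -> Verdict:
    # Run every supervisor and keep the most severe verdict.
    verdicts = [s(patient_utterance, draft_reply) for s in supervisors]
    return max(verdicts, key=lambda v: v.action,
               default=Verdict("none", Action.ALLOW))


if __name__ == "__main__":
    result = arbitrate(
        "I think I took extra pills last night",
        "Your refill is scheduled for Monday.",
        [overdose_supervisor, confirmation_supervisor],
    )
    print(result.supervisor, result.action.name)  # overdose ESCALATE
```

In this sketch the draft reply sounds plausible, yet the overdose check outranks the milder confirmation fix, so the call is escalated. Ordering actions by severity is one simple arbitration policy; a production system would likely weigh supervisors differently.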