AI Benchmarks for Vertical Industries: Why we're not measuring what we need to and how to unlock real-world ROI

Session · Engineering track · Confirmed

Day: Day 4 — Session Day 3
Time: 3:45pm-4:05pm
Room: Track 7
Track: AI in Healthcare

Accessible with the Engineering pass and above.

About this session

AI is acing the tests we set for it. So why are so many production deployments falling flat? Talk to any team shipping vertical AI and you'll hear the same story: impressive benchmark scores, underwhelming real-world results. The problem isn't that we lack benchmarks; it's that we're measuring the wrong things. Synthetic tasks and standardised datasets tell you how a model performs in a lab. They don't tell you whether it's working in your product, for your customers, on your edge cases. The gap between benchmark-ready and production-ready is where ROI goes to die.

This talk draws on lessons from building Anterior's internal benchmark for real-world healthcare tasks, work that now helps health insurance providers make decisions covering 50 million American lives. I'll share how to bring in domain experts to translate real-world performance into concrete measurement rubrics, how to make imperfect synthetic data actually work in practice, the most common pitfalls teams fall into, and how to apply all of this to any vertical domain.

Topics

Evals & Observability · AI in Healthcare · AI in Law

Speaker

Christopher Lovejoy

Member of Technical Staff · Anthropic