Your Agent Evolved. Your Evals Didn't.

Session | Leadership track | Confirmed

Day: Day 2 — Session Day 1
Time: 11:10am-11:30am
Room: Leadership 2
Track: AI Architects: Show my Workflow

Accessible with the Leadership (All-Access) pass and above.

About this session

Knowing which generation your agent is in, which failure modes your current evals are blind to, and what to build next is the difference between shipping with confidence and flying blind. Agent architectures have evolved through six generations: prompt, chain, ReAct loop, workflow graph, modern agent loop, and AI harness. Each one quietly breaks the eval strategy of the generation before it. A prompt-quality rubric won't catch a bad tool call; a trace scorer won't catch memory poisoning. Using a single SRE incident response agent threaded through every generation, this talk shows exactly where each architecture outgrows its evals and what you need to close the gap.

Speaker

Ameya Bhatawdekar

Braintrust