Benchmarking Coding Agents on New vs Legacy Code bases

Session | Engineering track | Confirmed

Day: Day 4 (Session Day 3)
Time: 12:05pm-12:25pm
Room: Track 8
Track: Agentic Engineering

Accessible with the Engineering pass and above.

About this session

You have an old codebase with hundreds of thousands of lines of code. Should you let an AI agent refactor it, or should you wait until you have a cleaner setup? Last year we refactored a number of codebases and ran evals measuring how different models, harnesses, and rule sets performed across multiple versions of each codebase. This talk will feature specific code examples as well as a broader set of evals.

Topics

Evals & Observability, Coding Agents, AI Architects

Speaker