Accessible with the Engineering pass and above.
Training harnesses are data generators: when environments are flaky, stale, reward-hacked, or mismatched to production, models learn the wrong behavior. This talk distills common RL environment failures across coding, SaaS, and support-agent workflows, then offers a practical framework for building production-grade harnesses with clean signal, realistic state, fail-fast behavior, and trustworthy rewards.