Accessible with the Expo Explorer pass and above.
Description
An agent that sifts through your experiments, visualizes results, launches training jobs, and tries again, all autonomously. We'll demo it live in W&B Models and show the eval loop in W&B Weave that makes it trustworthy.

Abstract
The holy grail of agentic AI tooling is the autoresearch loop: an agent that can sift through your experiments, create visualizations, propose a hypothesis, launch a training job, read the results, and try again autonomously. In this workshop, we'll show new autoresearch capabilities built directly into the W&B Models web and iOS apps, demoed live using a real-world fine-tuning project. You'll watch natural language turn into a fine-tuned model: launching jobs, reading loss curves, surfacing the outlier runs that eat researcher hours, and recommending what to try next.

Then we get hands-on with the eval-driven development loop in W&B Weave that makes agents like this trustworthy. You'll see how production traces become benchmarks, and how only the agents that beat the bar make it to production. This is the same loop we use to improve our own agentic features.

Speakers
Tim Sweeney is a Principal Software Engineer at Weights & Biases, where he works on new, cross-team special projects. He previously led the ML Platform as Product Manager at Twitter and built enterprise search products at Workday. He holds an M.S. in Computer Science from Georgia Tech.

Zubin Aysola is an ML Researcher at Weights & Biases, where he leads the autoresearch effort for agentic features. Previously, he led AI efforts at Vijil and Solvvy Inc., focusing on novel uses of LLMs. He recently finished his Master's in AI at Carnegie Mellon, focusing on LLM Safety and Agent Evaluation.

If you're training machine learning models or building agents, this is the iteration loop you want. Join us.