Accessible with the Leadership (All-Access) pass and above.
Many LLM deployment conversations focus on models, benchmarks, and prompting, but the hardest problems start after the model works. In this session, Gennady Pekhimenko, Senior Director of AI Software at NVIDIA and former CEO of CentML, and Zach Bratun-Glennon, General Partner at Gradient, will explore the details and cutting edge of inference performance. They'll unpack what actually happens when you try to run large models in production, share lessons and patterns observed from real deployments, and discuss what the next generation of compilers, frameworks, and platform acceleration should look like to enable successful AI workloads.