Why your LLM is slow and expensive: lessons learned from running models in production

Session · Leadership track · Confirmed

Day: Day 4 — Session Day 3
Time: 1:55pm-2:15pm
Room: Leadership 2
Track: AI Architects: AI Factories

Accessible with the Leadership (All-Access) pass and above.

About this session

Many LLM deployment conversations focus on models, benchmarks, and prompting, but the hardest problems start after the model works. In this session, Gennady Pekhimenko, Senior Director of AI Software at NVIDIA and former CEO of CentML, and Zach Bratun-Glennon, General Partner at Gradient, will explore the details and the cutting edge of inference performance. They will unpack what happens when you run large models in production, share lessons and patterns observed in real deployments, and discuss what the next generation of compilers, frameworks, and platform acceleration should look like to support successful AI workloads.

Topics

LLM Production Infra
AI in Enterprise/Fortune 500

Speaker

Zach Bratun-Glennon

General Partner · Gradient