Serving 2 Million Models Without Melting: Scaling the Hugging Face Hub

Session · Leadership track · Confirmed

Day: Day 2 (Session Day 1)
Time: 1:30pm-1:50pm
Room: Leadership 2
Track: AI Architects: Show my Workflow

Accessible with the Leadership (All-Access) pass and above.

About this session

Hugging Face hosts over 2 million public models and more than 500,000 datasets, and serves 13 million users across more than 50,000 organizations, including over 30% of the Fortune 500. That growth didn't come with a manual.

In this talk, we'll pull back the curtain on the infrastructure decisions that kept the Hub fast and reliable as traffic grew by orders of magnitude. We'll dive into why we chose MongoDB Atlas as our core data layer, how its document model maps naturally to the messy reality of ML model metadata, and what it took to keep p99 latency low when every request hits a catalog of millions of entries. We'll also cover the trade-offs we faced, the things that broke along the way, and what "lean operations" actually means when your platform serves nearly a third of the Fortune 500. Expect real architecture decisions, real numbers, and lessons you can take back to your own stack.

Topics

LLM Production Infra

Speaker