Asaf Gardin

Inference Engineer · AI21

Bio

Asaf Gardin is a Senior Software Engineer on the inference team at AI21 Labs, where he works on high-performance LLM inference and the production deployment of the Jamba hybrid SSM-Transformer models. He is an active vLLM committer, contributing to quantization, scheduling, and support for Mamba-based architectures. His talk covers two production bugs in vLLM's Mamba support: a scheduler edge case that corrupted SSM state under memory pressure, and a 32-bit integer overflow in a CUDA kernel that surfaced as RL training instability. Both were root-caused at AI21 and fixed upstream. He also built Kernel Academy, a browser-based tutorial for learning Triton GPU programming. He previously worked at IBM.

Session (1)

Day 4 · 2:50pm-3:10pm · Track 9

Two Bugs That Hid in Plain Sight: A vLLM Debugging Detective Story

Asaf Gardin is a Senior Software Engineer on the Inference Team at AI21 Labs, with hands-on experience optimizing vLLM for production deployments of AI21's Jamba hybrid (Mamba+Transformer) models. Attendees interested in LLM inference engineering, vLLM internals, and hard-won debugging lessons from large-scale deployments will find his session directly applicable.

GitHub

@Josephasafg

Recent writing (1)

One token to corrupt them all: a vLLM debugging tale · blog · Jan 2026

Public activity researched automatically · as of Jun 2026