Qianru Lao is a Member of Technical Staff on the Inference team at OpenAI, where she works on infrastructure for large-scale model serving. Previously, she contributed to the open-source Delta Lake project at Databricks and worked on distributed storage systems at Alibaba Cloud and infrastructure tooling at Google. She holds degrees in Computational Science and Engineering from Harvard and Computer Science from Sun Yat-sen University.
Qianru Lao works on Inference at OpenAI, with a research background in LLM inference scheduling and systems optimization. Her paper "Fast Inference for Augmented Large Language Models" (the MARS/LAMPS scheduler, accepted as a NeurIPS 2025 poster) reported 27-85% end-to-end latency improvements for augmented LLM workloads that make external API calls. Her session is worth attending for anyone who builds or scales LLM inference infrastructure and wants memory-aware scheduling techniques from a practitioner in the field.
Public activity researched automatically · as of Jun 2026