Qianru Lao

Member of Technical Staff, Inference · OpenAI


Bio

Qianru Lao is a Member of Technical Staff on the Inference team at OpenAI, where she works on infrastructure for large-scale model serving. Previously, she contributed to the open-source Delta Lake project at Databricks and worked on distributed storage systems at Alibaba Cloud and infrastructure tooling at Google. She holds degrees in Computational Science and Engineering from Harvard and Computer Science from Sun Yat-sen University.

Session (1)

Day 4 · 11:10am-11:30am · Track 9

Routing LLM Inference in Production: From Engine Signals to Policy

Qianru Lao works on Inference at OpenAI, with a research background in LLM inference scheduling and systems optimization. Her paper "Fast Inference for Augmented Large Language Models" (the MARS/LAMPS scheduler, accepted as a NeurIPS 2025 poster) reported 27-85% end-to-end latency improvements for augmented LLM workloads that make external API calls. This session is worth attending for anyone building or scaling LLM inference infrastructure who wants to learn memory-aware scheduling techniques from a practitioner.

GitHub

@EstherBear

Recent writing (1)

Fast Inference for Augmented Large Language Models (MARS / LAMPS), NeurIPS 2025 poster · paper · Dec 2025

Public activity researched automatically · as of Jun 2026