I am the Inference Lead for SGLang at LMSYS, working closely with Lianmin Zheng and Ying Sheng to co-lead the project. I am responsible for SGLang’s roadmap, major releases, and performance optimization, and have led key launches such as Llama 3, DeepSeek V3, Large Scale EP, and GB200 NVL72. I am also a committer to FlashInfer and LMDeploy, and co-authored the FlashInfer paper (MLSys 2025 Best Paper). Previously, I was a Lead Software Engineer at Baseten, where I co-authored the DeepSeek V3 and Qwen 3 launch blogs and The Baseten Inference Stack ebook. Earlier, I worked at Meituan on CTR GPU inference and vector retrieval systems, and co-authored the QQQ paper (ICLR 2025 Workshop).
I’m actively building the open-source SGLang community. If you’re passionate about LLM systems, come join us on the SGLang Slack!
“Most of the team graduated from the top universities in China,” said Yineng Zhang, a lead software engineer at Baseten in San Francisco who works on SGLang, a project not part of DeepSeek that helps people build on top of DeepSeek’s system. “They are very smart and very young.”
While employees at big Chinese technology companies are limited to collaborating with colleagues, “if you work on open source, you work with talent around the world,” said Yineng Zhang, lead software engineer at Baseten in San Francisco who works on the open source SGLang project. He helps other people and companies build products using DeepSeek’s system.
Baseten
Lead Software Engineer
Model Performance Team
September 2024 - June 2025
LMSYS Org
Team Member, Inference Lead for SGLang
July 2024 - present
Meituan
Senior Software Engineer
Machine Learning Engine Group
August 2021 - July 2024
Baidu
Software Engineer
Baidu Speech
June 2020 - August 2021
Stealth Startup
Software Engineer
July 2019 - June 2020
Jiangnan University
Bachelor of Engineering
September 2015 - June 2019