From Infrastructure to LLM Serving: Towards Full-Stack AI Systems
Department of Electrical and Computer Engineering
Location: Burchard Hall, Room 102
Speaker: Hanfei Yu, Ph.D. Candidate, Stevens Institute of Technology
ABSTRACT
Artificial Intelligence (AI) has rapidly transformed both industry and academia. However, the underlying systems that power modern AI workloads must continuously evolve to sustain performance, efficiency, and scalability. Optimizing AI systems is inherently a full-stack challenge, spanning cloud infrastructure, runtime systems, and model-level innovations.
In this talk, I will present three representative systems that improve efficiency across different layers of the AI system stack: RainbowCake, InstaInfer, and FineMoE. RainbowCake and InstaInfer reduce inference latency in fully managed AI platforms by optimizing container startup and provisioning mechanisms. FineMoE improves Mixture-of-Experts (MoE) serving efficiency through fine-grained expert offloading, reducing GPU memory footprint while preserving low inference latency.
Together, these works illustrate a unifying theme: algorithm–system co-design for building adaptive, resource-aware, and scalable full-stack AI systems. I will also briefly discuss ongoing efforts in other LLM serving systems that further extend this vision.
BIOGRAPHY
Hanfei Yu is a fifth-year Ph.D. student in the Department of Electrical and Computer Engineering at Stevens Institute of Technology, advised by Prof. Hao Wang. His research focuses on building efficient, adaptive, and resource-aware full-stack AI systems on cloud infrastructures through algorithm–system co-design. His work spans serverless computing, LLM serving systems, and reinforcement learning systems.
Hanfei’s work received the ACM SoCC 2024 Best Paper Award and was a Best Student Paper Finalist at ACM/IEEE SC 2024. He was also selected as a 2025 MLCommons ML and Systems Rising Star.