Foundation Models for Robotic Manipulation: Opportunities and Challenges


Department of Electrical and Computer Engineering

Location: Burchard Hall, Room 102

Speaker: Yunzhu Li, Assistant Professor, Columbia University

ABSTRACT

Foundation models have shown impressive capabilities in language and vision, pointing toward a new generation of robotic systems that can reason at a high level and adapt to diverse instructions. In this talk, I will first discuss how such models can be integrated into robotic pipelines for task specification and task-level planning by translating commonsense knowledge from foundation models into structured priors for robot learning and control. Through modular combinations, such as vision-language models (VLMs) for task interpretation paired with optimization-based planners for execution, robots can begin to follow free-form natural-language instructions and perform increasingly diverse manipulation tasks. In the second half, I will argue that truly general manipulation requires grounding high-level reasoning in physical reality. I will present a vision in which scalable multimodal sensing that tightly couples vision and touch, together with structured world models, provides the missing link between abstract reasoning and physical interaction. By explicitly modeling geometry, dynamics, and contact through a combination of physics and learning, these representations enable physically grounded decision making. They also support scalable training, simulation, and evaluation of robot policies, moving robotic manipulation toward greater robustness and generality in the real world.

BIOGRAPHY

Yunzhu Li is an Assistant Professor of Computer Science at Columbia University. Before joining Columbia, he was an Assistant Professor of Computer Science at UIUC and a postdoc at Stanford, collaborating with Fei-Fei Li and Jiajun Wu. Yunzhu earned his PhD from MIT under the guidance of Antonio Torralba and Russ Tedrake. His work has been recognized with the Best Paper Award at ICRA, the Best Systems Paper Award, and a Best Paper Award finalist selection at CoRL. He has also been selected for the AAAI New Faculty Highlights and has received the Sony Faculty Innovation Award, the Amazon Research Award, the Adobe Research Fellowship, and the First Place Ernst A. Guillemin Master’s Thesis Award in AI and Decision Making at MIT. His research has been published in top journals and conferences, including Nature and Science, and has been featured by major media outlets.

At any time, photography or videography may be occurring on Stevens’ campus. Resulting footage may include the image or likeness of event attendees. Such footage is Stevens’ property and may be used for Stevens’ commercial and/or noncommercial purposes. By registering for and/or attending this event, you consent and waive any claim against Stevens related to such use in any media. See Stevens' Privacy Policy for more information.