Multimodal AI Agents for Content Editing

Date: Tuesday, May 12

Start Time: 5:25 pm

End Time: 5:55 pm

In this talk, we will explain how agentic systems can learn from experience to understand and edit images within a unified workflow. We will highlight core capabilities, including high-level intent understanding, grounded region-level editing, and multistep planning, that enable users to perform complex edits through natural language rather than manual toolchains. We will also discuss practical challenges, such as learning efficiently from long trajectories while preserving high image quality and strong prompt adherence.

Session Speakers

Yong Jae Lee
Professor, Department of Computer Sciences, University of Wisconsin-Madison and Research Scientist, Adobe Research

Yong Jae Lee is a Professor of Computer Science at the University of Wisconsin-Madison and a Research Scientist at Adobe Research. His research interests are in computer vision and machine learning, with a focus on robust AI systems that learn to understand the multimodal world with minimal human supervision. Before joining UW-Madison, Professor Lee spent one year as Visiting Faculty at Cruise and six years as an Assistant and then Associate Professor at UC Davis. He received his PhD from the University of Texas at Austin and was a postdoc at Carnegie Mellon University and UC Berkeley. Professor Lee is an author of the widely cited paper “Visual Instruction Tuning,” which proposes LLaVA (large language and vision assistant), a large multimodal model for general-purpose visual and language understanding. He has received numerous prestigious awards, including the NSF CAREER Award and the UW-Madison SACM Student Choice Professor of the Year Award.

See you May 11-13, 2026 in Silicon Valley, California