In this talk, we will explain how agentic systems can learn from experience to understand and edit images within a unified workflow. We will highlight core capabilities—including high-level intent understanding, grounded region-level editing and multistep planning—that enable users to perform complex edits through natural language rather than manual toolchains. We will also discuss practical challenges, such as learning efficiently from long trajectories while preserving high image quality and strong prompt adherence.

