The next generation of AI agents is moving beyond cloud-based, text-only models toward real-time interaction with the physical, multimodal world. In the vision domain, these agents rely on vision-language models (VLMs) as their backbones. However, deploying VLMs with billions of parameters on embedded devices remains a significant engineering hurdle. Drawing on our recent ICML and CVPR papers, we will explore advances in VLM optimization, specifically how distillation and pruning transform “heavyweight” models into lean, edge-ready engines. In particular, we will examine recent feature-alignment and mixture-of-experts methods for distillation, as well as training-free token pruning, offering practical insights for building computationally efficient VLMs for embedded systems.
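To give a flavor of the training-free side, here is a minimal sketch of attention-based visual token pruning: visual tokens that receive little attention from the [CLS] token are dropped at inference time, with no retraining. This is an illustrative toy (the function name, score source, and keep ratio are assumptions, not any specific paper's method); published approaches differ in how importance scores are computed and whether dropped tokens are discarded or merged.

```python
import numpy as np

def prune_tokens(tokens, cls_attn, keep_ratio=0.5):
    """Keep the visual tokens with the highest [CLS]-attention scores.

    tokens   : (N, D) array of visual token embeddings.
    cls_attn : (N,)   attention weights from the [CLS] token to each token.
    Returns the kept tokens and their (sorted) original indices.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep_idx = np.argsort(cls_attn)[-n_keep:]  # indices of top-scoring tokens
    keep_idx.sort()                            # preserve original token order
    return tokens[keep_idx], keep_idx

# Toy example: 8 visual tokens of dimension 4, random attention scores.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
cls_attn = rng.random(8)
pruned, idx = prune_tokens(tokens, cls_attn, keep_ratio=0.5)
```

Because the scoring reuses attention weights the model already computes, the pruning step adds only a top-k selection per layer, which is why such methods can shrink the token sequence on embedded hardware without any fine-tuning.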
