Vision-language models (VLMs) are moving into practical computer vision applications, but we’ve found that strong benchmark scores rarely translate into production success, especially in real-world and edge deployments. In this talk, we’ll walk through the failure modes we see most often: sensor noise, lighting variation and environmental drift; domain shift between training and deployment; hallucinations and weak visual grounding; and rare, out-of-distribution events that break seemingly robust systems. We’ll then cover the system-level choices that determine outcomes, including camera selection; data collection, curation and annotation practices; multimodal fusion with auxiliary sensors; and when to combine classical vision and signal processing with learned models. Next, we’ll share the deployment techniques we use in practice, including quantization, pruning and distillation, and how we navigate size/speed/power trade-offs. Finally, we’ll show how we evaluate deployed VLMs with task metrics, stress tests and production monitoring to detect data drift, concept shift and sensor degradation, and how we close the loop with targeted retraining and model updates.

