Are you an engineer, developer, or engineering manager eager to harness the power of generative AI for cutting-edge computer vision applications? Join us for an intensive three-hour training session introducing the latest techniques in vision-language models (VLMs) and their integration with traditional computer vision methods. With a focus on applying these techniques to real-world problems, this course is tailored for professionals looking to expand their skill set in AI-driven computer vision, particularly in systems designed for deployment at the edge.
What You’ll Learn:
Introduction to VLMs and LLM+CV Techniques with Jeff Bier, Founder of the Edge AI and Vision Alliance: We’ll start with an overview of vision-language models and how they differ from conventional convolutional neural networks. From there, we’ll examine the advantages and potential drawbacks of integrating LLMs and VLMs with computer vision, and explore real-world applications that benefit from these advanced techniques.
Technical Deep Dive with Satya Mallick, CEO of OpenCV: Gain insights into the basics of VLMs, including embeddings, CLIP, and how different modalities (text, vision) are encoded. Learn about the types of training data required and the loss functions used in these models. This segment will provide the necessary background to tackle the practical examples that follow.
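To make the encoding idea concrete, here is a minimal sketch of how text and images land in CLIP’s shared embedding space, using the open-source Hugging Face transformers implementation. The checkpoint name and image file are illustrative placeholders, not the course’s exact materials:

```python
# A minimal sketch (not the course's exact code): encode text and images
# into CLIP's shared embedding space with Hugging Face transformers.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder: any local image
texts = ["a photo of a dog", "a photo of a cat"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# The text and image encoders project into the same space, so cosine
# similarity between L2-normalized embeddings measures image-text agreement.
image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
print(image_emb @ text_emb.T)  # higher score = closer match
```

Because both modalities are projected into one space, CLIP’s contrastive training loss can pull matching image-text pairs together and push mismatched pairs apart, which is the background this segment builds on.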
First Hands-On Example: Zero-Shot Image Classification. Our first practical example will be image classification with CLIP for zero-shot learning. You’ll build an image classifier capable of recognizing a wide array of categories without any task-specific training. Discover how CLIP’s zero-shot classification can be deployed on mobile devices, and learn how to fine-tune the model for enhanced performance on specific datasets.
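As a preview of this exercise, the sketch below scores an image against text prompts built from candidate class names. The labels, file name, and checkpoint are placeholders, and the course’s sample code may differ:

```python
# A minimal zero-shot classification sketch with CLIP. The checkpoint, file
# name, and class labels are placeholders, not the course's exact materials.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify(image_path, class_names):
    """Score an image against text prompts; no task-specific training needed."""
    image = Image.open(image_path)
    prompts = [f"a photo of a {name}" for name in class_names]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_classes)
    probs = logits.softmax(dim=-1).squeeze(0)
    return class_names[int(probs.argmax())], probs

label, probs = classify("example.jpg", ["dog", "cat", "bicycle", "traffic light"])
print(label, probs.tolist())
```

Swapping in a new label list is all it takes to retarget the classifier, which is what makes the zero-shot approach attractive for edge deployments.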
Second Hands-On Example: VLM with Agnostic Object Detector. We’ll develop a VLM-based visual AI system that identifies objects and reasons about them using pre-existing world knowledge. We’ll accomplish this by using a CNN-based class-agnostic object detector and integrating it with a VLM to answer complex questions about detected objects.
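The pipeline can be sketched roughly as follows. Here propose_boxes is a hypothetical stand-in for the CNN-based class-agnostic detector, and BLIP-VQA stands in for the VLM; the models used in the course may differ:

```python
# A rough sketch of the detect-then-reason pipeline. `propose_boxes` is a
# hypothetical stub for a CNN-based class-agnostic detector, and BLIP-VQA
# stands in for the VLM; the course's actual models may differ.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
vlm = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def propose_boxes(image):
    # Hypothetical detector stub: a real class-agnostic detector would return
    # (left, upper, right, lower) boxes with no class labels. Returning the
    # whole frame keeps this sketch runnable end to end.
    return [(0, 0, image.width, image.height)]

def ask_about_objects(image_path, question):
    image = Image.open(image_path).convert("RGB")
    answers = []
    for box in propose_boxes(image):
        crop = image.crop(box)  # isolate one detected region
        inputs = processor(crop, question, return_tensors="pt")
        with torch.no_grad():
            output_ids = vlm.generate(**inputs)
        answers.append((box, processor.decode(output_ids[0], skip_special_tokens=True)))
    return answers

print(ask_about_objects("example.jpg", "What is this object used for?"))
```

Because the detector carries no class labels, the VLM supplies the world knowledge needed to name and reason about whatever is detected.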
Who Should Attend:
This training is ideal for engineers, developers, engineering managers, and CTOs with a basic understanding of Python, Jupyter notebooks, and computer vision concepts. Whether you’re working in mobile development, embedded systems, or cloud applications, this course will provide you with the tools and knowledge to implement sophisticated AI solutions in your projects.
To make the most out of this training, you should have:
A basic understanding of Python
Familiarity with Jupyter notebooks
A working knowledge of fundamental computer vision concepts
Why Attend?
The field of generative AI and multimodal LLMs is moving at a breakneck pace, and this course offers an efficient way to keep up with that rapidly evolving landscape. It provides a unique blend of foundational knowledge and practical application, ensuring you leave with actionable skills and sample code for continued learning.
Registration is $495. Don’t miss this opportunity to enhance your skills and stay at the forefront of computer vision technology. Register today to secure your spot in this transformative training session.
Interested in sponsoring or exhibiting?
The Embedded Vision Summit gives you unique access to the best-qualified technology buyers you’ll ever meet.