What You’ll Learn
Introduction to VLMs and LLM+Computer Vision Techniques with Jeff Bier, Founder of the Edge AI and Vision Alliance: We’ll start with an overview of vision-language models and how they differ from conventional convolutional neural networks. We’ll then discuss the advantages and potential drawbacks of integrating LLMs and VLMs with computer vision and explore real-world applications that benefit from these advanced techniques.
Technical Deep Dive with Satya Mallick, CEO of OpenCV: Gain insights into the basics of VLMs, including embeddings, CLIP and how different modalities (text and vision) are encoded into a shared space. Learn about the types of training data required and the loss functions used in these models. This segment will provide the necessary background to tackle the practical examples that follow.
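To give a flavor of the loss functions covered here: CLIP is trained with a symmetric contrastive loss that pulls matching image/text embedding pairs together and pushes mismatched pairs apart. Below is a minimal NumPy sketch of that idea; the function name, the toy embeddings, and the default temperature are illustrative, not CLIP's actual training code.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss over a batch of
    image/text embedding pairs, where row i of each matrix is a pair."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # n x n similarity matrix
    labels = np.arange(len(img))            # matching pairs sit on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average of image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Perfectly aligned pairs yield a near-zero loss, while shuffled (mismatched) pairs yield a large one, which is exactly the signal that teaches the two encoders to share an embedding space.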
First Hands-On Example: Zero-Shot Image Classification. Our first practical example will be image classification with CLIP for zero-shot learning. You’ll build an image classifier capable of recognizing a wide array of categories without task-specific training. Discover how CLIP’s zero-shot classification can be deployed on mobile devices and learn how to fine-tune the model for enhanced performance on specific datasets.
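The core of CLIP zero-shot classification is simple: embed each class name (typically via a prompt template such as "a photo of a {label}"), embed the image, and pick the class whose text embedding is most similar to the image embedding. Here is a minimal sketch of that scoring step, assuming the embeddings have already been produced by CLIP's encoders; the function name and temperature value are illustrative.

```python
import numpy as np

def zero_shot_classify(image_emb, class_embs, class_names, temperature=0.07):
    """Score one image embedding against text embeddings of class prompts
    (e.g. "a photo of a cat") and return the best label with probabilities."""
    img = image_emb / np.linalg.norm(image_emb)
    cls = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    sims = cls @ img                          # cosine similarity per class
    scaled = np.exp(sims / temperature)
    probs = scaled / scaled.sum()             # softmax over classes
    return class_names[int(np.argmax(probs))], probs
```

Because the class list is just text, you can swap in entirely new categories at inference time with no retraining, which is what makes the approach "zero-shot."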
Second Hands-On Example: VLM with Agnostic Object Detector. We’ll develop a VLM-based visual AI system that identifies objects and reasons about them using pre-existing world knowledge. We’ll accomplish this by using a CNN-based class-agnostic object detector and integrating it with a VLM to answer complex questions about detected objects.
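A rough sketch of this pipeline: a class-agnostic detector proposes boxes without naming them, each crop is embedded by a CLIP-style encoder, and the crops are matched against free-text queries. Everything below is a simplified placeholder, assuming the detector and encoder outputs are available; the function names and toy boxes are illustrative, not the actual training code.

```python
import numpy as np

def propose_regions(image):
    """Placeholder for a CNN-based class-agnostic detector: returns
    (x, y, w, h) boxes with no class labels attached."""
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h // 2), (w // 2, h // 2, w // 2, h // 2)]

def match_regions_to_queries(crop_embs, query_embs, queries):
    """Assign each detected region the free-text query whose embedding
    is most similar (cosine similarity) to the region crop's embedding."""
    crops = crop_embs / np.linalg.norm(crop_embs, axis=1, keepdims=True)
    qs = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    sims = crops @ qs.T               # region x query similarity matrix
    best = sims.argmax(axis=1)        # best query for each region
    return [queries[i] for i in best]
```

The key design point is the division of labor: the detector only answers "where is something?", while the VLM's shared embedding space answers "what is it, and what do we know about it?"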
Who Should Attend
This training is ideal for engineers, developers, engineering managers and CTOs with a basic understanding of Python, Jupyter Notebook and computer vision concepts. Whether you’re working in mobile development, embedded systems or cloud applications, this course will provide you with the tools and knowledge to implement sophisticated AI solutions in your projects.
To make the most out of this training, you should have:
- Working knowledge of Python
- Basic familiarity with Jupyter Notebook or Google Colab
- Basic familiarity with computer vision; familiarity with OpenCV and PyTorch is helpful but not required
- Basic familiarity with GitHub
Why Attend?
The field of generative AI and multimodal LLMs is moving at a breakneck pace. This course offers a great way to keep up with this rapidly evolving technical landscape. In particular, it provides a unique blend of foundational knowledge and practical applications, ensuring you leave with actionable skills and access to sample code for continued learning.
Register Today
Registration is $495. Don’t miss this opportunity to enhance your skills and stay at the forefront of computer vision technology. Register today to secure your spot in this transformative training session.
