Date: Thursday, May 23
Start Time: 10:20 am
End Time: 11:10 am
Large language models (LLMs) are fueling a revolution in AI. While chatbots are the most visible manifestation of LLMs, multimodal LLMs used for visual perception, such as vision language models like LLaVA that understand both text and images, may ultimately have greater impact, since so many AI use cases require an understanding of both language concepts and visual data rather than language alone.
To what extent—and how quickly—will multimodal LLMs change how we do computer vision and other types of machine perception? Are they needed for real-world applications, or are they a solution looking for a problem?
If they are needed, are they needed at the edge? What will be the main challenges in running them there? Is it the nature of the computation, the amount of computation, memory bandwidth, ease of development, or some other factor? Is today's edge hardware up to the task? If not, what will it take to get there?
To answer these and many other questions around the rapidly evolving role of multimodal LLMs in machine perception applications at the edge, we've assembled an amazing set of panelists who have firsthand experience with these models and with the challenges of implementing them at the edge. Join us for a lively and insightful discussion!