AI is on the cusp of a revolution, driven by the convergence of several breakthroughs. Among the most significant is the development of large language models (LLMs) capable of humanlike reasoning, enabling them to make decisions and take actions based on complex, nuanced inputs. Another is the integration of natural language processing and computer vision through vision-language models (VLMs).
In this keynote talk, Professor Trevor Darrell of UC Berkeley will share his perspective on the current state and trajectory of research advancing machine intelligence. Darrell will present highlights of his group’s groundbreaking work, including methods for training vision models when labeled data is unavailable and techniques that enable robots to determine appropriate actions in novel situations.
Particularly relevant to edge applications, much of Professor Darrell’s work aims to overcome obstacles, such as massive memory and compute requirements, that limit the practical deployment of state-of-the-art models. For example, he will discuss approaches to making VLMs smaller and more efficient while retaining accuracy. He will also show how LLMs can serve as visual reasoning coordinators, orchestrating multiple task-specific models to achieve superior performance.
Darrell will also demonstrate how multimodal AI, visual perception, and prompt-tuned reasoning are enabling consumers to use visual intelligence at home while preserving privacy.