Date: Wednesday, May 22
Start Time: 5:25 pm
End Time: 5:55 pm
Large language models (LLMs) are revolutionizing the way we interact with computers and the world around us. To truly understand the world, however, LLM-powered agents need to be able to see. Will production models be natively multimodal, or will text-only LLMs leverage purpose-built vision models as tools? Where do techniques like multimodal retrieval-augmented generation (RAG) fit in? In this talk, Jacob Marks will give an overview of key LLM-centered projects that are reshaping the field of computer vision and discuss where we are headed in a multimodal world.