Date: Wednesday, May 22
Start Time: 4:15 pm
End Time: 5:20 pm
Multimodal large language models represent a transformative breakthrough in artificial intelligence, blending the power of natural language processing with visual understanding. In this talk, we delve into the essence of these models. We begin by explaining how large language models (LLMs) work at a fundamental level. We then explore how LLMs have evolved to integrate visual understanding, explain how they bridge the language and vision domains, and show how they are trained. Next, we examine the current landscape of multimodal LLMs, including open solutions such as LLaVA and BLIP. Finally, we look at the applications that deploying these large models at the edge will enable, identify the key challenges standing in the way, and highlight what is needed to overcome them.