Date: Wednesday, May 18 (Main Conference Day 2)
Start Time: 11:25 am
End Time: 11:55 am
As the applications of autonomous systems expand, many such systems need the ability to perceive coherently using both vision and language. For example, some systems need to translate a visual scene into language. Others may need to follow language-based instructions when operating in environments that they understand visually. Still others may need to combine visual and language inputs to understand their environments. In this talk, we will introduce popular approaches to joint language-vision perception. We will also present a unique hybrid deep-learning and rule-based approach built on a universal language object model. From a corpus, this new model derives rules and learns a universal language of object interaction and reasoning structure, which it then applies to visually detected objects. We will show that this approach works reliably for frequently occurring actions. We will also show that this type of model can be localized for specific environments and can communicate with humans and other autonomous systems.
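For context on the popular joint language-vision approaches the talk surveys, the sketch below scores candidate captions against an image using the publicly available CLIP model through the Hugging Face transformers library. It illustrates contrastive image-text matching only; it is not the universal language object model presented in the talk, and the image file and captions are hypothetical placeholders.

    # Minimal sketch of one popular joint vision-language approach:
    # contrastive image-text matching with a public CLIP checkpoint.
    # This is an illustration, not the talk's universal language object model.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("street_scene.jpg")  # hypothetical input image
    captions = [
        "a pedestrian crossing the street",
        "a car stopped at a traffic light",
        "an empty sidewalk",
    ]

    # Encode the image and candidate captions into a shared embedding space,
    # then score how well each caption describes the image.
    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1)  # one probability per caption

    for caption, p in zip(captions, probs[0].tolist()):
        print(f"{p:.2f}  {caption}")

The same embedding-matching idea underlies the scene-to-language and instruction-following use cases mentioned above, where visual observations and language must be mapped into a shared representation.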