Date: Wednesday, May 24
Start Time: 11:25 am
End Time: 12:30 pm
The neural network models used in embedded real-time applications are evolving quickly. Transformer networks are a deep learning approach that has become dominant for natural language processing and other applications involving time-dependent, sequential data. Transformer-based architectures are now also being applied to vision applications, where they achieve state-of-the-art results compared to CNN-based solutions. In this presentation, we will introduce transformers and contrast them with the CNNs commonly used for vision tasks today. We will examine the key features of transformer model architectures and show performance comparisons between transformers and CNNs. We will conclude with insights on why we think transformers will become increasingly important for visual perception tasks.