Date: Wednesday, May 18 (Main Conference Day 2)
Start Time: 10:15 am
End Time: 10:45 am
The neural network architectures used in embedded real-time applications are evolving quickly. Transformers are a leading deep learning approach for natural language processing and other applications involving time-dependent, sequential data. Transformer-based architectures are now also being applied to vision applications, where they achieve state-of-the-art results compared with solutions based on convolutional neural networks (CNNs). In this presentation, we will introduce transformers and contrast them with the CNNs commonly used for vision tasks today. We will examine the key features of transformer model architectures and show performance comparisons between transformers and CNNs. We will conclude with insights on why we think transformers are an important approach for future visual perception tasks.