Date: Wednesday, May 22
Start Time: 2:05 pm
End Time: 2:35 pm
Transformers are a class of neural network models originally designed for natural language processing, and they have also proven powerful for visual perception thanks to their ability to model long-range dependencies and process multimodal data. Resource constraints, however, are a central challenge when deploying transformers on embedded platforms. Transformers demand substantial memory for parameters and intermediate activations, and the self-attention mechanism, whose cost grows quadratically with input length, imposes heavy computation requirements. Energy efficiency adds another layer of complexity. Mitigating these challenges requires a multifaceted approach. Quantization eases memory constraints by representing weights and activations at lower precision. Pruning and sparsity techniques alleviate computation demands by removing less critical connections. Knowledge distillation transfers knowledge from large teacher models to compact student models. Shang-Hung will also discuss hardware accelerators such as NPUs customized for transformer workloads, and software techniques for efficiently mapping transformer models to these accelerators.
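
To make the quantization point concrete, here is a minimal sketch of post-training dynamic quantization applied to a small transformer encoder using PyTorch. The model dimensions, the toy encoder, and the choice of torch.ao.quantization.quantize_dynamic are illustrative assumptions for this sketch, not details from the session.

```python
# Minimal sketch: post-training dynamic quantization of a small
# transformer encoder (illustrative dimensions, not from the session).
import torch
import torch.nn as nn

# Toy encoder standing in for a perception transformer backbone.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
)
model.eval()

# Dynamic quantization converts the nn.Linear layers (the bulk of the
# parameters) to int8 weights, shrinking memory and speeding up CPU
# inference while keeping activations in floating point.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 197, 256)  # e.g., a ViT-style token sequence
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([1, 197, 256])
```

Quantizing fp32 weights to int8 cuts the memory of the affected layers roughly fourfold; pruning and distillation, mentioned above, would be applied with separate tooling and can be layered on top of quantization.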