Date: Wednesday, May 24
Start Time: 12:00 pm
End Time: 12:30 pm
One of the main challenges when deploying computer vision models to the edge is optimizing them for speed, memory footprint and energy consumption. In this talk, we’ll provide a comprehensive survey of model compression approaches, which are crucial for harnessing the full potential of deep learning models on edge devices. We’ll explore pruning, weight clustering and knowledge distillation, explaining how these techniques work and how to use them effectively. We’ll also examine inference frameworks, including ONNX Runtime, TFLite and OpenVINO, discussing how these frameworks support model compression and how hardware considerations shape the choice of framework. We’ll conclude with a comparison of the techniques presented, considering implementation complexity and typical efficiency gains.
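As a taste of the pruning material, the following is a minimal sketch of magnitude-based (L1) unstructured pruning using PyTorch’s torch.nn.utils.prune utilities; the tiny stand-in model and the 30% sparsity target are illustrative assumptions, not figures from the talk.

```python
# Minimal sketch: magnitude-based (L1) unstructured pruning in PyTorch.
# The stand-in model and 30% sparsity level are illustrative assumptions.
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in for a vision backbone.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
)

# Zero out the 30% of weights with the smallest absolute value in each conv layer.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weight tensor

# Report the achieved sparsity per layer.
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        sparsity = (module.weight == 0).float().mean().item()
        print(f"{name}: {sparsity:.0%} of weights pruned")
```

In practice, pruning is typically interleaved with fine-tuning to recover accuracy, and the zeroed weights only translate into real speed or memory savings when the deployment runtime and hardware can exploit the sparsity, which is exactly the framework and hardware interplay the talk examines.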