Embedding real-time, large-scale deep learning vision applications at the edge is challenging due to their substantial computational, memory, and bandwidth requirements. System architects can mitigate these demands by applying model compression techniques that make deep neural networks more energy efficient and less demanding of processing resources. In this talk, we will provide an introduction to four established model compression techniques: network pruning, quantization, knowledge distillation, and low-rank factorization.
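
To give a concrete flavor of two of these approaches, the following minimal NumPy sketch illustrates magnitude pruning and uniform 8-bit quantization on a toy weight matrix. The 50% pruning ratio, int8 bit width, and symmetric scaling scheme are illustrative assumptions, not prescriptions from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)).astype(np.float32)  # toy weight matrix

# Magnitude pruning (illustrative 50% ratio): zero out the weights
# with the smallest absolute values.
threshold = np.quantile(np.abs(W), 0.5)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Uniform symmetric 8-bit quantization (illustrative scheme):
# map float weights to int8, then dequantize to see the approximation.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dq = W_q.astype(np.float32) * scale

print("sparsity after pruning:", np.mean(W_pruned == 0.0))
print("max quantization error:", np.abs(W - W_dq).max())
```

Pruning reduces compute and memory by exploiting sparsity, while quantization shrinks storage and bandwidth by lowering numeric precision; production deployments typically combine such techniques with fine-tuning to recover accuracy.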