Practical DNN Quantization Techniques and Tools

Date: Tuesday, September 22, 2020

Start Time: 9:30 am

End Time: 10:00 am

Quantization is a key technique for deploying deep neural networks efficiently. In this talk, we present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. We explore simple and advanced quantization approaches and examine how they affect latency and accuracy on various target processors. We also present best practices for quantization-aware training to obtain high accuracy with quantized weights and activations.
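To make "integer weights and activations" concrete, here is a minimal sketch of per-tensor affine quantization to unsigned 8-bit values in plain Python. It is an illustration only and is not taken from the talk: the function names (`choose_qparams`, `quantize`, `dequantize`), the uint8 range, and the sample weights are all assumptions chosen for the example.

```python
def choose_qparams(values, qmin=0, qmax=255):
    """Pick a scale and zero point mapping the value range onto [qmin, qmax].

    Hypothetical helper for illustration; not an API of any library.
    """
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        # Degenerate case: all values are zero.
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    # Keep the zero point inside the integer range.
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point)."""
    return [
        max(qmin, min(qmax, int(round(v / scale)) + zero_point))
        for v in values
    ]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats: x ~ scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Assumed sample weights, chosen only to demonstrate the round trip.
weights = [-1.0, -0.5, 0.0, 0.75, 3.0]
scale, zero_point = choose_qparams(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q, restored)
```

In this sketch, each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that quantization-aware training aims to make the network tolerate.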

Session Speakers

Raghuraman Krishnamoorthi
Software Engineer, Facebook

Raghuraman Krishnamoorthi is a software engineer on the PyTorch team at Facebook, where he leads the effort to optimize deep networks for inference, with a focus on quantization. Prior to this, he was part of the TensorFlow team at Google, working on quantization for mobile inference as part of TensorFlow Lite. From 2001 to 2017, Raghu was at Qualcomm Research, working on several generations of wireless technologies. His work experience also includes computer vision for AR, ultra-low-power always-on vision, hardware/software co-design for inference on mobile platforms, and modem development. He is an inventor on more than 90 issued and filed patents. Raghu has a master's degree in EE from the University of Illinois at Urbana-Champaign and a bachelor's degree from the Indian Institute of Technology, Madras.
