Date: Tuesday, September 22, 2020
Start Time: 10:00 am
End Time: 10:30 am
The use of low-precision arithmetic (8-bit and smaller data types) is key to deploying deep neural network inference with high performance, low cost, and low power consumption. Shifting to low-precision arithmetic requires a model quantization step that can be performed at model training time (quantization-aware training) or after training (post-training quantization). Post-training quantization is an easy way to quantize already-trained models and provides a good accuracy/performance trade-off. In this talk, we review recent advances in post-training quantization methods and algorithms that help reduce quantization error. We also show the performance speed-up that can be achieved for various models when using 8-bit quantization.
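To make the idea concrete, below is a minimal sketch of the affine (scale and zero-point) quantization scheme commonly used in post-training 8-bit quantization. It is an illustrative example, not the specific method presented in the talk; the function names and the choice of per-tensor min/max calibration are assumptions for the sketch.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine post-training quantization of a float tensor to int8.

    Returns the quantized tensor plus the (scale, zero_point) needed to
    dequantize: x ~= scale * (q - zero_point).

    Note: this uses simple per-tensor min/max calibration, one of several
    possible calibration strategies (an assumption for this sketch).
    """
    qmin, qmax = -128, 127
    x_min, x_max = float(x.min()), float(x.max())
    # Ensure the representable range includes 0 so real zero maps exactly
    # to an integer value (important for zero-padding, ReLU, etc.).
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map int8 values back to approximate float values."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: measure the quantization error introduced on random activations.
x = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)
print("max abs quantization error:", np.abs(x - x_hat).max())
```

The gap between `x` and `x_hat` is the quantization error that the methods discussed in the talk aim to reduce, for example through better calibration of the min/max range.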