Start Time: 9:30 am
End Time: 10:00 am
Quantization is a key technique for enabling efficient deployment of deep neural networks. In this talk, we present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. We explore simple and advanced quantization approaches and examine their effects on latency and accuracy across a range of target processors. We also present best practices for quantization-aware training to obtain high accuracy with quantized weights and activations.
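To make the core idea concrete, here is a minimal sketch of one of the simplest approaches the abstract alludes to: per-tensor symmetric linear quantization of float values to int8. The function names and the NumPy-based setup are illustrative assumptions, not material from the talk itself.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric linear quantization: map float values to int8 using a
    # per-tensor scale derived from the maximum absolute value.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the int8 representation.
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
```

The round-trip error is bounded by the quantization step size, which is the basic accuracy/precision trade-off the talk examines; more advanced schemes (per-channel scales, asymmetric zero points, quantization-aware training) refine this same mapping.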