Date: Wednesday, May 24
Start Time: 1:30 pm
End Time: 2:35 pm
Convolutional neural networks (CNNs), widely used in computer vision tasks, require substantial computation and memory resources, which makes it challenging to run these models on resource-constrained devices. Quantization modifies CNNs to use smaller data types (e.g., switching from 32-bit floating-point values to 8-bit integer values). It is an effective way to reduce both the computation and memory-bandwidth requirements of these models and their memory footprints, making them easier to run on edge devices. However, quantization typically degrades the accuracy of CNNs to some extent. In this talk, we survey practical techniques for CNN quantization and share best practices, tools and recipes to help you get the best results from quantization, including ways to minimize accuracy loss.
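To make the float32-to-int8 idea concrete, here is a minimal sketch of per-tensor affine (asymmetric) quantization, one common scheme. It is an illustrative assumption, not necessarily a technique covered in the talk. Each float is mapped to an 8-bit integer through a scale and a zero point, and then mapped back. Storage drops from 4 bytes to 1 byte per value. The rounding step introduces an error of up to half a scale step per in-range value, which is one source of the accuracy loss described above. The function names `quantize_affine` and `dequantize_affine` are hypothetical.

```python
def quantize_affine(values, num_bits=8):
    """Hypothetical helper: map floats to unsigned ints in [0, 2**num_bits - 1].

    Uses a single scale and zero point for the whole list (per-tensor affine scheme).
    """
    qmin, qmax = 0, 2**num_bits - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    # Zero point: the integer that represents the real value 0.0.
    zero_point = max(qmin, min(qmax, int(round(qmin - lo / scale))))
    # Round to the nearest step, shift by the zero point, and clamp to the integer range.
    q = [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Hypothetical helper: map quantized ints back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


# Usage on a small, made-up set of weights.
weights = [-1.0, -0.5, 0.0, 0.25, 0.9, 1.5]
q, scale, zp = quantize_affine(weights)
recovered = dequantize_affine(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print("int8 codes:", q)
print("scale:", scale, "zero point:", zp, "max abs error:", max_err)
```

Real toolchains refine this basic mapping, for example by using per-channel scales or by calibrating the range on sample data. Those choices trade accuracy against simplicity.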