Date: Wednesday, May 22
Start Time: 4:15 pm
End Time: 4:45 pm
Deep neural networks (DNNs), widely used in computer vision tasks, require substantial computation and memory resources, making it challenging to run these models on resource-constrained devices. Quantization modifies a DNN to use smaller data types (e.g., switching from 32-bit floating-point values to 8-bit integer values). It is an effective way to reduce a model's computation and memory bandwidth requirements, as well as its memory footprint, making it easier to run on edge devices. However, quantization can degrade a DNN's accuracy. In this talk, we survey practical techniques for DNN quantization and share best practices, tools, and recipes to help you get the best results from quantization, including ways to minimize accuracy loss.
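As a minimal sketch of the float32-to-int8 conversion the abstract describes, the snippet below implements affine (scale and zero-point) quantization with NumPy; the function names, the 8-bit default, and the round-trip error check are illustrative assumptions, not material from the talk itself.

```python
import numpy as np

def quantize_affine(x: np.ndarray, num_bits: int = 8):
    """Affine (asymmetric) quantization of a float tensor to signed integers.

    Maps the observed float range [x.min(), x.max()] onto the signed
    integer range for `num_bits` (e.g., [-128, 127] for int8).
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Scale maps one integer step to a float increment; fall back to 1.0
    # for a degenerate constant tensor where x_max == x_min.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    # Zero-point is the integer that represents float 0.0 exactly.
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_affine(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximate float tensor from its quantized form."""
    return (q.astype(np.float32) - zero_point) * scale

# Example: quantize random weights and measure the round-trip error,
# a simple proxy for the accuracy loss the talk discusses minimizing.
w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_affine(w)
w_hat = dequantize_affine(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())
```

The round-trip error printed at the end illustrates why quantization can cost accuracy: the int8 representation can only approximate the original float values, and the talk's techniques aim to keep that approximation error from hurting model quality.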