In this session we’ll explain two neural network quantization techniques, quantization-aware training (QAT) and post-training quantization (PTQ), and describe when to use each. We’ll discuss what an efficient implementation of each requires: for example, QAT calls for preparing the model through layer fusion and graph optimization, while PTQ needs a representative calibration dataset. We will highlight the advantages and limitations of each approach and explore model architectures that benefit from QAT and PTQ. We will also present strategies for combining the two techniques and introduce tools such as Brevitas that enable quantization, demonstrating how to optimize neural networks for improved performance and efficiency.
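
To give a flavour of what QAT looks like in practice, below is a minimal sketch using Brevitas quantized layers inside a small PyTorch module. The model, layer sizes, bit widths, and names are illustrative assumptions rather than the session's actual material; the point is only that Brevitas layers drop into an ordinary PyTorch training loop, so weights and activations are fake-quantized while they are trained.

```python
import torch
import torch.nn as nn
import brevitas.nn as qnn  # Brevitas quantized drop-in layers for PyTorch

class TinyQuantNet(nn.Module):
    """Small CNN whose weights and activations are fake-quantized during training (QAT)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=8)            # quantize input activations
        self.conv = qnn.QuantConv2d(3, 16, kernel_size=3,
                                    weight_bit_width=4)            # 4-bit weights
        self.relu = qnn.QuantReLU(bit_width=4)                     # 4-bit activations
        self.fc = qnn.QuantLinear(16 * 30 * 30, num_classes,
                                  bias=True, weight_bit_width=4)

    def forward(self, x):
        x = self.quant_inp(x)
        x = self.relu(self.conv(x))
        return self.fc(x.flatten(1))

model = TinyQuantNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data; a real QAT run iterates over a
# labeled dataset so the quantization scales adapt alongside the weights.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

A PTQ workflow, by contrast, would start from an already trained floating-point model, fuse adjacent layers where possible, and run a small calibration set through the network to choose the quantization parameters, with no further training.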