Deploying large language models (LLMs) in resource-constrained environments is challenging due to their significant computational and memory demands. To address this challenge, various quantization techniques have been proposed to reduce a model's resource requirements while preserving its accuracy. This talk provides a comprehensive review of post-training quantization (PTQ) methods, highlighting their trade-offs and applications to LLMs. We explain techniques such as GPTQ, activation-aware weight quantization (AWQ), and SmoothQuant, and evaluate their performance on popular LLMs, including the Open Pre-trained Transformer (OPT) series and Meta's Llama-2. Our results demonstrate that these techniques can significantly reduce model size and computational requirements while maintaining accuracy, making the models suitable for deployment in edge environments.
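To make the core idea concrete, below is a minimal sketch of simple round-to-nearest (RTN) weight quantization, the baseline that methods such as GPTQ, AWQ, and SmoothQuant improve upon. It is an illustrative example only, not the implementation used in the evaluation; all function names, shapes, and the 4-bit setting are assumptions for demonstration.

```python
# Illustrative sketch: symmetric round-to-nearest weight quantization.
# This is the naive baseline that GPTQ/AWQ/SmoothQuant refine; names and
# shapes here are hypothetical and chosen only for demonstration.
import numpy as np

def quantize_per_channel(w: np.ndarray, n_bits: int = 4):
    """Quantize a weight matrix w of shape (out_features, in_features)
    symmetrically, with one scale per output channel."""
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 7 for signed 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)           # guard against all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # integer codes
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map integer codes back to approximate floating-point weights."""
    return q.astype(np.float32) * scale

# Toy usage: quantize a random "layer" and inspect the reconstruction error,
# which is the quantity more sophisticated PTQ methods try to minimize.
w = np.random.randn(8, 16).astype(np.float32)
q, s = quantize_per_channel(w, n_bits=4)
w_hat = dequantize(q, s)
print("mean absolute reconstruction error:", np.abs(w - w_hat).mean())
```

Methods like GPTQ reduce this reconstruction error layer by layer using second-order information, while AWQ and SmoothQuant adjust per-channel scales based on activation statistics before quantizing.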