Large language models are powerful but often impractical for embedded and on-prem systems due to latency, cost, privacy, and memory constraints. Small language models (SLMs), typically with single-digit billions of parameters or fewer, offer a deployable alternative but require different expectations and engineering choices. In this talk, Dwith Chenna will introduce the SLM landscape and its applications, with performance and accuracy comparisons against LLMs. Dwith will then examine the quantization techniques that matter for SLM deployment: gradient-based post-training quantization (GPTQ), SmoothQuant, and activation-aware weight quantization (AWQ). He will explain how each works and how to compare them using metrics such as perplexity, task accuracy (e.g., MMLU, ARC, HellaSwag), and runtime performance (e.g., tokens/sec and latency). Attendees will leave with a practical checklist for selecting, quantizing, and evaluating SLMs for real edge systems.
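Since the talk compares quantized and full-precision models by perplexity, a minimal sketch of how that metric is computed may be useful: perplexity is the exponential of the mean negative log-likelihood per token, so a lower value means the model assigns higher probability to the evaluation text. The log-probability values below are hypothetical and not taken from the talk.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    n = len(token_logprobs)
    nll = -sum(token_logprobs) / n  # average negative log-probability
    return math.exp(nll)

# Hypothetical per-token log-probabilities for the same text under a
# full-precision model and its quantized counterpart; a small increase
# in perplexity after quantization indicates limited quality loss.
fp16_logprobs = [-1.2, -0.8, -2.1, -0.5, -1.0]
int4_logprobs = [-1.3, -0.9, -2.4, -0.6, -1.1]

print(f"fp16 perplexity: {perplexity(fp16_logprobs):.3f}")
print(f"int4 perplexity: {perplexity(int4_logprobs):.3f}")
```

In practice these log-probabilities come from running the model over a held-out corpus with a sliding window; the same comparison is then repeated per quantization method (GPTQ, SmoothQuant, AWQ) to quantify accuracy degradation.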

