Date: Wednesday, May 21
Start Time: 2:05 pm
End Time: 2:35 pm
At the embedded edge, the choice of language model architecture has profound implications for the ability to meet demanding performance, latency, and energy-efficiency requirements. In this presentation, we contrast state-space models (SSMs) with transformers for use in this constrained regime. While transformers rely on a read-write key-value cache, SSMs can be constructed as read-only architectures, enabling the use of novel memory types and reducing power consumption. Furthermore, SSMs require significantly fewer multiply-accumulate units, drastically reducing compute energy and chip area. New techniques enable distillation-based migration from transformer models such as Llama to SSMs without major performance loss. In latency-sensitive applications, techniques such as precomputing input sequences allow SSMs to achieve a sub-100 ms time-to-first-token, enabling real-time interactivity. We present a detailed side-by-side comparison of these architectures, outlining their trade-offs and opportunities at the extreme edge.
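To make the memory contrast concrete, the sketch below (not taken from the presentation; the shapes, parameter names, and diagonal-SSM formulation are illustrative assumptions) compares a single decoding step of an attention layer, whose key-value cache grows with every generated token, against a diagonal SSM layer that updates a fixed-size state using read-only weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_state, d_head, seq_len = 64, 16, 64, 10

# --- Transformer decoding step: the key-value cache is read-write and grows per token ---
def transformer_step(x_t, kv_cache, W_q, W_k, W_v):
    q = x_t @ W_q
    kv_cache["k"].append(x_t @ W_k)            # cache grows with sequence length
    kv_cache["v"].append(x_t @ W_v)
    K = np.stack(kv_cache["k"])                # (t, d_head)
    V = np.stack(kv_cache["v"])
    scores = K @ q / np.sqrt(d_head)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ V, kv_cache                  # per-token output, larger cache

# --- Diagonal SSM step: a fixed-size state is updated in place; A, B, C stay read-only ---
def ssm_step(x_t, h, A, B, C):
    h = A * h + np.outer(x_t, B)               # (d_model, d_state) state, constant memory
    y_t = h @ C                                # (d_model,) output for this token
    return y_t, h

W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
A = np.full(d_state, 0.9)                      # stable diagonal state transition (assumed)
B = np.ones(d_state)
C = rng.standard_normal(d_state)

kv_cache = {"k": [], "v": []}
h = np.zeros((d_model, d_state))
for x_t in rng.standard_normal((seq_len, d_model)):
    _, kv_cache = transformer_step(x_t, kv_cache, W_q, W_k, W_v)
    _, h = ssm_step(x_t, h, A, B, C)

print(len(kv_cache["k"]))                      # 10: cache entries scale with token count
print(h.shape)                                 # (64, 16): SSM state size stays fixed
```

Because the SSM state never grows and its parameters are only read at inference time, this is the property that allows read-only memory types and a much smaller multiply-accumulate budget than attention over a growing cache.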