The transformer architecture revolutionized the field of AI and serves as the basis for today's state-of-the-art large language models. Yet transformers scale quadratically with sequence length, a critical bottleneck for high-resolution embedded vision. In this talk, we will present structured state-space models (SSMs), specifically Mamba, as a linear-complexity alternative. We will share recent findings from our own and others' research, including “hidden attention,” a technique for interpreting Mamba's decision-making process, and novel methods such as LongMamba and DeciMamba that efficiently extend Mamba's context length. Finally, we will discuss the application of these models to language and vision tasks, offering practical strategies for improving their reliability in edge deployments. Attendees will leave with a clear road map for adopting efficient, interpretable, and robust SSMs for next-generation embedded AI.

