In this talk we’ll present Moonshine, a speech-to-text model that runs roughly five times faster than OpenAI’s Whisper. Leveraging this efficiency, we’ll show how to build a voice interface on a low-cost, resource-constrained Cortex-A SoC using open-source tools. We’ll also cover how to run voice activity detection as a gating step before speech-to-text, so the recognizer isn’t triggered by non-speech noise, and we’ll demonstrate how to use Python to drive the recognition pipeline and take actions based on recognized words (both are sketched below). The Moonshine model’s compact size (as small as 26 MB) and high accuracy (under 5% word error rate) make it well suited to embedded applications. All code and documentation will be made available online, allowing attendees to replicate the project. This presentation will showcase the potential of voice-enabled interfaces on affordable hardware, enabling a wide range of innovative applications.
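
To make the gating step concrete, here is a minimal sketch of voice activity detection using the `webrtcvad` package, which accepts 10, 20, or 30 ms frames of 16-bit mono PCM. Only audio spans the VAD flags as speech would be handed to the speech-to-text model; the thresholds and window size here are illustrative choices, not values from the talk.

```python
# Sketch: gate audio through WebRTC VAD so only speech reaches the
# speech-to-text model. Assumes 16 kHz, 16-bit mono PCM input.
import collections

import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30                                      # webrtcvad allows 10/20/30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit samples -> 2 bytes each

def speech_segments(pcm: bytes, aggressiveness: int = 2):
    """Yield spans of `pcm` that the VAD considers speech."""
    vad = webrtcvad.Vad(aggressiveness)    # 0 = least, 3 = most aggressive
    window = collections.deque(maxlen=10)  # ~300 ms decision window
    start = None
    for offset in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[offset:offset + FRAME_BYTES]
        window.append(vad.is_speech(frame, SAMPLE_RATE))
        if start is None and sum(window) > 0.8 * window.maxlen:
            start = offset                 # mostly voiced: segment begins
        elif start is not None and sum(window) < 0.2 * window.maxlen:
            yield pcm[start:offset]        # mostly silence: segment ends
            start = None
    if start is not None:
        yield pcm[start:]                  # flush a segment still open at EOF
```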
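
And here is a minimal sketch of acting on recognized words, assuming Moonshine’s Python package exposes a `transcribe(audio, model)` helper that returns a list of strings, as shown in its README. The command table, handler functions, and `command.wav` filename are hypothetical placeholders for whatever actions your application takes.

```python
# Sketch: transcribe an utterance with Moonshine, then match recognized
# words against a (hypothetical) command table and run the handler.
import moonshine

COMMANDS = {
    "lights on": lambda: print("turning lights on"),
    "lights off": lambda: print("turning lights off"),
}

def handle_utterance(wav_path: str) -> None:
    # Assumed API: moonshine.transcribe returns a list of transcribed strings.
    text = " ".join(moonshine.transcribe(wav_path, "moonshine/tiny")).lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            action()
            return
    print(f"no command matched: {text!r}")

handle_utterance("command.wav")  # hypothetical recording of a spoken command
```

In a full pipeline, the VAD segments from the previous sketch would be written out (or passed in memory) and fed to `handle_utterance`, so the model only ever runs on audio that is likely to contain speech.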