In this talk we’ll present Moonshine, a speech-to-text model that runs roughly five times faster than OpenAI’s Whisper. Leveraging this efficiency, we’ll show how to build a voice interface on a low-cost, resource-constrained Cortex-A SoC using open-source tools. We’ll also cover how to run voice activity detection as a gating step before speech-to-text, so the recognizer isn’t triggered by non-speech noise, and we’ll demonstrate how to use Python to drive the recognition pipeline and take actions based on recognized words (both are sketched below). The Moonshine model’s compact size (as small as 26 MB) and high accuracy (under 5% word error rate) make it well suited to embedded applications. All code and documentation will be made available online, allowing attendees to replicate the project. This presentation will showcase the potential of voice-enabled interfaces on affordable hardware, enabling a wide range of innovative applications.
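
To make the gating step concrete, here is a minimal sketch of voice activity detection using the `webrtcvad` package, which accepts 10, 20, or 30 ms frames of 16-bit mono PCM. Only audio spans the VAD flags as speech would be handed to the speech-to-text model; the thresholds and window size here are illustrative choices, not values from the talk.

```python
# Sketch: gate audio through WebRTC VAD so only speech reaches the
# speech-to-text model. Assumes 16 kHz, 16-bit mono PCM input.
import collections

import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30                                      # webrtcvad allows 10/20/30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit samples -> 2 bytes each

def speech_segments(pcm: bytes, aggressiveness: int = 2):
    """Yield spans of `pcm` that the VAD considers speech."""
    vad = webrtcvad.Vad(aggressiveness)    # 0 = least, 3 = most aggressive
    window = collections.deque(maxlen=10)  # ~300 ms decision window
    start = None
    for offset in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[offset:offset + FRAME_BYTES]
        window.append(vad.is_speech(frame, SAMPLE_RATE))
        if start is None and sum(window) > 0.8 * window.maxlen:
            start = offset                 # mostly voiced: segment begins
        elif start is not None and sum(window) < 0.2 * window.maxlen:
            yield pcm[start:offset]        # mostly silence: segment ends
            start = None
    if start is not None:
        yield pcm[start:]                  # flush a segment still open at EOF
```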
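
And here is a minimal sketch of acting on recognized words, assuming Moonshine’s Python package exposes a `transcribe(audio, model)` helper that returns a list of strings, as shown in its README. The command table, handler functions, and `command.wav` filename are hypothetical placeholders for whatever actions your application takes.

```python
# Sketch: transcribe an utterance with Moonshine, then match recognized
# words against a (hypothetical) command table and run the handler.
import moonshine

COMMANDS = {
    "lights on": lambda: print("turning lights on"),
    "lights off": lambda: print("turning lights off"),
}

def handle_utterance(wav_path: str) -> None:
    # Assumed API: moonshine.transcribe returns a list of transcribed strings.
    text = " ".join(moonshine.transcribe(wav_path, "moonshine/tiny")).lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            action()
            return
    print(f"no command matched: {text!r}")

handle_utterance("command.wav")  # hypothetical recording of a spoken command
```

In a full pipeline, the VAD segments from the previous sketch would be written out (or passed in memory) and fed to `handle_utterance`, so the model only ever runs on audio that is likely to contain speech.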