As small language and multimodal models improve, a practical question is emerging for product teams: have we hit a “good enough” quality threshold for embeddable AI, and what are the lowest-cost SoCs that can actually run these models end-to-end? In this talk we present a comparative study of low-cost chips evaluated against consistent requirements, spanning small LLM inference, speech-to-text, text-to-speech, and vision-language workloads. We’ll share performance results and explain why TOPS is often a poor predictor of real-world outcomes. We’ll also highlight usability challenges arising from unsupported operators and brittle conversion/deployment toolchains. We’ll explore where today’s NPUs are misaligned with transformer workloads and why quantized ~3B-class models often represent a practical ceiling. Attendees will leave with concrete selection criteria, a realistic view of current limitations, and a road map of next steps: new silicon to watch, techniques for pushing toward larger models, and the quality gains available from fine-tuning.

