Embedded vision applications, with their demand for ever more processing power, are driving up the size and complexity of edge SoCs. Heterogeneous architectures that combine CPUs, GPUs and specialized NPUs have become the standard way to achieve high performance density and compelling headline specifications for edge vision. Yet the approach brings problems of its own: AI models are evolving so rapidly that fixed-function accelerators risk obsolescence, and more advanced process nodes bring mounting thermal challenges. Software will be the true enabler of success, with community-wide initiatives such as the UXL Foundation empowering application developers to port code seamlessly to edge devices; but that software needs capable silicon beneath it, and flexible, parallel and, most importantly, programmable hardware is central to delivering high-performance, high-efficiency vision applications at the edge. The right solution isn’t here yet. What will it look like?