Date: Wednesday, May 22
Start Time: 2:40 pm
End Time: 3:10 pm
AI hardware accelerators, also called NPUs, play an increasing role in enabling AI in embedded systems such as smart devices. In most cases, NPUs need dedicated, tightly coupled high-speed memory to run efficiently. This memory has a major impact on system performance, power consumption and cost. In this presentation, we will take a deep dive into our state-of-the-art memory optimization method, which significantly decreases the size of NPU memory. The method combines stripe-wise and channel-wise processing to obtain the best compromise between memory reduction and additional processing cost. The original neural network is split into several pieces that are scheduled on the NPU. We will also present our proprietary tool (based on the ONNX format), which automatically finds the optimal network configuration and prepares the network scheduling on the NPU.
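The memory-reduction idea behind stripe-wise processing can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the presenters' tool or method: the function names (`conv2d_valid`, `conv2d_striped`) and the stripe height are hypothetical. Each output stripe is computed from its corresponding input rows plus a small overlapping "halo", so the peak working set scales with the stripe height rather than the full feature-map height; the re-read halo rows are an example of the extra processing cost traded for memory savings.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive single-channel 2-D convolution, 'valid' padding."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_striped(x, k, stripe_rows):
    """Same result, but computed one horizontal stripe at a time.

    Only `stripe_rows + kh - 1` input rows need to be resident per step,
    instead of the whole feature map -- the memory saving the abstract
    describes. Adjacent input slices overlap by kh - 1 halo rows.
    """
    kh, _ = k.shape
    out_h = x.shape[0] - kh + 1
    stripes = []
    for s in range(0, out_h, stripe_rows):
        e = min(s + stripe_rows, out_h)
        # Slice includes the (kh - 1)-row halo so the stripe is self-contained.
        stripes.append(conv2d_valid(x[s:e + kh - 1, :], k))
    return np.vstack(stripes)
```

Channel-wise splitting follows the same pattern along the channel axis: each slice of output channels depends only on its own kernel slice, so the slices can be scheduled on the NPU one after another with a proportionally smaller weight and activation footprint.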