Date: Wednesday, May 22
Start Time: 2:40 pm
End Time: 3:10 pm
AI hardware accelerators are playing a growing role in bringing AI to embedded systems such as smart devices. In most cases, an NPU needs dedicated, tightly coupled high-speed memory to run efficiently, and this memory has a major impact on performance, power consumption, and cost. In this presentation, we will dive deep into our state-of-the-art memory optimization method, which significantly reduces the size of the required NPU memory. The method combines stripe-based and channel-based processing to strike the best compromise between memory footprint reduction and additional processing cost: the original neural network is split into several subnetworks that are scheduled on the NPU. We will share results showing that this technique yields large memory footprint reductions with only moderate increases in processing time. We’ll also present our proprietary ONNX-based tool, which automatically finds the optimal network configuration and schedules the subnetworks for execution.
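To give a feel for the stripe idea ahead of the talk, the sketch below shows how computing a convolution in horizontal stripes bounds peak activation memory by the stripe height rather than the full feature map. This is a minimal NumPy illustration under simplified assumptions (single channel, valid padding), not the speakers' tool; the names conv2d, conv2d_by_stripes, and the stripe_rows parameter are hypothetical.

```python
import numpy as np

def conv2d(x, w):
    """Naive single-channel 2D convolution (valid padding), used as reference."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_by_stripes(x, w, stripe_rows):
    """Compute the same convolution stripe by stripe.

    Each stripe needs only stripe_rows + kh - 1 input rows (the stripe
    plus a halo of kh - 1 rows for kernel overlap), so the working set
    held in NPU memory scales with the stripe height, not the full map.
    """
    kh = w.shape[0]
    oh = x.shape[0] - kh + 1
    stripes = []
    for top in range(0, oh, stripe_rows):
        rows = min(stripe_rows, oh - top)
        # Input slice: the stripe's rows plus the (kh - 1)-row halo.
        x_slice = x[top:top + rows + kh - 1, :]
        stripes.append(conv2d(x_slice, w))
    return np.vstack(stripes)

# Sanity check: striped execution matches the monolithic result.
x = np.random.rand(64, 64)
w = np.random.rand(3, 3)
assert np.allclose(conv2d(x, w), conv2d_by_stripes(x, w, stripe_rows=8))
```

The halo rows are the "additional processing cost" mentioned above: they are read (and partially recomputed) once per stripe, so smaller stripes mean a smaller memory footprint but more overlapping work, which is the trade-off the optimization method navigates.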