Date: Thursday, September 24, 2020
Start Time: 1:00 pm
End Time: 1:30 pm
With the rapid growth in the size of deep neural networks (DNNs), there has been extensive research on network model compression to improve deployment efficiency. In this talk, we present our work to advance compression beyond weights to neuron activations. We propose a joint regularization technique that simultaneously regularizes the distributions of weights and activations. By distinguishing and leveraging the significant differences among neuron responses and connections during learning, the jointly pruned network (JPnet) optimizes the sparsity of both activations and weights. The resulting deep sparsification exposes more optimization space for existing DNN accelerators that exploit sparse matrix operations. We evaluate the effectiveness of joint regularization on a variety of network models with different activation functions and datasets. Under a 0.4% constraint on inference accuracy degradation, a JPnet can save 72% to 99% of the computation cost, with up to 5.2x and 12.3x reductions in activation and weight counts, respectively.
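For intuition, below is a minimal sketch of a training step with joint weight/activation regularization. It assumes an L1-style penalty on both terms; the model, the lambda_w/lambda_a strengths, and the forward-hook collection of activations are illustrative and not necessarily the exact formulation used in the work.

# Minimal sketch: joint regularization of weights and activations (assumed L1 penalties).
# lambda_w, lambda_a, the toy model, and the hook-based activation capture are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
lambda_w, lambda_a = 1e-5, 1e-5  # regularization strengths (hypothetical values)

activations = []  # filled by forward hooks on the ReLU layers
for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.register_forward_hook(lambda _m, _inp, out: activations.append(out))

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch
activations.clear()
logits = model(x)

# Joint objective: task loss + L1 on weights + L1 on activations,
# pushing both connections and neuron responses toward zero (i.e., prunable).
loss = criterion(logits, y)
loss = loss + lambda_w * sum(p.abs().sum() for p in model.parameters())
loss = loss + lambda_a * sum(a.abs().sum() for a in activations)

optimizer.zero_grad()
loss.backward()
optimizer.step()

After training with such a joint penalty, weights and activations that have been driven to (near) zero can be pruned or skipped at inference time, which is the sparsity that sparse-matrix DNN accelerators can exploit.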