Date: Tuesday, May 17 (Main Conference Day 1)
Start Time: 12:00 pm
End Time: 12:30 pm
To reduce the memory requirements of neural networks, researchers have proposed numerous heuristics for compressing weights. Lower precision, sparsity, weight sharing, and various other schemes shrink the memory needed for the network's weights, or program. Unfortunately, during network execution, memory use is usually dominated by activations, the data flowing through the network, rather than by weights. Although lower precision can reduce activation memory somewhat, more aggressive measures are needed for large networks to run efficiently with small memory footprints. Fortunately, the underlying information content of activations is often modest, so novel compression strategies can dramatically widen the range of networks that can execute on constrained hardware. In this talk, we introduce new strategies for compressing activations that sharply reduce their memory footprint.
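
For a sense of scale, the short sketch below compares weight memory against activation memory for a single convolution layer at fp16 precision. The layer shape and input resolution are illustrative assumptions chosen for this example, not figures from the talk.

```python
# Back-of-the-envelope comparison of weight vs. activation memory for a
# single (hypothetical) 3x3 convolution layer stored at fp16 precision.
# The channel counts and input resolution below are illustrative
# assumptions, not numbers from the talk.

bytes_per_value = 2                 # fp16
in_channels, out_channels = 64, 64
kernel = 3
height, width = 1080, 1920          # high-resolution feature map

weight_bytes = in_channels * out_channels * kernel * kernel * bytes_per_value
activation_bytes = out_channels * height * width * bytes_per_value

print(f"weights:     {weight_bytes / 1e6:.2f} MB")      # ~0.07 MB
print(f"activations: {activation_bytes / 1e6:.2f} MB")  # ~265 MB
```

Under these assumptions the layer's output activations require several orders of magnitude more memory than its weights, which is why compressing weights alone does little to shrink the execution-time footprint.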