Self-compression is a quantization-aware training technique that reduces neural network size and optimizes performance for edge inference. By learning the bit depth of each weight and activation tensor during training, self-compression significantly reduces memory footprint and bandwidth consumption while maintaining accuracy.
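The core idea can be sketched in a few lines: a fake quantizer rounds each value to a grid determined by a bit depth, and a size-penalty term (total stored bits) is added to the training loss so that the optimizer is rewarded for shrinking bit depths. This is a minimal illustrative sketch, not the source's implementation; the names `fake_quantize`, `model_size_bits`, the symmetric uniform quantizer, and the fixed `scale` parameter are all assumptions made for the example.

```python
def fake_quantize(w, bits, scale):
    """Symmetric uniform fake quantization of a list of floats.

    Values are rounded to integer multiples of `scale` and clipped to the
    signed range representable with `bits` bits (an illustrative choice;
    real schemes often also learn `scale`).
    """
    qmax = 2 ** (bits - 1) - 1
    out = []
    for x in w:
        q = max(-qmax - 1, min(qmax, round(x / scale)))  # round-to-grid, then clip
        out.append(q * scale)
    return out

def model_size_bits(layer_bits, layer_params):
    """Compression penalty: total bits stored = sum over layers of b_l * N_l."""
    return sum(b * n for b, n in zip(layer_bits, layer_params))

# Fewer bits -> coarser grid -> larger reconstruction error, smaller model.
w = [0.1, -0.5, 0.7]
print(fake_quantize(w, 8, 0.05))   # fine grid: values survive intact
print(fake_quantize(w, 2, 0.05))   # 2-bit grid: heavy clipping/rounding

# A training loss would then combine task error with the size term, e.g.
# loss = task_loss + lam * model_size_bits(bits_per_layer, params_per_layer),
# with the bit depths relaxed to continuous values so gradients can shrink them.
print(model_size_bits([8, 4], [1000, 500]))  # 10000 total stored bits
```

In the full technique, the bit depths are continuous learnable parameters (rounded at inference time) so that gradient descent trades accuracy against the size penalty per layer, rather than using a single hand-picked precision.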
