Embedding real-time, large-scale deep learning vision applications at the edge is challenging due to their substantial computational, memory, and bandwidth requirements. System architects can mitigate these demands by applying model compression techniques that make deep neural networks more energy efficient and less demanding of processing resources. In this talk, we will provide an introduction to four established model compression techniques: network pruning, quantization, knowledge distillation, and low-rank factorization.
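
To give a concrete flavor of two of these approaches, the following minimal NumPy sketch illustrates magnitude pruning and uniform 8-bit quantization on a toy weight matrix. The 50% pruning ratio, int8 bit width, and symmetric scaling scheme are illustrative assumptions, not prescriptions from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)).astype(np.float32)  # toy weight matrix

# Magnitude pruning (illustrative 50% ratio): zero out the weights
# with the smallest absolute values.
threshold = np.quantile(np.abs(W), 0.5)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Uniform symmetric 8-bit quantization (illustrative scheme):
# map float weights to int8, then dequantize to see the approximation.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dq = W_q.astype(np.float32) * scale

print("sparsity after pruning:", np.mean(W_pruned == 0.0))
print("max quantization error:", np.abs(W - W_dq).max())
```

Pruning reduces compute and memory by exploiting sparsity, while quantization shrinks storage and bandwidth by lowering numeric precision; production deployments typically combine such techniques with fine-tuning to recover accuracy.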