As edge computing demands smaller, more efficient models, knowledge distillation has emerged as a key approach to model compression. We explain what knowledge distillation entails and what implementing it requires, including dataset size and tooling. We examine when distillation is the right choice, weigh its pros and cons, and showcase examples of successfully distilled models. Drawing on performance data that highlights the benefits of distillation, we conclude that it is a powerful technique for building smaller, smarter models that thrive at the edge.
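To make the core idea concrete before diving in, here is a minimal sketch of the classic soft-target distillation loss (in the style of Hinton et al.) written in PyTorch. The temperature and alpha values are illustrative hyperparameters chosen for this example, not settings from any particular model discussed later.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend the hard-label loss with a soft-target loss.

    The teacher's logits are softened with a temperature so the
    student can learn from the relative probabilities the teacher
    assigns to incorrect classes ("dark knowledge").
    """
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # KL divergence between temperature-softened distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

In a typical training loop, the frozen teacher and the trainable student each run a forward pass on the same batch, and this combined loss drives the student's gradient step; the details of that setup are covered in the sections that follow.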