Date: Tuesday, May 17 (Main Conference Day 1)
Start Time: 2:05 pm
End Time: 2:35 pm
Convolutional Neural Networks are ubiquitous in academia and industry, especially for computer vision and language processing tasks. However, their superior ability to learn meaningful representations from large-scale data comes at a price: they are often over-parameterized, and their millions of parameters add latency and unnecessary cost when deployed in production. In this talk, we will present the foundations of knowledge distillation, an essential tool for compressing neural networks while preserving, and in some cases improving, their performance. Knowledge distillation entails training a lightweight model, referred to as the student, to replicate the behavior of a larger pre-trained model, called the teacher. We will illustrate how this process works in detail through a real-world image restoration task we recently worked on at Bending Spoons. By distilling the teacher's knowledge into a smaller student, we obtained a threefold speedup while improving the quality of the reconstructed images.
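To make the setup concrete, below is a minimal sketch of a typical distillation training step for an image restoration model, written in PyTorch. It is illustrative only and does not reflect the actual Bending Spoons pipeline: the function names, loss choices, and the weighting factor `alpha` are assumptions. The student is optimized against both the ground-truth clean image (supervised term) and the frozen teacher's restored output (imitation term).

```python
# Minimal sketch of a knowledge-distillation step for image restoration.
# Illustrative only: model names, losses, and alpha are assumptions, not
# the method presented in the talk.
import torch
import torch.nn as nn

def distillation_step(student, teacher, degraded, clean, optimizer, alpha=0.5):
    """One training step: the student matches both the ground truth
    and the frozen teacher's restored output."""
    teacher.eval()
    with torch.no_grad():
        teacher_out = teacher(degraded)      # teacher's restored image

    student_out = student(degraded)          # student's restored image

    recon_loss = nn.functional.l1_loss(student_out, clean)          # supervised term
    distill_loss = nn.functional.l1_loss(student_out, teacher_out)  # imitation term
    loss = (1 - alpha) * recon_loss + alpha * distill_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For classification tasks, the imitation term is more commonly a KL divergence between temperature-softened teacher and student logits; for restoration, matching the teacher's output (or intermediate features) directly is a natural analogue.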