In many real-world use cases, deep learning algorithms work well if you have enough high-quality data to train them. Obtaining that data is a critical limiting factor in the development of effective artificial intelligence. In this talk, we’ll identify common pitfalls encountered in obtaining and using public and private data for training and evaluating deep neural networks for visual AI, and we’ll present techniques to overcome these pitfalls. We’ll also present the open-source Computer Vision Annotation Tool (CVAT – https://github.com/opencv/cvat), illustrating techniques we have implemented to streamline annotation of visual data at scale. We’ll discuss the challenges we faced in developing CVAT, how we addressed them, and our plans for further improvements.