In scientific papers, computer vision models are usually evaluated on well-defined training and test datasets. In practice, however, collecting high-quality data that accurately represents the real world is a challenging problem. A model developed on a non-representative dataset may achieve high accuracy during testing yet perform poorly once deployed in the real world. In this session, we will discuss the challenges, common pitfalls, and possible solutions involved in creating datasets for real-world problems. We will cover how to avoid typical biases when curating data, dive deep into imbalanced distributions, and present techniques for handling them. Finally, we will look at strategies for detecting and dealing with model drift after a model is deployed in production.