In industrial supply chain and logistics applications, edge IoT devices capture data continuously and generate massive volumes of it. For embedded vision systems, managing this volume of images and metadata can be challenging, and selecting a diverse subset of high-quality data is crucial for effective modeling and analysis. In this presentation, we’ll share a comprehensive method for selecting relevant images from an extensive dataset to create a high-quality image database for building and monitoring computer vision and machine learning models. This systematic approach not only makes data management in industrial IoT applications more efficient but also improves the generalizability and accuracy of computer vision models.