Date: Wednesday, May 26
Start Time: 12:00 pm
End Time: 12:30 pm
Many applications, such as medical imaging, lack the large labeled datasets that popular CNNs need to reach sufficient accuracy. These same applications often suffer from imbalanced class distributions, which further degrade model accuracy. In this talk, we propose a highly data-efficient methodology that achieves the same level of accuracy with significantly fewer labeled images and is insensitive to class imbalance. The approach is a training pipeline with two components: a CNN trained in an unsupervised setting to generate image feature representations, and a multiclass Gaussian process classifier trained in active learning cycles on the labeled image representations. We demonstrate the approach on a COVID-19 chest X-ray classifier, where data is scarce and highly imbalanced. We show that it remains insensitive to class imbalance and achieves accuracy comparable to prior approaches while using only a fraction of the training data.
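The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the speakers' implementation: the CNN embeddings are replaced by synthetic, imbalanced 2-D features (`make_toy_features` is a hypothetical helper), the multiclass Gaussian process classifier is approximated by GP regression on one-hot labels with an RBF kernel, and each active learning cycle queries the pool points with the highest GP predictive variance. The abstract does not state which kernel, inference method, or acquisition rule is used.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(x_train, y_onehot, x_query, noise=1e-2, length_scale=1.0):
    """GP regression on one-hot labels (a stand-in for a true GP classifier).

    Returns per-class scores and the latent predictive variance per query point.
    """
    k_train = rbf_kernel(x_train, x_train, length_scale)
    k_train += noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_query, x_train, length_scale)
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_onehot))
    scores = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    variance = 1.0 - np.sum(v**2, axis=0)  # k(x, x) = 1 for the RBF kernel
    return scores, np.maximum(variance, 0.0)


def active_learning(features, oracle_labels, n_classes,
                    n_initial=6, n_cycles=10, batch_size=3, seed=0):
    """Grow the labeled set by querying the most uncertain pool points."""
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(features), size=n_initial, replace=False))
    for _ in range(n_cycles):
        pool = np.setdiff1d(np.arange(len(features)), labeled)
        if len(pool) == 0:
            break
        y_onehot = np.eye(n_classes)[oracle_labels[labeled]]
        _, variance = gp_predict(features[labeled], y_onehot, features[pool])
        # Assumed acquisition rule: label the points the GP is least sure about.
        picked = pool[np.argsort(variance)[-batch_size:]]
        labeled.extend(picked.tolist())
    return np.array(labeled)


def make_toy_features(seed=0):
    """Hypothetical stand-in for CNN embeddings: three imbalanced 2-D blobs."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    counts = [170, 20, 10]  # one majority class and two minority classes
    features = np.vstack(
        [c + 0.5 * rng.standard_normal((n, 2)) for c, n in zip(centers, counts)]
    )
    labels = np.repeat(np.arange(3), counts)
    return features, labels


features, labels = make_toy_features()
labeled = active_learning(features, labels, n_classes=3)
scores, _ = gp_predict(features[labeled], np.eye(3)[labels[labeled]], features)
accuracy = np.mean(scores.argmax(axis=1) == labels)
print(f"labeled {len(labeled)} of {len(features)} points, accuracy {accuracy:.3f}")
```

In a real pipeline, `features` would come from the unsupervised CNN and `oracle_labels` would be supplied by an annotator only for the queried images. Variance-based querying tends to pull in points from regions with no labels yet, which is one plausible reason such a loop could cope with minority classes, though this sketch makes no claim about the results reported in the talk.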