The ongoing revolution in artificial intelligence is driven by the availability of large annotated datasets. Unfortunately, not all scientific domains are sufficiently data-rich, and this limits the effectiveness of algorithms. In the medical domain, for instance, data is often expensive to both acquire and annotate, and the amount of data is limited by the number of affected patients. Data can also be lacking in more subtle ways: when one group of a studied population is underrepresented, many algorithms will systematically reproduce the resulting bias.
While this project focuses on the medical domain, a well-known example from another domain illustrates the problem: In 2016, the wage gap between men and women across the European Union was 16.2%. Datta et al. (2015) demonstrated how this existing bias is picked up and propagated by Google’s ad-serving algorithm, such that higher-paying jobs are advertised less often to female job seekers, thereby reinforcing the existing bias.
We combat both the lack of data and bias with data augmentation, i.e., generating new artificial data points from existing ones. Given a minority group in our dataset, we aim to create realistic new examples for this group, thereby correcting its underrepresentation. In doing so, we hope to alleviate the biases resulting from underrepresentation and to obtain algorithms that are fair and equally accurate across all demographics.
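
To make the idea of generating new data points from existing ones concrete, the following is a minimal sketch of one possible augmentation scheme. The text above does not name a specific method, so this example assumes a simple SMOTE-style interpolation on tabular feature vectors: each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours. The function name `oversample_minority` and its parameters are hypothetical and chosen only for illustration; realistic medical data (e.g. images) would likely call for a more sophisticated generator.

```python
import numpy as np


def oversample_minority(X_minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples for an underrepresented group.

    Illustrative assumption: SMOTE-style interpolation between each chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances within the minority group.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest neighbours of every sample, shape (n, k).
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and a random neighbour for each new point.
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between base and partner.
    t = rng.random((n_new, 1))
    return X[base] + t * (X[partner] - X[base])


# Toy minority group: four samples with two features each.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = oversample_minority(X_min, n_new=6)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays within the range of the observed data. This is a conservative design choice: it avoids implausible values but cannot introduce variation beyond what the minority group already exhibits.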