Learning Data Augmentations for Bias-Correction

Pola Elisabeth Schwöbel: Bias in, Fairness out: Learning Data Augmentation for Bias-Correction.

The ongoing revolution in artificial intelligence is driven by the availability of large annotated data sets. Unfortunately, not all domains of science are as data-rich as needed, and this limits the effectiveness of algorithms. In the medical domain, for instance, data is often expensive to both acquire and annotate, and the amount of data is limited by the number of affected patients. Data can also be lacking in more subtle ways: when one group of a studied population is underrepresented, many algorithms will systematically reinforce this bias.

While this project focuses on the medical domain, a well-known example from another domain illustrates the problem: In 2016, the wage gap between men and women across the European Union was 16.2%. Datta et al. (2015) demonstrated that this existing bias is picked up and propagated by Google’s ad-serving algorithm: higher-paying jobs are advertised less often to female job seekers, which reinforces the existing bias.

We combat both lack of data and bias with data augmentation, i.e. generating new artificial data points from existing ones. Given a minority group in our dataset, we aim to create realistic new examples for this group, thereby correcting the underrepresentation. As a consequence, we hope to alleviate the biases resulting from underrepresentation, aiming for fair and equally accurate algorithms across all demographics.
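To make the idea concrete, the following is a minimal sketch of augmentation-based rebalancing. It is not the project's method: the project aims to learn augmentations, whereas this sketch uses a fixed, hand-chosen one (small Gaussian noise). The function names `jitter` and `balance_by_augmentation`, the `scale` parameter, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def jitter(features, scale, rng):
    """Hypothetical augmentation: add small Gaussian noise to each feature."""
    return [x + rng.gauss(0.0, scale) for x in features]


def balance_by_augmentation(samples, scale=0.05, seed=0):
    """Add synthetic points until every group is as large as the largest one.

    samples: list of (features, group) pairs.
    Returns a new list: the original samples followed by the synthetic ones.
    """
    rng = random.Random(seed)
    counts = Counter(group for _, group in samples)
    target = max(counts.values())

    augmented = list(samples)
    for group, count in counts.items():
        members = [f for f, g in samples if g == group]
        # Create one synthetic point per missing example in this group.
        for _ in range(target - count):
            source = rng.choice(members)
            augmented.append((jitter(source, scale, rng), group))
    return augmented


# Toy data (assumed): 8 majority examples, 2 minority examples.
data = [([0.1, 0.2], "majority")] * 8 + [([0.9, 0.8], "minority")] * 2
balanced = balance_by_augmentation(data)
group_counts = Counter(group for _, group in balanced)
print(group_counts)
```

Noise of this kind only produces realistic examples if small perturbations leave the meaning of a data point unchanged, which is a strong assumption for medical data. This is one reason to learn the augmentation from data rather than fix it by hand.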

PhD project

By: Pola Elisabeth Schwöbel

Section: Cognitive Systems

Principal supervisor: Søren Hauberg

Co-supervisor: Kristoffer Hougaard Madsen

Project title: Learning Data Augmentations for Bias-Correction

Term: 01/01/2019 → 07/06/2022


Søren Feragen-Hauberg
DTU Compute
+45 45 25 38 99