Unfortunately, we have almost no insight into how a neural network arrives at its decisions. It does
not “show its work” and would therefore fail every university exam. If a neural network makes a
fatal mistake, we need to know why that happened. If we doubt a neural network’s decision, it
helps to look at the reasoning behind it. If we use a neural network’s decision alongside other
inputs, e.g. to support a medical diagnosis, we need to know what that decision is based on.
Therefore my project aims to open up the black box that a neural network represents by looking at the
interpretability of its decisions in the context of uncertainty. By doing this, we can see how different
features of the input influence the certainty of the output decision. This will help us determine
whether we should conduct further medical tests to confirm a diagnosis, how vulnerable a system is to
malicious attacks and how to defend against them, and whether a system has an unwanted
discriminatory bias.
To achieve this, I will start by applying current methods for interpreting neural network decisions to
Bayesian neural networks, and by training neural networks on different subsets of datasets to gain
insight into the convergence of the learning process. The success of the applied methods can be
assessed by comparing the network’s explanations to human reasoning and attention for the same
decisions.
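
To give a concrete impression of what relating input features to output uncertainty could look like in practice, the sketch below (an illustration, not the project’s finished method) approximates a Bayesian neural network with Monte Carlo dropout and differentiates the predictive entropy with respect to the input, so that each input feature receives a sensitivity score for the model’s uncertainty. The architecture, data, and sample count are placeholders chosen for the example.

import torch
import torch.nn as nn

class DropoutMLP(nn.Module):
    # Small classifier with dropout; repeated stochastic forward passes
    # serve as approximate posterior samples (MC dropout).
    def __init__(self, n_in=20, n_hidden=64, n_classes=3, p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(n_hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def uncertainty_saliency(model, x, n_samples=50):
    # Gradient of the predictive entropy with respect to the input:
    # which features would, if perturbed, change the uncertainty most?
    model.train()                      # keep dropout active at test time
    x = x.clone().requires_grad_(True)
    probs = torch.stack(
        [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
    ).mean(dim=0)                      # approximate predictive distribution
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).sum()
    entropy.backward()
    return x.grad                      # per-feature sensitivity of uncertainty

if __name__ == "__main__":
    model = DropoutMLP()
    x = torch.randn(5, 20)                       # dummy batch of 5 inputs
    print(uncertainty_saliency(model, x).shape)  # torch.Size([5, 20])

The same idea carries over to other approximate Bayesian inference schemes: any method that yields a predictive distribution allows an uncertainty measure to be attributed back to the input with standard interpretability techniques.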