The geometry of protein representations

 
My project arises from the problem of finding a meaningful representation of data. Single words standing for complex concepts are a good example of representations that we learn and use throughout our lives. Think of the shape-sorting game for children, in which each object has to be put through its matching hole: an adult matches a “star” object with a “star” hole, whereas a child without access to the representation “star” has to match a “1 2 3 4 5 spikes” object with a “1 2 3 4 5 spikes” hole. The child is clearly slower and finds the game harder.
 
Representations are very useful, but how are they even defined? Intuitively, a representation is a short, informative summary of the data, but a more formal definition is needed. Finding one is the first step in my project.
 
The most common current approach relies on standard autoencoders, which are hourglass-shaped neural networks: an encoder compresses the input into a narrow inner layer, and a decoder maps that inner layer back to the input space. They are trained so that the reconstruction is as close as possible to the input. The inner-layer representation is therefore informative only in the sense that a decoder can reconstruct the original data from it. Seeking the lowest-dimensional inner layer that still permits this reconstruction is then what defines a short, informative summary of the input data, i.e. a representation. This approach has two main drawbacks: (1) it does not take uncertainty into account, and (2) it does not take the non-Euclidean geometric properties of the problem into account. For example, think respectively of trying to represent the content of a blurred, unclear image, and of projecting a world atlas onto flat paper.
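The reconstruction objective described above can be made concrete with a minimal sketch: a *linear* hourglass autoencoder trained by plain gradient descent in NumPy. The linear architecture, the function names `train_linear_autoencoder` and `reconstruction_error`, the hyperparameters, and the toy data (3-D points lying near a 2-D plane) are all illustrative assumptions, not the models used in this project; practical autoencoders use nonlinear layers and a deep-learning library.

```python
import numpy as np


def train_linear_autoencoder(X, k, lr=0.02, steps=3000, seed=0):
    """Fit a linear hourglass autoencoder x -> (x @ W_enc) @ W_dec.

    Illustrative sketch only. X has shape (n, d); k < d is the width of the
    inner (bottleneck) layer. Training minimizes the mean squared
    reconstruction error by full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, k))  # encoder: input -> code
    W_dec = rng.normal(scale=0.1, size=(k, d))  # decoder: code -> output
    for _ in range(steps):
        Z = X @ W_enc        # inner-layer codes, shape (n, k)
        R = Z @ W_dec - X    # reconstruction residuals, shape (n, d)
        # Gradients of (1/n) * ||X W_enc W_dec - X||_F^2 w.r.t. each matrix,
        # both computed from the current weights before either is updated.
        grad_dec = 2.0 * Z.T @ R / n
        grad_enc = 2.0 * X.T @ (R @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec


def reconstruction_error(X, W_enc, W_dec):
    """Average squared Euclidean distance between inputs and reconstructions."""
    X_hat = X @ W_enc @ W_dec
    return float(np.mean(np.sum((X_hat - X) ** 2, axis=1)))


# Toy data (assumed for illustration): 3-D points on a 2-D plane plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
X = latent @ A + 0.01 * rng.normal(size=(200, 3))

# Compare bottleneck widths: the narrowest layer that still reconstructs well
# plays the role of the "short informative summary".
errors = {}
for k in (1, 2):
    W_enc, W_dec = train_linear_autoencoder(X, k)
    errors[k] = reconstruction_error(X, W_enc, W_dec)
    print(f"bottleneck k={k}: reconstruction error {errors[k]:.4f}")
```

With a 2-unit bottleneck the error drops to roughly the noise level, while a 1-unit bottleneck cannot do better than discarding a whole direction of the plane, so 2 is the lowest inner dimension that preserves the data. The sketch also exhibits both drawbacks listed above: it returns a single point estimate of the weights with no notion of uncertainty, and it scores reconstructions with the flat Euclidean norm.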
These two problems need to be tackled at the same time. My project therefore focuses on Bayesian neural networks (to handle uncertainty) combined with arbitrary metric spaces (to handle non-Euclidean properties) and, above all, on how to train them efficiently.
My project is not application-specific; it broadly aims at developing new general-purpose methods, with no focus on a specific type of data. However, proteins are commonly regarded as the hardest setting in this context, so they are my final goal.

PhD project

By: Marco Miani

Section: Cognitive Systems

Principal supervisor: Søren Hauberg

Co-supervisor: Wouter Krogh Boomsma

Project title: The geometry of protein representations

Term: 15/02/2022

Contact

Marco Miani
PhD student
DTU Compute

Contact

Søren Feragen-Hauberg
Professor
DTU Compute
+45 45 25 38 99

Contact

Wouter Krogh Boomsma
Guest
DTU Compute