Geometric Deep Learning on Protein Sequences

 

It takes up to a billion dollars to bring a new drug to market, and the vast majority of that is spent on research and development. In recent years, deep learning algorithms have achieved state-of-the-art performance: many high-dimensional learning tasks previously thought to be beyond reach, such as computer vision, playing Go, or protein folding, are in fact feasible with appropriate computational scale. In light of this, we would like to explore the possibility of modelling drug-protein interactions using deep learning, which would allow us to replace expensive experiments with computer simulations. This could significantly shorten the research cycle, save millions of dollars, and possibly allow us to find new cures for diseases.

Geometric deep learning can further improve the performance of standard deep learning methods in these settings. In general, the assumption we work with is that our data lie in a Euclidean space, and that our target functions map points of a high-dimensional Euclidean space to a low-dimensional Euclidean space. Exploiting the geometric structure of the problem, such as symmetry, geometric stability and scale separation, can often lead to vast improvements in performance. Geometric deep learning is a programme to construct a single theoretical framework that unifies various representation learning architectures. Protein sequences are amenable to this perspective because they have extremely rich geometric structure: they have an internal temporal as well as semantic structure, and when they are represented as graphs, the node features often include geometric information.

In my PhD project, I will try to use techniques from geometric deep learning, such as graph neural networks and equivariant convolutions, to exploit the geometric information present in the protein sequence data.
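To make the idea of exploiting symmetry concrete, here is a minimal sketch of one message-passing step of a graph neural network, together with a check that it is permutation equivariant: relabelling the residues and then applying the layer gives the same result as applying the layer and then relabelling. The toy graph, the feature values, the scalar weights and all function names are assumptions made for illustration; they are not taken from the project.

```python
from typing import List, Tuple

Features = List[List[float]]   # one feature vector per residue (node)
Neighbors = List[List[int]]    # adjacency list: contacts of each residue


def message_passing_layer(features: Features, neighbors: Neighbors,
                          w_self: float = 0.5, w_neigh: float = 0.5) -> Features:
    """One sum-aggregation step: h_i' = w_self * h_i + w_neigh * sum_j h_j."""
    out = []
    for i, h in enumerate(features):
        agg = [0.0] * len(h)
        for j in neighbors[i]:
            for k, value in enumerate(features[j]):
                agg[k] += value
        out.append([w_self * a + w_neigh * b for a, b in zip(h, agg)])
    return out


def permute_graph(features: Features, neighbors: Neighbors,
                  perm: List[int]) -> Tuple[Features, Neighbors]:
    """Relabel node i as perm[i], moving features and edges accordingly."""
    n = len(features)
    new_features: Features = [[] for _ in range(n)]
    new_neighbors: Neighbors = [[] for _ in range(n)]
    for i in range(n):
        new_features[perm[i]] = features[i]
        new_neighbors[perm[i]] = [perm[j] for j in neighbors[i]]
    return new_features, new_neighbors


def is_equivariant(features: Features, neighbors: Neighbors,
                   perm: List[int], tol: float = 1e-9) -> bool:
    """Check that layer(permute(x)) equals permute(layer(x))."""
    permuted_then_layer = message_passing_layer(
        *permute_graph(features, neighbors, perm))
    layer_then_permuted, _ = permute_graph(
        message_passing_layer(features, neighbors), neighbors, perm)
    return all(abs(a - b) <= tol
               for row_a, row_b in zip(permuted_then_layer, layer_then_permuted)
               for a, b in zip(row_a, row_b))


# Hypothetical toy chain of four residues, 0-1-2-3, with 2-dimensional features.
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
NEIGHBORS = [[1], [0, 2], [1, 3], [2]]
PERM = [2, 0, 3, 1]  # an arbitrary relabelling of the residues

print(is_equivariant(FEATURES, NEIGHBORS, PERM))
```

Because the layer shares its weights across nodes and aggregates neighbours with a sum, its output cannot depend on how the residues happen to be numbered. Equivariant convolutions extend the same principle to other symmetries, such as rotations and translations of 3D coordinates.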

The image is from a paper on learning meaningful representations of protein sequences. It shows how the distance and geodesics learned using geometric deep learning techniques correspond to the phylogenetic tree of those protein sequences.

PhD project

By Hrittik Roy

Section: Cognitive Systems

Principal supervisor: Søren Hauberg

Co-supervisor: Jes Frellsen

Project title: Geometric Representation Learning for Protein Sequences

Term:

Contact

Hrittik Roy
PhD student
DTU Compute

Contact

Søren Hauberg
Professor
DTU Compute
+45 45 25 38 99

Contact

Jes Frellsen
Associate Professor
DTU Compute
+45 45 25 39 23