Discovering Polytopes in High Dimensional, Heterogeneous Datasets using Bayesian Archetypal Analysis


Real World Data (RWD), such as electronic medical records, national health registries, and insurance claims data, provide vast amounts of high-granularity, heterogeneous data. An international standard (OMOP) has been developed for health data and for accelerating evidence generation from RWD. The EU has recently adopted the same standard for the European Health Data & Evidence Network (EHDEN), the largest federated health data network, covering more than 500 million patient records.

This allows standardization of datasets across institutions in 26 different countries, but a major data science challenge remains: how to tackle the volume and complexity of multi-modal data of such magnitude. The aim is to develop easily human-interpretable tools to analyze RWD and extract distinct characteristics, enabling new discoveries.

The project will focus on a prominent data science methodology called Archetypal Analysis, which identifies distinct characteristics, called archetypes, and describes each observation in terms of these archetypes (see figure), thereby defining polytopes in high-dimensional data. This project will develop tools for uncovering such polytopes in large, high-dimensional, heterogeneous, noisy, and incomplete data. We will develop Bayesian modeling approaches for uncertainty and complexity characterization, data fusion for enhanced inference, and deep learning methods to uncover disentangled polytopes.
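To make the idea of archetypes and polytopes concrete, the following is a minimal sketch of classical least-squares archetypal analysis, not the Bayesian approach this project will develop. It assumes the standard formulation in which the archetypes Z = C X are convex combinations of the observations and each observation is approximated by a convex combination S Z of the archetypes. The solver (alternating projected gradient steps) and the names `project_rows_to_simplex` and `archetypal_analysis` are illustrative choices, and the data are synthetic.

```python
import numpy as np

def project_rows_to_simplex(M):
    """Euclidean projection of each row of M onto the probability simplex."""
    n, k = M.shape
    U = np.sort(M, axis=1)[:, ::-1]            # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    rho = (U - css / idx > 0).sum(axis=1)      # size of the support, always >= 1
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift
    return np.maximum(M - theta[:, None], 0.0)

def archetypal_analysis(X, k, n_iter=500, seed=0):
    """Approximately minimise ||X - S C X||_F^2 with the rows of S and C on the simplex.

    X: (n, d) data. Returns S (n, k), C (k, n) and the archetypes Z = C X (k, d).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = project_rows_to_simplex(rng.random((n, k)))
    C = project_rows_to_simplex(rng.random((k, n)))
    x_norm_sq = np.linalg.norm(X, 2) ** 2      # squared spectral norm of X
    for _ in range(n_iter):
        # Update S (how each observation mixes the archetypes), archetypes fixed.
        Z = C @ X
        step_S = 1.0 / (np.linalg.norm(Z, 2) ** 2 + 1e-12)
        S = project_rows_to_simplex(S - step_S * (S @ Z - X) @ Z.T)
        # Update C (how each archetype mixes the observations), S fixed.
        residual = S @ (C @ X) - X
        step_C = 1.0 / (np.linalg.norm(S, 2) ** 2 * x_norm_sq + 1e-12)
        C = project_rows_to_simplex(C - step_C * S.T @ residual @ X.T)
    return S, C, C @ X

# Synthetic example: 300 noisy points drawn from inside a triangle in 2D.
rng = np.random.default_rng(1)
corners = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
weights = rng.dirichlet(np.ones(3) * 0.5, size=300)
X = weights @ corners + 0.05 * rng.standard_normal((300, 2))

S, C, Z = archetypal_analysis(X, k=3)
print("estimated archetypes:\n", Z)
print("mixing weights of first observation:", S[0])
```

Each row of `S` is non-negative and sums to one, so it can be read as a soft assignment of that observation to the archetypes. The rows of `Z` are the corners of the estimated polytope.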

The tool will advance our understanding of RWD and will accelerate real-world evidence generation through the identification of patterns in terms of archetypes. Furthermore, trade-offs within archetypes can fuel personalized medicine by defining a profile of the individual patient as a softly assigned spectrum between archetypes. We hypothesize that this characterization has important uses in advancing our understanding of subtypes and comorbidities within different neurological and psychiatric disorders.

The ability to learn the defining polytopes from RWD can provide important compact and human-interpretable characterizations of biological systems and further our understanding of these complex systems in general. Notably, the tools developed will account for uncertainty using probabilistic modelling and visualize results and their associated uncertainties in simple and understandable ways. Importantly, the tools will be compatible with the OMOP CDM.

PhD project

By: Anna Emilie Jennow Wedenborg

Section: Cognitive Systems

Principal supervisor: Morten Mørup

Co-supervisor: Christian Laut Ebbesen

Project title: Discovering Polytopes in High Dimensional, Heterogeneous Datasets using Bayesian Archetypal Analysis

Term: 15-09-2022


Anna Emilie Jennow Wedenborg
PhD student
DTU Compute


Morten Mørup
DTU Compute
+45 45 25 39 00