Life2Vec: Numerical Representations of Social Behaviour

Germans Savcisens: What if we could summarise one's life based on the sequence of one's past events? Would predicting one's future then still be off limits?

Nowadays, almost every aspect of our daily lives is digitised. As we browse the internet or pay with our debit cards, we leave behind bits and pieces of information. The public sector is no exception: information about one's education, health, work and taxes is recorded by different public institutions throughout one's life. Research in healthcare and machine learning has shown that health data alone can yield powerful models for predicting future health-related outcomes, such as early detection of heart failure, side effects of prescribed medication and in-hospital mortality. However, few studies have focused on models that go beyond health records.

A typical strategy for analysing health records draws on principles of Natural Language Processing, where the meanings of words are represented as numerical vectors. Methods that produce these numerical representations (i.e. embeddings) learn them from the contexts in which words appear. When successful, the semantics of words are captured by numbers, so that vector arithmetic becomes meaningful: 'king' - 'man' + 'woman' ≈ 'queen'. The same approach is applied to health terms (e.g. diagnoses, procedures, lab results), each of which is assigned a numerical vector. These representations are then used to predict outcomes of interest or even to create numerical representations of patients.
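To make the 'king' - 'man' + 'woman' ≈ 'queen' analogy concrete, here is a minimal sketch using hand-crafted toy vectors. The vectors, their three dimensions and the helper names (cosine, analogy) are illustrative assumptions; real embeddings are learned from data and typically have hundreds of dimensions. The analogy is resolved by computing the combined vector and returning the nearest remaining word by cosine similarity.

```python
import math

# Toy 3-dimensional embeddings, hand-crafted for illustration only (assumption).
# Dimensions, by construction: [royalty, masculinity, animacy]
EMBEDDINGS = {
    "king":  [0.9,  0.8, 1.0],
    "queen": [0.9, -0.8, 1.0],
    "man":   [0.1,  0.8, 1.0],
    "woman": [0.1, -0.8, 1.0],
    "apple": [0.0,  0.0, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


print(analogy("king", "man", "woman", EMBEDDINGS))  # queen
```

Excluding the query words is a common convention, since the combined vector often lies closest to one of its own inputs.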

Embeddings of life events are conceptually similar to embeddings of health records, but potentially far richer, as they capture the most important events in people's lives beyond health. The underlying data may include information about education, taxes, interactions with municipal services and so on. Hence, they are an excellent source for developing methods that build numerical representations of a person's behaviour. Such representations could in turn be used to predict a person's retirement age, identify tax fraud, foresee crimes, etc.
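One simple way to turn event embeddings into a representation of a person is to average the vectors of all events in that person's history (mean pooling), much as patient vectors can be built from health-term vectors. The sketch below assumes a hypothetical event vocabulary (EDU_BACHELOR, JOB_START, etc.) with toy hand-written vectors; it is not the project's method, only an illustration of the idea.

```python
import math

# Hypothetical event codes with toy 3-d embeddings (assumption); in practice
# these would be learned from data rather than written by hand.
EVENT_EMBEDDINGS = {
    "EDU_BACHELOR": [0.8, 0.1, 0.0],
    "EDU_MASTER":   [0.9, 0.2, 0.0],
    "JOB_START":    [0.3, 0.9, 0.1],
    "UNEMPLOYED":   [-0.2, -0.7, 0.3],
    "HOSPITAL":     [0.0, 0.0, 0.9],
}


def person_vector(events, embeddings):
    """Represent a person as the mean of the embeddings of their known life events.

    Event order is ignored (a 'bag of events'); unknown events are skipped and
    a person with no known events gets the zero vector.
    """
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[e] for e in events if e in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


alice = person_vector(["EDU_BACHELOR", "EDU_MASTER", "JOB_START"], EVENT_EMBEDDINGS)
bob = person_vector(["EDU_BACHELOR", "JOB_START"], EVENT_EMBEDDINGS)
carol = person_vector(["UNEMPLOYED", "HOSPITAL"], EVENT_EMBEDDINGS)
print(cosine(alice, bob) > cosine(alice, carol))  # True
```

Averaging discards the order and timing of events; sequence models such as recurrent networks or transformers can take both into account.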

Throughout the project, we will work with data provided by Statistics Denmark, which contains sequences of events for millions of Danish residents. First, we will develop models for representing social behaviour, based on traditional machine learning techniques as well as deep recurrent neural networks, convolutional neural networks and transformers. Second, we will evaluate how useful these representations are for predicting life outcomes (such as education levels, income and wealth ranks, and unemployment histories). Given the sensitive nature of our research, developing interpretable and secure models will be a top priority, so that personal data is handled according to current legislation and the models provide clear reasoning behind each prediction.
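Recurrent networks, convolutional networks and transformers usually consume each person's history as a fixed-length sequence of integer tokens. The sketch below shows one possible preprocessing step under stated assumptions: records are hypothetical (date, event code) pairs, IDs 0 and 1 are reserved for padding and unseen events by our own convention, and long histories keep only their most recent events. It does not describe the actual format of the Statistics Denmark data.

```python
from datetime import date

# Hypothetical reserved IDs for padding and unseen events (assumption).
PAD, UNK = 0, 1


def build_vocab(sequences):
    """Assign an integer ID to every event code seen in the data (0 and 1 are reserved)."""
    vocab = {"[PAD]": PAD, "[UNK]": UNK}
    for records in sequences:
        for _, event in records:
            vocab.setdefault(event, len(vocab))
    return vocab


def encode(records, vocab, max_len):
    """Turn (date, event) records into a fixed-length list of IDs in chronological order.

    Histories longer than max_len keep their most recent events;
    shorter ones are right-padded with PAD.
    """
    ordered = [event for _, event in sorted(records, key=lambda r: r[0])]
    ids = [vocab.get(event, UNK) for event in ordered]
    if len(ids) > max_len:
        ids = ids[len(ids) - max_len:]
    return ids + [PAD] * (max_len - len(ids))


# Hypothetical records for one person, deliberately out of order.
person = [
    (date(2015, 6, 30), "EDU_BACHELOR"),
    (date(2010, 8, 1), "EDU_HIGHSCHOOL_START"),
    (date(2016, 1, 4), "JOB_START"),
]
vocab = build_vocab([person])
print(encode(person, vocab, max_len=5))  # [3, 2, 4, 0, 0]
```

Keeping the most recent events is only one truncation choice; for some outcomes the earliest events may matter more.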

PhD project

By: Germans Savcisens

Section: Cognitive Systems

Principal supervisor: Sune Lehmann Jørgensen

Co-supervisor: Lars Kai Hansen

Project title: Life2Vec: Numerical Representations of Social Behaviour

Term: 01/09/2020 → 31/08/2023

Contact

Germans Savcisens
Guest
DTU Compute

Sune Lehmann
Professor
DTU Compute
+45 45 25 39 04

Lars Kai Hansen
Professor, head of section
DTU Compute
+45 45 25 38 89