Big Data Modelling with Applications to Airports

Agnes Martine Nielsen: Statistical methods are often motivated by real problems. We consider methods inspired by problems in biology and medicine.

The thesis is in two parts. In the first part we consider data in the form of graphs (or networks), which occur naturally in many contexts, for example as social and biological networks. We specifically consider the setting where we have multiple graphs on the same set of nodes. For this setting we propose a model, called the multiple random dot product graph model, and an algorithm for fitting it. Within the model framework we also propose a hypothesis test for whether two graphs are drawn from the same distribution. The model is then generalized to weighted graphs, in which each edge has an associated weight. We specifically consider Poisson and normally distributed weights. Similar hypothesis tests are proposed for these settings, and their performance is again evaluated through simulation studies.

The second part of the thesis considers the prediction of disease progression. We compare three common approaches to disease prediction and apply them to a diabetes data set. In this data set, the quantity of interest is the time until a patient goes on to insulin treatment, and in particular whether progression is fast or slow. The three methods are a Cox proportional hazards model, a random forest method for survival data, and a neural network approach. We discuss their prediction performance and the pros and cons of each method.

PhD project: Agnes Martine Nielsen

Section: Statistics and data analysis

Principal supervisor: Line K.H. Clemmensen
Co-supervisors: Anders B. Dahl, Bjarne K. Ersbøll

Title of project: Big Data Modelling with Applications to Airports

Published as PhD report: Statistical Learning with Applications in Biology

Effective start/end date: 01/08/2015 → 31/12/2018
