DTU.dk

Deep Generative Models for Collaborative Filtering

Federico Bergamin: Generative Models with Sparse and Missing Data

In the last few years, we witnessed the success of machine learning in a lot of different fields, ranging from computer vision to speech recognition and natural language processing. This was due to availability of massive datasets and the exponential growth of computational power. However, a lot of real-world applications have to deal with messy datasets, such as datasets that are super sparse, i.e. datasets characterized by observations with only a small number of features with a non-zero value.

Sparse, high-dimensional data are widespread. They can be found when working with bag-of-words representation of text data, user-item interaction matrices in recommender systems, in surveys and in other interaction matrices that can be found in life science, such as gene expression and drug sensitivity matrices. While in some cases the 0 entries in the observations represent some actual data, in other setting a 0 can be interpreted as a missing value. This second scenario is the one we will be focusing on.

If the goal is to predict the 0 entries in our data, we have that the problem is closely related to missing data imputation. Therefore, a way to approach it is to use a probabilistic model to learn the data distribution and use that to predict the missing values. However, in this setting, classical machine learning methods are still preferred to deep probabilistic models. This due the fact that the learning signal will depend only on the rare non-zero features and therefore standard training algorithms for deep probabilistic models will likely lead to underfitting.

In this project we aim to solve these problems. We will mostly focus on high-dimensional sparse interaction matrices and our goals are the following:

Create methods that helps the training process of deep probabilistic models when working with sparse datasets;
Propose new architecture for deep probabilistic models in the setting we have described. Indeed, in an interaction matrix, both the rows and the column have a specific meaning. For example, users and items in recommender systems. However, most models in the literature either model the rows or the column. We expect that by modelling both the rows and the columns jointly should help our model in learning a better data distribution. We will use tis novel architecture to perform various tasks, including co-clustering, i.e. simultaneous clustering of both the rows and the columns, and missing data imputation.

Updated on 12 May 2022