Adaptive Compressed Computation

Max Harry Rishøj Pedersen: Running Algorithms on Compressed Data

Massive amounts of data are being generated and stored every day, in all manner of shapes: from vacation photos uploaded to iCloud, to Amazon tracking the spending habits of every user, to the medical industry sequencing DNA in the battle against disease. To store all this information, it is often necessary to compress it. With the amount of data generated growing at unprecedented speed, compression technology is ever more important.


In the simplest scenario, to use any part of compressed data it is necessary to decompress it first, even if only a small part is relevant. However, techniques exist that allow us to access parts of the data, or even run algorithms on it, without decompressing the whole data set.

This is what we call compressed computation. Not only does it save valuable storage space (for the uncompressed data) and time (spent decompressing data that is not needed), but it can even significantly outcompete uncompressed solutions, since the amount of data to process is smaller. Compressed computation is well studied for low-repetitive data, that is, data with little structure that therefore does not compress particularly well. It is not well understood for highly repetitive data, however, such as genomes, where typical estimates put the similarity between a pair of human genomes at 99.9%. For such data it should be possible to achieve high performance with compressed computation techniques.
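As a simple illustration of the idea (not the methods of this project), consider run-length encoding, a toy stand-in for the stronger compression schemes, such as Lempel-Ziv or grammar compression, typically used for highly repetitive data. The Python sketch below, with illustrative names, counts the occurrences of a character directly on the compressed representation, in time proportional to the compressed size rather than the uncompressed length:

    from itertools import groupby

    def rle_compress(s):
        # Run-length encode a string into (character, run length) pairs.
        return [(ch, len(list(g))) for ch, g in groupby(s)]

    def count_char_compressed(runs, target):
        # Count occurrences of `target` without decompressing:
        # the loop runs once per run, not once per character.
        return sum(length for ch, length in runs if ch == target)

    text = "aaaabbbcccccaaa"      # highly repetitive input
    runs = rle_compress(text)     # [('a', 4), ('b', 3), ('c', 5), ('a', 3)]
    assert count_char_compressed(runs, "a") == text.count("a")

On repetitive data the compressed representation is far smaller than the input, so an algorithm that works directly on it can beat its uncompressed counterpart outright.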


Most existing methods for compressed computation focus on worst-case analysis with respect to the size of the data, which is the typical analysis for compression. For compressed computation on real-world data, however, this is often too pessimistic and can lead to discrepancies between theory and practice. To remedy this, we aim to find parameters that allow a finer analysis, one that more closely matches the actual performance of our algorithms. These analytical tools can then be used to shape our algorithms so that they adapt to the specific instance through these parameters.
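To make the notion of a fine-grained parameter concrete, here is a hedged sketch, again using run-length encoding as a stand-in: random access to the text is analyzed in terms of the number of runs r rather than the text length n. With a prefix-sum index, each access takes O(log r) time, so the bound adapts to how repetitive the input actually is.

    from bisect import bisect_right
    from itertools import accumulate

    def build_prefix(runs):
        # Prefix sums over run lengths; prefix[j] is the number of
        # characters covered by runs 0..j.
        return list(accumulate(length for _, length in runs))

    def access(runs, prefix, i):
        # Return character i of the uncompressed string in O(log r)
        # time, where r = len(runs) is the fine-grained parameter.
        j = bisect_right(prefix, i)   # index of the run containing position i
        return runs[j][0]

    text = "aaaabbbcccccaaa"
    runs = [("a", 4), ("b", 3), ("c", 5), ("a", 3)]
    prefix = build_prefix(runs)
    assert all(access(runs, prefix, i) == text[i] for i in range(len(text)))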


The goal of this project is to explore this area: to use fine-grained parameters to obtain more accurate analyses of existing compressed computation techniques, and to develop new adaptive algorithms that exploit these tools to outperform current ones.


PhD project

By: Max Harry Rishøj Pedersen

Section: AlgoLoG

Principal supervisor: Inge Li Gørtz

Co-supervisor: Philip Bille

Project title: Adaptive Compressed Computation

Term: 01/02/2020 → 31/01/2023

Contact

Inge Li Gørtz
Professor
DTU Compute
+45 45 25 36 73

Contact

Philip Bille
Head of Section
DTU Compute
+45 45 25 36 47