Towards end-to-end document analysis

Rasmus Berg Palm: There is a fundamental problem between computers and humans: humans like to communicate in human language, and computers don't understand it.

Either we, the humans, can start to speak computer, or we can try to teach the computers to understand our language. This project attempts the latter. A particular form of human communication that has been perfected over the last few hundred years is the document: a couple of pages of text, a fancy layout, maybe some tables, and maybe even an illustration, all designed to convey some information. The goal of my project is to develop methods for extracting the relevant information from such documents using state-of-the-art deep learning methods. The project is sponsored by Innovation Fund Denmark and Tradeshift, which has a large and growing collection of business documents in both human-readable and computer-readable form.

Funding: Innovation Fund Denmark and Tradeshift

PhD project by Rasmus Berg Palm

Research section: Cognitive Systems

Principal supervisor: Ole Winther
Co-supervisors: Florian Laws, Morten Mørup

Title of project: Towards end-to-end document analysis

Effective start/end date: 01/11/2015 → 31/10/2018

PhD report published: End-to-end information extraction from business documents

Contact

Ole Winther
Professor
DTU Compute
+45 45 25 38 95

Contact

Morten Mørup
Professor
DTU Compute
+45 45 25 39 00