Reconstructing financial statements

Abstract

This paper introduces a tool for the reconstruction and validation of categorized totals embedded in untrusted and unformatted text, such as OCR scans of financial statements. The tool is a spin-off of academic research into the funding of Japanese third-sector organizations, whose annual reports are frequently published as PDF files containing document images. A number of techniques at the string, line, and document level are used to resolve ambiguities and obtain the greatest possible recovery rate for the underlying data, while excluding the content of untrustworthy documents from the final sample. In a preliminary trial "in the wild", the tool has returned validated income totals for 47.9% of the documents in a heterogeneous set of 2205 annual reports.
