This paper introduces a tool for reconstructing and validating categorized totals embedded in untrusted and unformatted
text, such as OCR scans of financial statements. The tool is a spin-off
of academic research into the funding of Japanese third-sector organizations, whose annual reports are frequently published
as PDF files containing document images. Techniques operating
at the string, line, and document levels are used to resolve ambiguities and
obtain the greatest possible recovery rate for the underlying data, while
excluding the content of untrustworthy documents from the final sample. In a preliminary trial "in the wild", the tool returned validated
income totals for 47.9% of the documents in a heterogeneous set of 2,205
annual reports.
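The abstract does not specify the tool's algorithms. The following is only a minimal, hypothetical sketch of the general idea behind validating a categorized total: a stated total can serve as a checksum for choosing among ambiguous OCR readings of the line items. The confusion table, the function names `candidate_readings` and `resolve_items`, and the rule that only a unique consistent assignment is accepted (others are rejected as untrustworthy) are all assumptions made for illustration, not the tool's actual method.

```python
from itertools import product

# Hypothetical OCR confusion table: each character maps to the digits it may stand for.
# Characters not listed are taken literally.
CONFUSIONS = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "3": "38", "8": "83"}


def candidate_readings(token):
    """Return every integer value an OCR'd amount could plausibly represent."""
    chars = token.replace(",", "").strip()
    options = [CONFUSIONS.get(ch, ch) for ch in chars]
    readings = set()
    for combo in product(*options):
        text = "".join(combo)
        # isdecimal() also accepts full-width digits, which int() can parse.
        if text.isdecimal():
            readings.add(int(text))
    return readings


def resolve_items(item_tokens, total_token):
    """Choose readings of the line items whose sum matches a reading of the total.

    Returns (items, total) only if exactly one consistent assignment exists;
    otherwise returns None, treating the figures as unvalidated.
    """
    item_options = [sorted(candidate_readings(t)) for t in item_tokens]
    total_options = candidate_readings(total_token)
    matches = [
        (list(items), sum(items))
        for items in product(*item_options)
        if sum(items) in total_options
    ]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    # "3O0" could be 300 or 800; only 300 makes the items sum to the stated total.
    print(resolve_items(["1,2O0", "3O0"], "1,500"))
```

Running the example prints `([1200, 300], 1500)`. Rejecting inputs with zero or several consistent assignments mirrors, in spirit, the abstract's aim of excluding untrustworthy documents rather than guessing. Note that enumerating all combinations grows exponentially with the number of ambiguous characters, so a practical implementation would need pruning.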