Adaptive Methods for Robust Document Image Understanding
A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding are a pressing necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we turn our attention to each of the following processing stages: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation, and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions, and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, with a special focus on generality, computational efficiency, and the exploitation of all available sources of information. More specifically, we introduce the following original methods: fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot-metal typeset prints, a theoretically optimal solution to the document binarization problem in terms of both computational complexity and threshold selection, layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm, and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The results show that a document understanding system combining these modules can robustly process a wide variety of documents with good overall accuracy.
Confidence measures for seamless skew and orientation detection in document images
Document deskewing is a crucial pre-processing step in any document analysis system. In mass digitization projects, the ability to automatically assess the success of this step significantly reduces the amount of work performed by human operators. This paper extends our generic skew and orientation detection algorithm with the ability to return well-founded confidence values for the detected rotation angles, which may lie anywhere in the range of -180 to 180 degrees. Starting with an in-depth theoretical analysis of all common situations occurring in digitized document images, including large non-text regions as well as noise patterns, we derive a formula for the expected distribution of the global skew angle. Using the results of the analysis, we propose two generic confidence measures that accurately reflect and cover all the aforementioned situations. Finally, the introduced confidence measures are tested on the UW-I data set comprising 979 heterogeneous document scans.
Histograms of Stroke Widths for Multi-script Text Detection and Verification in Road Scenes
Robust text detection and recognition in arbitrarily distributed, unrestricted images is a difficult problem, e.g. when interpreting traffic panels outdoors during autonomous driving. Most previous work in text detection considers only a single script, usually Latin, and is not able to detect text in multiple scripts. Our contribution combines an established technique, Maximally Stable Extremal Regions (MSER), with a histogram of stroke widths (HSW) feature and a Support Vector Machine classifier. We combine characters into groups by raycasting and merge aligned groups into lines of text, which can also be verified using the HSW. We evaluate our detection pipeline on our own dataset of road scenes from the Autobahn (German highways), and show how the character classifier stage can be trained on one script and successfully tested on a different one. While precision and recall match to state of the art solution
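As a rough illustration of the kind of feature a histogram of stroke widths captures, the following minimal Python sketch builds a normalized histogram for a binary character mask. It is not the authors' method: it assumes horizontal foreground run lengths as a crude proxy for stroke width, and the function names and the `max_width` parameter are hypothetical.

```python
from collections import Counter


def horizontal_run_lengths(mask):
    """Collect the lengths of consecutive foreground pixels in each row.

    `mask` is a list of rows; truthy values mark foreground (ink) pixels.
    Assumption: horizontal runs stand in for true stroke widths.
    """
    runs = []
    for row in mask:
        length = 0
        for pixel in row:
            if pixel:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # run that touches the right border
            runs.append(length)
    return runs


def stroke_width_histogram(mask, max_width=8):
    """Return a normalized histogram with one bin per width 1..max_width.

    Runs longer than `max_width` are clipped into the last bin.
    An empty mask yields an all-zero histogram.
    """
    runs = horizontal_run_lengths(mask)
    if not runs:
        return [0.0] * max_width
    counts = Counter(min(run, max_width) for run in runs)
    return [counts.get(w, 0) / len(runs) for w in range(1, max_width + 1)]


# A vertical bar two pixels wide: every run has length 2.
bar = [[0, 1, 1, 0]] * 3
histogram = stroke_width_histogram(bar, max_width=4)
```

A fixed-length vector of this kind could then be passed to a classifier such as a Support Vector Machine. Text strokes tend to concentrate in a few neighboring bins, whereas arbitrary non-text regions usually spread over many.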
A new framework for automatic quality assessment of print media
Print media collections of considerable size are held by cultural heritage organizations and will soon be subject to digitization activities. However, technical content quality management in digitization workflows relies strongly on human monitoring. This heavy human intervention is cost-intensive and time-consuming, which makes automation mandatory. In this paper, a new automatic quality assessment framework is proposed. The digitized source image and a color reference target are extracted from the raw digitized images by an automatic segmentation process. The target is evaluated by a reference-based algorithm. No-reference quality metrics are applied to the source image. Experimental results are provided to illustrate the performance of the proposed framework. We show that our approach yields a significant improvement over the state of the art in both the extraction step and the quality assessment step.