Visual perception of unitary elements for layout analysis of unconstrained documents in heterogeneous databases
Document layout analysis is a complex task in the context of heterogeneous documents, and it remains a challenging problem. In this paper, we present our contribution to the layout analysis competition of the international Maurdor Campaign. Our method is based on a grammatical description of the content of elements: it consists of iteratively finding and then removing the most structuring elements of documents. The method builds on notions of perceptive vision: a combination of points of view on the document, and the analysis of salient contents. Our description is generic enough to deal with a very wide range of heterogeneous documents. This method obtained second place in Run 2 of the Maurdor Campaign (on 1000 documents), and the best results in terms of pixel labelling for text blocks and graphic regions.
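The iterative "find the most structuring element, then remove it" strategy described above can be sketched as a simple ordered pipeline. This is a toy illustration, not the authors' grammatical system; the detector rules and element representation are hypothetical.

```python
# Hypothetical sketch of an iterative detect-and-remove layout pass:
# detectors are ordered with the most structuring elements first, and each
# pass labels and removes its matches so later passes see a simpler page.

def analyse_layout(elements, detectors):
    """elements: list of content items; detectors: (label, predicate) pairs,
    most structuring first. Returns (label, element) pairs."""
    remaining = list(elements)
    labelled = []
    for label, predicate in detectors:
        found = [e for e in remaining if predicate(e)]
        labelled.extend((label, e) for e in found)
        # remove what was just recognised before the next, finer-grained pass
        remaining = [e for e in remaining if not predicate(e)]
    labelled.extend(("unlabelled", e) for e in remaining)
    return labelled

# toy usage: separators are more structuring than words, so they go first
detectors = [
    ("separator", lambda e: e["type"] == "line"),
    ("text", lambda e: e["type"] == "word"),
]
page = [{"type": "word"}, {"type": "line"}, {"type": "word"}]
print([label for label, _ in analyse_layout(page, detectors)])
# ['separator', 'text', 'text']
```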
Ground Truth for Layout Analysis Performance Evaluation
Over the past two decades a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has been devised for and/or evaluated using (usually small) application-specific datasets. While the need for objective performance evaluation of layout analysis algorithms is evident, there does not exist a suitable dataset with ground truth that reflects the realities of everyday documents (widely varying layouts, complex entities, colour, noise, etc.). The most significant impediment is the creation of accurate and flexible (in representation) ground truth, a task that is costly and must be carefully designed. This paper discusses the issues related to the design, representation and creation of ground truth in the context of a realistic dataset developed by the authors. The effectiveness of the ground truth discussed in this paper has been successfully shown in its use for two international page segmentation competitions (ICDAR2003 and ICDAR2005).
Combining Linguistic and Spatial Information for Document Analysis
We present a framework to analyze color documents of complex layout. In
addition, no assumptions are made about the layout. Our framework combines, in
a content-driven bottom-up approach, two different sources of information: textual
and spatial. To analyze the text, shallow natural language processing tools,
such as taggers and partial parsers, are used. To infer relations of the
logical layout we resort to a qualitative spatial calculus closely related to
Allen's calculus. We evaluate the system against documents from a color journal
and present the results of extracting the reading order from the journal's
pages. In this case, our analysis is successful as it extracts the intended
reading order from the document.
Comment: Appeared in J. Mariani and D. Harman (Eds.), Proceedings of RIAO'2000
Content-Based Multimedia Information Access, CID, 2000, pp. 266-27
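The qualitative spatial calculus mentioned above is closely related to Allen's interval algebra, which classifies the relation between two intervals into one of thirteen cases. The sketch below is a generic illustration of that algebra applied to one axis of two layout boxes; it is not the paper's calculus, and the relation names for the inverse cases are simplified.

```python
# A minimal sketch of Allen-style qualitative relations between intervals:
# each axis of two boxes reduces to one of 13 relations such as "before",
# "meets" or "overlaps". Inverses are named "<relation>-inverse" here for
# brevity rather than with Allen's conventional names (e.g. "after").

def allen_relation(a, b):
    """a, b: (start, end) intervals with start < end."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "before"
    if a1 == b0:
        return "meets"
    if a0 < b0 < a1 < b1:
        return "overlaps"
    if a0 == b0 and a1 < b1:
        return "starts"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 > b0 and a1 == b1:
        return "finishes"
    if a == b:
        return "equal"
    # the remaining six cases are the inverses of the first six
    return allen_relation(b, a) + "-inverse"

# e.g. a column spanning x=[0, 40] relative to a column spanning x=[50, 90]:
print(allen_relation((0, 40), (50, 90)))  # before
print(allen_relation((0, 40), (40, 90)))  # meets
```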
Locating tables in scanned documents with heterogeneous layout
The pool of knowledge available to mankind depends on the source of learning
resources, which can vary from ancient printed documents to present-day
electronic materials. The rapid conversion of material held in traditional
libraries to digital form requires a significant amount of work for format
preservation. Most printed documents contain not only characters and their
formatting but also associated non-text objects such as tables, charts and
graphical objects. Since most existing optical character recognition techniques
face challenges in detecting such objects and do not concentrate on preserving
the format of the contents while reproducing them, we attempt to locate all
types of tables in scanned documents with heterogeneous layouts. In general,
multi-column documents are not cleanly divided by the inter-column space: long
headings, centre-aligned page numbers, lengthy text in headers and footers, and
horizontal lines interfere with the inter-column space commonly relied on in
layout analysis. To address this issue, we propose an algorithm that uses a
specific threshold to eliminate the interfering parts of the inter-column
space, together with local thresholds for word spacing and line height, to
detect and extract all categories of tables from scanned documents. From an
experiment performed on 50 documents, we conclude that our algorithm has an
overall accuracy of about 73% in detecting tables in multi-column layouts.
Even though documents with complex layouts still pose some problems, the
system can handle some of these documents as well. Since the algorithm does
not depend entirely on the number of columns, the inter-column spaces, or the
rule lines that bound tables, it can detect all categories of tables in
scanned documents with a range of different layouts.
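The local-threshold idea described above can be illustrated with a tiny sketch: a text line whose widest inter-word gap greatly exceeds the page's typical word spacing is a candidate table row. The threshold factor and data layout below are illustrative assumptions, not the paper's actual parameters.

```python
# Hypothetical sketch of table-row detection via local word-space thresholds:
# compute the typical inter-word gap over the page, then flag lines whose
# largest gap is far wider (suggesting a column break inside a table).

from statistics import median

def table_row_candidates(lines, gap_factor=3.0):
    """lines: list of text lines, each a list of word x-spans (x0, x1)
    sorted by x0. Returns indices of lines that look like table rows."""
    gaps = [w2[0] - w1[1]
            for line in lines
            for w1, w2 in zip(line, line[1:])]
    typical = median(gaps) if gaps else 0
    candidates = []
    for i, line in enumerate(lines):
        line_gaps = [w2[0] - w1[1] for w1, w2 in zip(line, line[1:])]
        if line_gaps and max(line_gaps) > gap_factor * typical:
            candidates.append(i)  # unusually wide gap -> likely column break
    return candidates

lines = [
    [(0, 40), (45, 90), (95, 140)],    # running text, ~5px word gaps
    [(0, 40), (46, 90), (96, 140)],    # running text
    [(0, 30), (80, 110), (160, 190)],  # table-like, ~50px gaps
]
print(table_row_candidates(lines))  # [2]
```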
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research efforts seeking to automatically process facsimiles and extract
information from them are multiplying, with document layout analysis as a
first essential step. While the identification and categorization of segments
of interest in document images have seen significant progress in recent years
thanks to deep learning techniques, many challenges remain, among them the use
of finer-grained segmentation typologies and the handling of complex,
heterogeneous documents such as historical newspapers. Moreover, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance.
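The multimodal combination described above can be illustrated, in its simplest form, as late fusion: visual and textual feature vectors computed for the same region are concatenated before classification. This is a toy sketch, not the paper's architecture; the feature dimensions and names are invented for illustration.

```python
# Toy illustration of multimodal late fusion: concatenating per-region
# visual and textual feature vectors so one classifier sees both signals.

def fuse(visual_feats, textual_feats):
    """Concatenate per-region feature vectors from both modalities."""
    assert len(visual_feats) == len(textual_feats)  # one vector per region
    return [v + t for v, t in zip(visual_feats, textual_feats)]

# 2 regions: 3-dim visual features + 4-dim textual features -> 7-dim fused
visual = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # e.g. CNN features
textual = [[1, 0, 0, 0], [0, 1, 0, 0]]        # e.g. word-embedding features
fused = fuse(visual, textual)
print(len(fused[0]))  # 7
```

In practice the fused vectors would feed a segmentation head; the point here is only that fusion is a concatenation along the feature axis.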
Diffusion-based Document Layout Generation
We develop a diffusion-based approach to document layout sequence generation.
Layout sequences specify the contents of a document design in an explicit
format. Our diffusion-based approach works in the sequence domain rather than
the image domain in order to permit more complex and realistic layouts. We
also introduce a new metric, Document Earth Mover's Distance (Doc-EMD): by
considering similarity between document designs of heterogeneous categories,
it addresses the shortcoming of prior document metrics that only evaluate
layouts of the same category. Our empirical analysis shows that our
diffusion-based approach is comparable to, or outperforms, previous methods
for layout generation across various document datasets. Moreover, our metric
differentiates documents better than previous metrics in specific cases.
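A metric like Doc-EMD builds on the classic Earth Mover's Distance. The sketch below is not the paper's Doc-EMD definition; it only shows the underlying 1-D EMD idea, for example comparing the vertical positions of boxes of one category in a generated layout against a reference layout.

```python
# Pure-Python sketch of 1-D Earth Mover's Distance between two equal-size
# point sets with uniform weights: the optimal 1-D transport matches points
# in sorted order, so the distance is the mean displacement between sorted
# pairs. (Illustrative only; NOT the Doc-EMD formula from the paper.)

def emd_1d(xs, ys):
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# vertical centres of "figure" boxes in two layouts (page-normalised)
generated = [0.1, 0.3, 0.7]
reference = [0.2, 0.4, 0.8]
print(emd_1d(generated, reference))  # ~0.1
```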
Navisio: Towards an integrated reading aid system for low vision patients
We propose the Navisio software as a new integrated system to help low vision patients read complex electronic documents (here, PDF files) with more comfort. Navisio aims to take into account the main psychophysical results on the reading performance of visually impaired patients. To do this, we analyze the main factors influencing reading performance, and review some existing reading aid systems dealing with printed and electronic documents. We then show how Navisio extends the capabilities of existing reading systems, focusing on easier navigation in complex documents and on a highly customizable display. Navisio's performance was evaluated against a standard CCTV magnifier tool with 26 low vision patients. Two kinds of texts (simple and complex documents), elaborated from a standardised text database, were used. Results show a clear advantage for Navisio in terms of reading speed and comfort. Navisio is intended to evolve: we discuss how it could be extended to any scanned document, thanks to recent computer vision approaches in document layout analysis. Further challenging perspectives are also mentioned.
Segmentation of Document Using Discriminative Context-free Grammar Inference and Alignment Similarities
Text documents present a great challenge to the field of document recognition. Automatic segmentation and layout analysis of documents is used for the interpretation and machine translation of documents. Documents such as research papers, address books, and news articles are available in unstructured formats. Extracting relevant knowledge from such documents has been recognized as a promising task, though extracting interesting rules from them is a complex and tedious process. Conditional random fields (CRFs) utilize contextual information, while hand-coded wrappers label the text (such as names, phone numbers, and addresses). In this paper we propose a novel approach to infer grammar rules using alignment similarity and a discriminative context-free grammar, which helps in extracting the desired information from the document.
DOI: 10.17762/ijritcc2321-8169.160410
Segmentation of Unstructured Newspaper Documents
Document layout analysis is one of the important steps in automated document recognition systems. In document layout analysis, meaningful information is retrieved from document images by identifying, categorizing and labeling the semantics of text blocks in the document images. In this paper, we present a simple top-down approach to document page segmentation. We have tested the proposed method on unstructured documents such as newspapers, which have complex structures with no fixed layout, as well as multiple titles and multiple columns. In the proposed method, the white gap areas that separate titles, columns of text, lines of text, and words within lines are identified to divide the document into segments. The proposed algorithm has been successfully implemented and applied to a large number of Indian newspapers, and the results have been evaluated by the number of blocks detected, taking their correct ordering into account.
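The white-gap segmentation described above follows the classic top-down projection-profile idea: ink is projected onto one axis, and the page is split wherever a sufficiently long run of empty rows or columns appears. The sketch below is a generic illustration under that assumption, not the authors' exact implementation.

```python
# Sketch of white-gap splitting on a projection profile: content blocks are
# separated wherever at least `min_gap` consecutive entries carry no ink.
# Applied alternately to rows and columns, this yields a recursive XY-cut.

def split_at_gaps(profile, min_gap):
    """profile: per-row (or per-column) ink counts. Returns (start, end)
    half-open index ranges of blocks separated by >= min_gap empty entries."""
    blocks, start, last_ink = [], None, None
    for i, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = i          # a new block begins here
            last_ink = i
        elif start is not None and i - last_ink >= min_gap:
            blocks.append((start, last_ink + 1))  # gap long enough: close block
            start = None
    if start is not None:
        blocks.append((start, last_ink + 1))
    return blocks

# two text blocks separated by a 3-row white gap:
profile = [3, 5, 4, 0, 0, 0, 2, 6, 0]
print(split_at_gaps(profile, min_gap=2))  # [(0, 3), (6, 8)]
```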