
    Visual perception of unitary elements for layout analysis of unconstrained documents in heterogeneous databases

    Document layout analysis is a complex task in the context of heterogeneous documents, and it remains a challenging problem. In this paper, we present our contribution to the layout analysis competition of the international Maurdor Campaign. Our method is based on a grammatical description of the content of elements: it consists of iteratively finding and then removing the most structuring elements of documents. This method is based on notions of perceptive vision: a combination of points of view on the document, and the analysis of salient contents. Our description is generic enough to deal with a very wide range of heterogeneous documents. This method obtained second place in Run 2 of the Maurdor Campaign (on 1,000 documents), and the best results in terms of pixel labeling for text blocks and graphic regions.
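    The iterative strategy described above (find the most structuring element, remove it, repeat) can be sketched as follows; the `saliency` scoring function and the threshold are hypothetical placeholders, not the authors' grammatical method:

```python
def peel_structure(elements, saliency, min_saliency=0.5):
    """Sketch of an iterative peeling loop: repeatedly take the most
    salient ('most structuring') element, remove it, and continue on
    the rest until nothing salient enough remains."""
    layers = []                 # structuring elements, most salient first
    remaining = list(elements)  # elements not yet explained by structure
    while remaining:
        best = max(remaining, key=saliency)
        if saliency(best) < min_saliency:
            break               # nothing structuring left to peel off
        layers.append(best)
        remaining.remove(best)
    return layers, remaining
```

    A real system would re-analyze the document after each removal; this sketch only captures the peel-and-recurse control flow.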

    Ground Truth for Layout Analysis Performance Evaluation

    Over the past two decades a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has been devised for, and/or evaluated on, (usually small) application-specific datasets. While the need for objective performance evaluation of layout analysis algorithms is evident, there is no suitable dataset with ground truth that reflects the realities of everyday documents (widely varying layouts, complex entities, colour, noise, etc.). The most significant impediment is the creation of accurate and representationally flexible ground truth, a task that is costly and must be carefully designed. This paper discusses the issues related to the design, representation and creation of ground truth in the context of a realistic dataset developed by the authors. The effectiveness of this ground truth has been successfully demonstrated by its use in two international page segmentation competitions (ICDAR2003 and ICDAR2005).
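    As a rough illustration of the representational flexibility discussed above, layout ground truth can be stored as labelled polygons rather than plain rectangles; the field names below are illustrative, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Region:
    """One flexible ground-truth record: an arbitrary polygon outline
    plus a region type, so non-rectangular and overlapping layouts
    can be annotated accurately."""
    polygon: list          # [(x, y), ...] outline points
    region_type: str       # e.g. "text", "graphic", "table", "noise"
    reading_order: int = -1  # -1 when no order is defined for the region
```

    A rectangle is then just the four-point special case, so evaluation code can treat both uniformly.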

    Combining Linguistic and Spatial Information for Document Analysis

    We present a framework to analyze color documents of complex layout; no assumption is made about the layout. Our framework combines two different sources of information, textual and spatial, in a content-driven bottom-up approach. To analyze the text, shallow natural language processing tools, such as taggers and partial parsers, are used. To infer relations of the logical layout we resort to a qualitative spatial calculus closely related to Allen's calculus. We evaluate the system against documents from a color journal and present the results of extracting the reading order from the journal's pages. In this case, our analysis is successful, as it extracts the intended reading order from the document. (Appeared in: J. Mariani and D. Harman (Eds.) Proceedings of RIAO'2000 Content-Based Multimedia Information Access, CID, 2000. pp. 266-27)
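    A minimal sketch of how qualitative interval relations in the spirit of Allen's calculus can drive reading-order extraction; the relation subset, block format, and comparison policy here are simplifying assumptions, not the paper's calculus:

```python
from functools import cmp_to_key

def allen_relation(a, b):
    """Classify 1-D intervals a=(a0, a1), b=(b0, b1); only the
    relations needed below are distinguished."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "precedes"
    if a1 == b0:
        return "meets"
    if a0 < b0 < a1 < b1:
        return "overlaps"
    return "other"

def reading_order(blocks):
    """Order blocks top-to-bottom, then left-to-right, by comparing
    their y- and x-intervals qualitatively."""
    def before(a, b):
        return allen_relation(a, b) in ("precedes", "meets", "overlaps")
    def cmp(a, b):
        if before(a["y"], b["y"]): return -1   # a is above b
        if before(b["y"], a["y"]): return 1
        if before(a["x"], b["x"]): return -1   # same band: a is left of b
        if before(b["x"], a["x"]): return 1
        return 0
    return sorted(blocks, key=cmp_to_key(cmp))
```

    A full-width title thus precedes two side-by-side columns, which are then read left to right.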

    Locating tables in scanned documents with heterogeneous layout

    The pool of knowledge available to mankind depends on the source of learning resources, which can vary from ancient printed documents to present-day electronic materials. The rapid conversion of material held in traditional libraries to digital form requires a significant amount of work for format preservation. Most printed documents contain not only characters and their formatting but also associated non-text objects such as tables, charts and graphical objects. Since most existing optical character recognition techniques face challenges in detecting such objects, and do not concentrate on preserving the format of the contents while reproducing them, we attempt to locate all types of tables in scanned documents with heterogeneous layout. In general, multi-column documents are not cleanly divided by the inter-column space: long headings, centre-aligned page numbers, lengthy text in headers and footers, and horizontal lines frequently intrude into the inter-column space commonly relied upon in layout analysis. To address this issue, we propose an algorithm that uses a specific threshold to eliminate the interfering parts of the inter-column space, together with local thresholds for word spacing and line height, to detect and extract all categories of tables from scanned documents. From an experiment performed on 50 documents, we conclude that our algorithm has an overall accuracy of about 73% in detecting tables in multi-column layouts. Even though documents with complex layouts still pose some problems, the system can handle some of these documents as well. Since the algorithm does not depend entirely on the number of columns, the inter-column spaces, or the rule lines bounding tables, it can detect all categories of tables in a range of differently laid-out scanned documents.
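    The word-space thresholding idea described above can be sketched as follows; the page-level median estimate and the factor `k` are illustrative stand-ins for the paper's local thresholds:

```python
from statistics import median

def table_lines(lines, k=3.0):
    """Flag candidate table rows: a line with at least two inter-word
    gaps wider than k times the page-level median word gap is treated
    as tabular. `lines` is a list of lines, each a list of (x0, x1)
    word extents."""
    all_gaps = [b0 - a1
                for words in lines
                for (_, a1), (b0, _) in zip(words, words[1:])]
    if not all_gaps:
        return []
    word_space = median(all_gaps)   # estimate of normal word spacing
    flagged = []
    for i, words in enumerate(lines):
        gaps = [b0 - a1 for (_, a1), (b0, _) in zip(words, words[1:])]
        if sum(1 for g in gaps if g > k * word_space) >= 2:
            flagged.append(i)
    return flagged
```

    Requiring at least two wide gaps per line distinguishes multi-column table rows from an ordinary line that merely ends early.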

    Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

    The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research seeking to automatically process facsimiles and thereby extract information is multiplying, with document layout analysis as a first essential step. While the identification and categorization of segments of interest in document images have seen significant progress over recent years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. In a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to high material variance.
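    One simple way to combine visual and textual features, in the spirit of the approach above, is to paint token embeddings onto the page and concatenate them with the image channels; the token format and embedding source below are assumptions, not the paper's architecture:

```python
import numpy as np

def fuse_features(image, tokens, emb_dim=4):
    """Build a text-feature map by painting each OCR token's embedding
    over its bounding box, then concatenate it with the image channels
    so a segmentation network sees both modalities per pixel."""
    h, w, _ = image.shape
    text_map = np.zeros((h, w, emb_dim), dtype=np.float32)
    for tok in tokens:  # tok = {"bbox": (x0, y0, x1, y1), "emb": vector}
        x0, y0, x1, y1 = tok["bbox"]
        text_map[y0:y1, x0:x1] = tok["emb"]
    return np.concatenate([image.astype(np.float32), text_map], axis=-1)
```

    Pixels outside any token keep a zero text vector, so the fused tensor degrades gracefully to the visual-only case where OCR is missing.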

    Diffusion-based Document Layout Generation

    We develop a diffusion-based approach for document layout sequence generation. Layout sequences specify the contents of a document design in an explicit format. Our diffusion-based approach works in the sequence domain rather than the image domain in order to permit more complex and realistic layouts. We also introduce a new metric, Document Earth Mover's Distance (Doc-EMD). By considering the similarity between document designs of heterogeneous categories, it addresses the shortcomings of prior document metrics, which only evaluate layouts of the same category. Our empirical analysis shows that our diffusion-based approach is comparable to, or outperforms, previous methods for layout generation across various document datasets. Moreover, our metric differentiates documents better than previous metrics in specific cases.
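    For intuition about the metric, a one-dimensional earth mover's distance between two distributions over ordered bins reduces to the summed absolute difference of their cumulative sums; this toy function is a stand-in for Doc-EMD, which compares full layout geometries rather than 1-D histograms:

```python
def emd_1d(p, q):
    """Earth mover's distance between two 1-D distributions over the
    same ordered bins, computed as the sum of absolute CDF differences:
    the total mass-times-distance needed to turn p into q."""
    assert abs(sum(p) - 1) < 1e-9 and abs(sum(q) - 1) < 1e-9
    dist, carry = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi   # mass that must be pushed past this bin edge
        dist += abs(carry)
    return dist
```

    Moving all mass two bins to the right therefore costs 2, while identical distributions cost 0.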

    Navisio: Towards an integrated reading aid system for low vision patients

    We propose the Navisio software, a new integrated system that helps low-vision patients read complex electronic documents (here, PDF files) with more comfort. Navisio aims to take into account the main psychophysical results on the reading performance of visually impaired patients. To do this, we analyze the main factors influencing reading performance and review some existing reading aid systems for printed and electronic documents. We then show how Navisio extends the capabilities of existing reading systems, focusing on easier navigation in complex documents and on a highly customizable display. Navisio's performance was evaluated against a standard CCTV magnifier with 26 low-vision patients, using two kinds of texts (simple and complex documents) elaborated from a standardised text database. Results show a clear advantage for Navisio in terms of reading speed and comfort. Navisio is intended to evolve: we discuss how it could be extended to any scanned document, thanks to recent computer vision approaches to document layout analysis. Further challenging perspectives are also mentioned.

    Segmentation of Document Using Discriminative Context-free Grammar Inference and Alignment Similarities

    Text documents present a great challenge to the field of document recognition. Automatic segmentation and layout analysis of documents is used for interpretation and machine translation of documents. Documents such as research papers, address books and news articles are available in unstructured formats. Extracting relevant knowledge from such documents has been recognized as a promising task, but extracting interesting rules from them is a complex and tedious process. Conditional random fields (CRFs) utilize contextual information and hand-coded wrappers to label the text (with fields such as name, phone number and address). In this paper we propose a novel approach that infers grammar rules using alignment similarity and a discriminative context-free grammar, which helps in extracting the desired information from the document. DOI: 10.17762/ijritcc2321-8169.160410

    Segmentation of Unstructured Newspaper Documents

    Document layout analysis is one of the important steps in automated document recognition systems. In document layout analysis, meaningful information is retrieved from document images by identifying, categorizing and labeling the semantics of text blocks in the document images. In this paper, we present a simple top-down approach to document page segmentation. We have tested the proposed method on unstructured documents such as newspapers, which have complex layouts with no fixed structure, as well as multiple titles and multiple columns. In the proposed method, the white gaps which separate titles, columns of text, lines of text and words within lines are identified in order to separate the document into segments. The proposed algorithm has been successfully implemented and applied to a large number of Indian newspapers, and the results have been evaluated by the number of blocks detected, taking their correct ordering into account.
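    The white-gap splitting described above is close in spirit to the classic recursive XY-cut; a minimal sketch follows, where the `min_gap` threshold and box format are illustrative, not the paper's values:

```python
import bisect

def xy_cut(boxes, min_gap=10, horizontal=True):
    """Top-down segmentation: recursively split word/line boxes
    (x0, y0, x1, y1) at white gaps wider than min_gap, alternating
    horizontal (y-axis) and vertical (x-axis) cuts."""
    if len(boxes) <= 1:
        return [boxes]
    lo, hi = (1, 3) if horizontal else (0, 2)
    spans = sorted((b[lo], b[hi]) for b in boxes)
    cuts, end = [], spans[0][1]
    for s0, s1 in spans[1:]:
        if s0 - end > min_gap:          # a white band: cut in its middle
            cuts.append((end + s0) / 2)
        end = max(end, s1)
    if not cuts:
        if horizontal:                  # no horizontal band: try vertical
            return xy_cut(boxes, min_gap, horizontal=False)
        return [boxes]                  # no cut on either axis: leaf block
    parts = [[] for _ in range(len(cuts) + 1)]
    for b in boxes:                     # assign each box to its segment
        parts[bisect.bisect(cuts, b[lo])].append(b)
    segments = []
    for part in parts:
        segments += xy_cut(part, min_gap, horizontal=not horizontal)
    return segments
```

    On a newspaper-like page this first splits a full-width title from the body at the horizontal white band, then splits the body into columns at the vertical one.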