Visual perception of unitary elements for layout analysis of unconstrained documents in heterogeneous databases
Document layout analysis is a complex task in the context of heterogeneous documents, and it remains a challenging problem. In this paper, we present our contribution to the layout analysis competition of the international Maurdor Campaign. Our method is based on a grammatical description of the content of elements: it consists of iteratively finding and then removing the most structuring elements of documents. The method builds on notions of perceptive vision: a combination of points of view on the document, and the analysis of salient contents. Our description is generic enough to deal with a very wide range of heterogeneous documents. This method obtained second place in Run 2 of the Maurdor Campaign (on 1000 documents), and the best results in terms of pixel labelling for text blocks and graphic regions.
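The iterative "find the most structuring element, then remove it" strategy described above can be sketched as a simple ordered pipeline. This is a toy illustration, not the authors' grammatical system; the detector rules and element representation are hypothetical.

```python
# Hypothetical sketch of an iterative detect-and-remove layout pass:
# detectors are ordered with the most structuring elements first, and each
# pass labels and removes its matches so later passes see a simpler page.

def analyse_layout(elements, detectors):
    """elements: list of content items; detectors: (label, predicate) pairs,
    most structuring first. Returns (label, element) pairs."""
    remaining = list(elements)
    labelled = []
    for label, predicate in detectors:
        found = [e for e in remaining if predicate(e)]
        labelled.extend((label, e) for e in found)
        # remove what was just recognised before the next, finer-grained pass
        remaining = [e for e in remaining if not predicate(e)]
    labelled.extend(("unlabelled", e) for e in remaining)
    return labelled

# toy usage: separators are more structuring than words, so they go first
detectors = [
    ("separator", lambda e: e["type"] == "line"),
    ("text", lambda e: e["type"] == "word"),
]
page = [{"type": "word"}, {"type": "line"}, {"type": "word"}]
print([label for label, _ in analyse_layout(page, detectors)])
# ['separator', 'text', 'text']
```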
Ground Truth for Layout Analysis Performance Evaluation
Over the past two decades a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has been devised for and/or evaluated using (usually small) application-specific datasets. While the need for objective performance evaluation of layout analysis algorithms is evident, there does not exist a suitable dataset with ground truth that reflects the realities of everyday documents (widely varying layouts, complex entities, colour, noise, etc.). The most significant impediment is the creation of accurate and flexible (in representation) ground truth, a task that is costly and must be carefully designed. This paper discusses the issues related to the design, representation and creation of ground truth in the context of a realistic dataset developed by the authors. The effectiveness of the ground truth discussed in this paper has been successfully shown in its use for two international page segmentation competitions (ICDAR2003 and ICDAR2005).
Combining Linguistic and Spatial Information for Document Analysis
We present a framework to analyze color documents of complex layout. In
addition, no assumptions are made about the layout. Our framework combines, in
a content-driven bottom-up approach, two different sources of information: textual
and spatial. To analyze the text, shallow natural language processing tools,
such as taggers and partial parsers, are used. To infer relations of the
logical layout we resort to a qualitative spatial calculus closely related to
Allen's calculus. We evaluate the system against documents from a color journal
and present the results of extracting the reading order from the journal's
pages. In this case, our analysis is successful as it extracts the intended
reading order from the document.
Comment: Appeared in J. Mariani and D. Harman (Eds.), Proceedings of RIAO'2000
Content-Based Multimedia Information Access, CID, 2000, pp. 266-27
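The qualitative spatial calculus mentioned above is closely related to Allen's interval algebra, which classifies the relation between two intervals into one of thirteen cases. The sketch below is a generic illustration of that algebra applied to one axis of two layout boxes; it is not the paper's calculus, and the relation names for the inverse cases are simplified.

```python
# A minimal sketch of Allen-style qualitative relations between intervals:
# each axis of two boxes reduces to one of 13 relations such as "before",
# "meets" or "overlaps". Inverses are named "<relation>-inverse" here for
# brevity rather than with Allen's conventional names (e.g. "after").

def allen_relation(a, b):
    """a, b: (start, end) intervals with start < end."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "before"
    if a1 == b0:
        return "meets"
    if a0 < b0 < a1 < b1:
        return "overlaps"
    if a0 == b0 and a1 < b1:
        return "starts"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 > b0 and a1 == b1:
        return "finishes"
    if a == b:
        return "equal"
    # the remaining six cases are the inverses of the first six
    return allen_relation(b, a) + "-inverse"

# e.g. a column spanning x=[0, 40] relative to a column spanning x=[50, 90]:
print(allen_relation((0, 40), (50, 90)))  # before
print(allen_relation((0, 40), (40, 90)))  # meets
```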
Locating tables in scanned documents with heterogeneous layout
The pool of knowledge available to mankind depends on the source of learning
resources, which can vary from ancient printed documents to present-day
electronic materials. The rapid conversion of material held in traditional
libraries to digital form requires a significant amount of work for format
preservation. Most printed documents contain not only characters and their
formatting but also associated non-text objects such as tables, charts and
graphical objects. Since most existing optical character recognition techniques
face challenges in detecting such objects and do not concentrate on preserving
the format of the contents while reproducing them, we attempt to locate all
types of tables in scanned documents with heterogeneous layouts. In general,
multi-column documents are not cleanly divided by the inter-column space: long
headings, centre-aligned page numbers, lengthy text in headers and footers, and
horizontal lines interfere with the inter-column space commonly relied on in
layout analysis. To address this issue, we propose an algorithm that uses a
specific threshold to eliminate the interfering parts of the inter-column
space, together with local thresholds for word spacing and line height, to
detect and extract all categories of tables from scanned documents. From an
experiment performed on 50 documents, we conclude that our algorithm has an
overall accuracy of about 73% in detecting tables in multi-column layouts.
Even though documents with complex layouts still pose some problems, the
system can handle some of these documents as well. Since the algorithm does
not depend entirely on the number of columns, the inter-column spaces, or the
rule lines that bound tables, it can detect all categories of tables in
scanned documents with a range of different layouts.
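The local-threshold idea described above can be illustrated with a tiny sketch: a text line whose widest inter-word gap greatly exceeds the page's typical word spacing is a candidate table row. The threshold factor and data layout below are illustrative assumptions, not the paper's actual parameters.

```python
# Hypothetical sketch of table-row detection via local word-space thresholds:
# compute the typical inter-word gap over the page, then flag lines whose
# largest gap is far wider (suggesting a column break inside a table).

from statistics import median

def table_row_candidates(lines, gap_factor=3.0):
    """lines: list of text lines, each a list of word x-spans (x0, x1)
    sorted by x0. Returns indices of lines that look like table rows."""
    gaps = [w2[0] - w1[1]
            for line in lines
            for w1, w2 in zip(line, line[1:])]
    typical = median(gaps) if gaps else 0
    candidates = []
    for i, line in enumerate(lines):
        line_gaps = [w2[0] - w1[1] for w1, w2 in zip(line, line[1:])]
        if line_gaps and max(line_gaps) > gap_factor * typical:
            candidates.append(i)  # unusually wide gap -> likely column break
    return candidates

lines = [
    [(0, 40), (45, 90), (95, 140)],    # running text, ~5px word gaps
    [(0, 40), (46, 90), (96, 140)],    # running text
    [(0, 30), (80, 110), (160, 190)],  # table-like, ~50px gaps
]
print(table_row_candidates(lines))  # [2]
```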
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last
decades naturally lend themselves to automatic processing and exploration.
Research efforts seeking to automatically process facsimiles and extract
information from them are multiplying, with document layout analysis as a
first essential step. While the identification and categorization of segments
of interest in document images have seen significant progress in recent years
thanks to deep learning techniques, many challenges remain, among them the use
of finer-grained segmentation typologies and the handling of complex,
heterogeneous documents such as historical newspapers. Moreover, most
approaches consider visual features only, ignoring textual signal. In this
context, we introduce a multimodal approach for the semantic segmentation of
historical newspapers that combines visual and textual features. Based on a
series of experiments on diachronic Swiss and Luxembourgish newspapers, we
investigate, among others, the predictive power of visual and textual features
and their capacity to generalize across time and sources. Results show
consistent improvement of multimodal models in comparison to a strong visual
baseline, as well as better robustness to high material variance.
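The multimodal combination described above can be illustrated, in its simplest form, as late fusion: visual and textual feature vectors computed for the same region are concatenated before classification. This is a toy sketch, not the paper's architecture; the feature dimensions and names are invented for illustration.

```python
# Toy illustration of multimodal late fusion: concatenating per-region
# visual and textual feature vectors so one classifier sees both signals.

def fuse(visual_feats, textual_feats):
    """Concatenate per-region feature vectors from both modalities."""
    assert len(visual_feats) == len(textual_feats)  # one vector per region
    return [v + t for v, t in zip(visual_feats, textual_feats)]

# 2 regions: 3-dim visual features + 4-dim textual features -> 7-dim fused
visual = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # e.g. CNN features
textual = [[1, 0, 0, 0], [0, 1, 0, 0]]        # e.g. word-embedding features
fused = fuse(visual, textual)
print(len(fused[0]))  # 7
```

In practice the fused vectors would feed a segmentation head; the point here is only that fusion is a concatenation along the feature axis.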
Diffusion-based Document Layout Generation
We develop a diffusion-based approach to document layout sequence generation.
Layout sequences specify the contents of a document design in an explicit
format. Our diffusion-based approach works in the sequence domain rather than
the image domain in order to permit more complex and realistic layouts. We
also introduce a new metric, Document Earth Mover's Distance (Doc-EMD): by
considering similarity between document designs of heterogeneous categories,
it addresses the shortcoming of prior document metrics that only evaluate
layouts of the same category. Our empirical analysis shows that our
diffusion-based approach is comparable to, or outperforms, previous methods
for layout generation across various document datasets. Moreover, our metric
differentiates documents better than previous metrics in specific cases.
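A metric like Doc-EMD builds on the classic Earth Mover's Distance. The sketch below is not the paper's Doc-EMD definition; it only shows the underlying 1-D EMD idea, for example comparing the vertical positions of boxes of one category in a generated layout against a reference layout.

```python
# Pure-Python sketch of 1-D Earth Mover's Distance between two equal-size
# point sets with uniform weights: the optimal 1-D transport matches points
# in sorted order, so the distance is the mean displacement between sorted
# pairs. (Illustrative only; NOT the Doc-EMD formula from the paper.)

def emd_1d(xs, ys):
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# vertical centres of "figure" boxes in two layouts (page-normalised)
generated = [0.1, 0.3, 0.7]
reference = [0.2, 0.4, 0.8]
print(emd_1d(generated, reference))  # ~0.1
```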
Navisio: Towards an integrated reading aid system for low vision patients
We propose the Navisio software as a new integrated system to help low vision patients read complex electronic documents (here, PDF files) with more comfort. Navisio aims to take into account the main psychophysical results on the reading performance of visually impaired patients. To do this, we analyze the main factors influencing reading performance, and review some existing reading aid systems dealing with printed and electronic documents. We then show how Navisio extends the capabilities of existing reading systems, focusing on easier navigation in complex documents and on a highly customizable display. Navisio's performance was evaluated against a standard CCTV magnifier tool with 26 low vision patients. Two kinds of texts (simple and complex documents), elaborated from a standardised text database, were used. Results show a clear advantage for Navisio in terms of reading speed and comfort. Navisio is intended to evolve: we discuss how it could be extended to any scanned document, thanks to recent computer vision approaches in document layout analysis. Further challenging perspectives are also mentioned.
Segmentation of Document Using Discriminative Context-free Grammar Inference and Alignment Similarities
Text documents present a great challenge to the field of document recognition. Automatic segmentation and layout analysis of documents is used for the interpretation and machine translation of documents. Documents such as research papers, address books, and news articles are available in unstructured formats. Extracting relevant knowledge from such documents has been recognized as a promising task, though extracting interesting rules from them is a complex and tedious process. Conditional random fields (CRFs) utilize contextual information, while hand-coded wrappers label the text (such as names, phone numbers, and addresses). In this paper we propose a novel approach to infer grammar rules using alignment similarity and a discriminative context-free grammar, which helps in extracting the desired information from the document.
DOI: 10.17762/ijritcc2321-8169.160410
Segmentation of Unstructured Newspaper Documents
Document layout analysis is one of the important steps in automated document recognition systems. In document layout analysis, meaningful information is retrieved from document images by identifying, categorizing and labeling the semantics of text blocks in the document images. In this paper, we present a simple top-down approach to document page segmentation. We have tested the proposed method on unstructured documents such as newspapers, which have complex structures with no fixed layout, as well as multiple titles and multiple columns. In the proposed method, the white gap areas that separate titles, columns of text, lines of text, and words within lines are identified to divide the document into segments. The proposed algorithm has been successfully implemented and applied to a large number of Indian newspapers, and the results have been evaluated by the number of blocks detected, taking their correct ordering into account.
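The white-gap segmentation described above follows the classic top-down projection-profile idea: ink is projected onto one axis, and the page is split wherever a sufficiently long run of empty rows or columns appears. The sketch below is a generic illustration under that assumption, not the authors' exact implementation.

```python
# Sketch of white-gap splitting on a projection profile: content blocks are
# separated wherever at least `min_gap` consecutive entries carry no ink.
# Applied alternately to rows and columns, this yields a recursive XY-cut.

def split_at_gaps(profile, min_gap):
    """profile: per-row (or per-column) ink counts. Returns (start, end)
    half-open index ranges of blocks separated by >= min_gap empty entries."""
    blocks, start, last_ink = [], None, None
    for i, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = i          # a new block begins here
            last_ink = i
        elif start is not None and i - last_ink >= min_gap:
            blocks.append((start, last_ink + 1))  # gap long enough: close block
            start = None
    if start is not None:
        blocks.append((start, last_ink + 1))
    return blocks

# two text blocks separated by a 3-row white gap:
profile = [3, 5, 4, 0, 0, 0, 2, 6, 0]
print(split_at_gaps(profile, min_gap=2))  # [(0, 3), (6, 8)]
```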