Search CORE

9 research outputs found

Combining Linguistic and Spatial Information for Document Analysis

Author: Aiello Marco
Monz Christof
Todoran Leon
Publication venue: University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science
Publication date: 01/01/2000

ARTS repository - University of Groningen

Combining Linguistic and Spatial Information for Document Analysis

Author: Aiello Marco
Monz Christof
Todoran Leon
Publication date: 01/01/2000

We present a framework to analyze color documents of complex layout. In addition, no assumption is made on the layout. Our framework combines, in a content-driven bottom-up approach, two different sources of information: textual and spatial. To analyze the text, shallow natural language processing tools, such as taggers and partial parsers, are used. To infer relations of the logical layout, we resort to a qualitative spatial calculus closely related to Allen's calculus. We evaluate the system against documents from a color journal and present the results of extracting the reading order from the journal's pages. In this case, our analysis is successful as it extracts the intended reading order from the document. Comment: Appeared in: J. Mariani and D. Harman (Eds.) Proceedings of RIAO'2000 Content-Based Multimedia Information Access, CID, 2000. pp. 266-27

arXiv.org e-Print Archive

CiteSeerX

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

University of Groningen Digital Archive

International Migration, Integration and Social Cohesion online publications

Dissertations of the University of Groningen

Thick 2D Relations for Document Understanding

Author: Aiello Marco
Smeulders Arnold M.W.
Publication date: 01/01/2002

We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and we analyze possible formalisms to express document encoding rules, such as LaTeX and SGML. Document encoding rules expressed in the propositional language of rectangles are used to build a reading order detector for document images. To achieve robustness and avoid brittleness when applying the system to real-life document images, the notion of a thick boundary interpretation for a qualitative relation is introduced. The framework is tested on a collection of heterogeneous document images, showing recall rates up to 89%.

Unitn-eprints Research

Automated Problem Domain Cognition Process in Information Systems Design

Author: Loginov Maxim
Mikov Alexander
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2007

An automated cognitive approach for the design of Information Systems is presented. It is intended to be used at the very beginning of the design process, between the stages of requirements determination and analysis, including the stage of analysis. Within this approach, either UML or ERD notation may be used for model representation. The approach makes it possible to use natural language text documents as a source of knowledge for automated problem domain model generation. It also simplifies the process of modelling by assisting the human user throughout the work on the model (using UML or ERD notation).

Bulgarian Digital Mathematics Library at IMI-BAS

Logical Structure Detection for Heterogeneous Document Classes

Author: Aiello Marco
Monz Christof
Todoran Leon
Worring Marcel
Publication venue: University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science
Publication date: 01/01/2001

University of Groningen

Thick 2D relations for document understanding

Author: Aiello
Allen
Altamura
Arlazarov
Arnold M.W. Smeulders
Baeza-Yates
Balbiani
Cesarini
Esposito
Goossens
Hersh
Klink
Knuth
Knuth
Knuth
Lee
Marco Aiello
Munro
Nagy
Niyogi
Reynold
Rosenfeld
Toda
Tsujimoto
van Benthem
Warshall
Worring
Publication venue: Elsevier BV

Crossref

Adaptive Methods for Robust Document Image Understanding

Author: Konya Iuliu
Publication venue: Universitäts- und Landesbibliothek Bonn

A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding represent a stringent necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, putting special focus on generality, computational efficiency and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot metal typeset prints, a theoretically optimal solution for the document binarization problem from both the computational complexity and the threshold selection point of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy.

bonndoc – Der Publikationsserver der Universität Bonn

Combining linguistic and spatial information for document analysis

Author: Aiello M.
Monz C.
Todoran L.
Publication date: 01/01/2000

International Migration, Integration and Social Cohesion online publications