3 research outputs found

    Thick 2D Relations for Document Understanding

    We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and analyze possible formalisms for expressing such rules, such as LaTeX and SGML. Document encoding rules expressed in the propositional language of rectangles are used to build a reading order detector for document images. To achieve robustness and avoid brittleness when applying the system to real-life document images, we introduce the notion of a thick boundary interpretation for a qualitative relation. The framework is tested on a collection of heterogeneous document images, showing recall rates of up to 89%.

    A two level knowledge approach for understanding documents of a multi-class domain

    This paper presents an architecture for understanding documents of a domain that can be grouped into classes. Documents are grouped according to their physical structure. The architecture is based on two knowledge descriptions of the domain: one independent of the classes and one related to the classes. These knowledge levels are used to understand the documents of the domain. The understanding phase is described in relation to the analysis and classification phases for such documents. © 1999 IEEE.
    Cesarini, F.; Francesconi, E.; Gori, Marco; Soda, G.