3 research outputs found

    Contribution au développement d'un serveur de thèses électroniques

    Analysis of Document Structures for Element Type Classification

    As more and more digital documents from different sources become available for public use, the needs of users also grow. One such need is the seamless integration of heterogeneous collections, e.g., the ability to query and format documents in a uniform way. Document processing is greatly enhanced if the structure of documents is explicitly represented in a standard markup format (SGML, XML, HTML). Hence, the problem of integrating heterogeneous structures has to be taken into consideration. We address this problem by introducing a classification method that acquires knowledge from document instances and their document type definitions, and uses this knowledge to attach a generic class to each SGML element type. The classification retains the tree hierarchy of elements. Although the structure is simplified, enough distinctions remain to facilitate versatile further processing, e.g., formatting. The class of an element type can be stored in the document type definition and, using th..