    Automatic Document Organization Exploiting FOL Similarity-based Techniques

    The organization of a document collection into meaningful groups is a fundamental issue in document management systems. The grouping can be carried out by performing a comparison among the layout structure of the documents. To this aim, a powerful representation language able to describe the relations among all the document components is necessary. First-Order Logic formulæ are a powerful representation formalism characterized by the use of relations, that, however, cause serious computational problems due to the phenomenon of indeterminacy. Furthermore, a mechanism to perform the comparison among the resulting descriptions must be provided. This paper proposes the exploitation of a novel similarity formula and evaluation criteria for automatically grouping documents in a collection according to their layout structure. This is done by identifying the description components that are more similar and hence more likely to correspond to each other, based only on their syntactic structure. Experiments on a real-world dataset prove the effectiveness of the proposal.