5,188 research outputs found

    Web Queries: From a Web of Data to a Semantic Web?

    Get PDF

    Global Numerical Constraints on Trees

    Full text link
    We introduce a logical foundation to reason on tree structures with constraints on the number of node occurrences. Related formalisms are limited to express occurrence constraints on particular tree regions, as for instance the children of a given node. By contrast, the logic introduced in the present work can concisely express numerical bounds on any region, descendants or ancestors for instance. We prove that the logic is decidable in single exponential time even if the numerical constraints are in binary form. We also illustrate the usage of the logic in the description of numerical constraints on multi-directional path queries on XML documents. Furthermore, numerical restrictions on regular languages (XML schemas) can also be concisely described by the logic. This implies a characterization of decidable counting extensions of XPath queries and XML schemas. Moreover, as the logic is closed under negation, it can thus be used as an optimal reasoning framework for testing emptiness, containment and equivalence

    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    Get PDF
    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the wellknown DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents
    • …
    corecore