5,188 research outputs found
Global Numerical Constraints on Trees
We introduce a logical foundation to reason on tree structures with
constraints on the number of node occurrences. Related formalisms are limited
to express occurrence constraints on particular tree regions, as for instance
the children of a given node. By contrast, the logic introduced in the present
work can concisely express numerical bounds on any region, descendants or
ancestors for instance. We prove that the logic is decidable in single
exponential time even if the numerical constraints are in binary form. We also
illustrate the usage of the logic in the description of numerical constraints
on multi-directional path queries on XML documents. Furthermore, numerical
restrictions on regular languages (XML schemas) can also be concisely described
by the logic. This implies a characterization of decidable counting extensions
of XPath queries and XML schemas. Moreover, as the logic is closed under
negation, it can thus be used as an optimal reasoning framework for testing
emptiness, containment and equivalence
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
- …