2 research outputs found
DescribeX: A Framework for Exploring and Querying XML Web Collections
This thesis introduces DescribeX, a powerful framework that is capable of
describing arbitrarily complex XML summaries of web collections, providing
support for more efficient evaluation of XPath workloads. DescribeX permits the
declarative description of document structure using all axes and language
constructs in XPath, and generalizes many of the XML indexing and summarization
approaches in the literature. DescribeX supports the construction of
heterogeneous summaries where different document elements sharing a common
structure can be declaratively defined and refined by means of path regular
expressions on axes, or axis path regular expression (AxPREs). DescribeX can
significantly help in the understanding of both the structure of complex,
heterogeneous XML collections and the behaviour of XPath queries evaluated on
them.
Experimental results demonstrate the scalability of DescribeX summary
refinements and stabilizations (the key enablers for tailoring summaries) with
multi-gigabyte web collections. A comparative study suggests that using a
DescribeX summary created from a given workload can produce query evaluation
times orders of magnitude better than using existing summaries. DescribeX's
light-weight approach of combining summaries with a file-at-a-time XPath
processor can be a very competitive alternative, in terms of performance, to
conventional fully-fledged XML query engines that provide DB-like functionality
such as security, transaction processing, and native storage.Comment: PhD thesis, University of Toronto, 2008, 163 page