06472 Abstracts Collection - XQuery Implementation Paradigms
From 19.11.2006 to 22.11.2006, the Dagstuhl Seminar 06472 "XQuery Implementation Paradigms" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.
Towards P2P XML Database Technology
To ease the development of data-intensive P2P applications, we envision a P2P XML Database Management System (P2P XDBMS) that acts as database middleware, providing a uniform database abstraction on top of a dynamic set of distributed data sources. In this PhD work, we research which features such a database abstraction should offer and how it can be realised efficiently by extending and combining existing XML databases with P2P technologies. The first step in this research is a distributed database extension called XRPC. Our planned future work builds upon this, adding P2P abstractions to all main database functionalities (query processing, transactions and data storage).
Distributed XML Query Processing
While centralized query processing over collections of XML data stored at a single site is a well-understood problem, centralized query evaluation techniques are inherently limited in their scalability when presented with large collections (or a single, large document) and heavy query workloads. In the context of relational query processing, similar scalability challenges have been overcome by partitioning data collections, distributing them across the sites of a distributed system, and then evaluating queries in a distributed fashion, usually in a way that ensures locality between (sub-)queries and their relevant data.
This thesis presents a suite of query evaluation techniques for XML data that follow a similar approach to address the scalability problems encountered by XML query evaluation. Due to the significant differences in data and query models between relational and XML query processing, it is not possible to directly apply distributed query evaluation techniques designed for relational data to the XML scenario. Instead, new distributed query evaluation techniques need to be developed.
Thus, in this thesis, an end-to-end solution to the scalability problems encountered by XML query processing is proposed. Based on a data partitioning model that supports both horizontal and vertical fragmentation steps (or any combination of the two), XML collections are fragmented and distributed across the sites of a distributed system.
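To make the two kinds of fragmentation step concrete, the following is a minimal Python sketch, not the partitioning model of the thesis itself. It assumes a toy catalog collection; the function names, the round-robin placement rule, and the tag-based vertical split are all hypothetical choices made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical toy collection; element and attribute names are assumptions.
CATALOG = """
<catalog>
  <book id="1"><title>A</title><price>10</price><review>good</review></book>
  <book id="2"><title>B</title><price>20</price><review>fine</review></book>
  <book id="3"><title>C</title><price>30</price><review>poor</review></book>
</catalog>
""".strip()


def horizontal_fragments(root, n_sites):
    """Horizontal step: assign whole records (child subtrees) to sites.

    Round-robin placement is an assumption; any predicate on the
    records could be used instead.
    """
    fragments = [ET.Element(root.tag, root.attrib) for _ in range(n_sites)]
    for i, record in enumerate(root):
        fragments[i % n_sites].append(copy.deepcopy(record))
    return fragments


def vertical_fragments(root, left_tags):
    """Vertical step: split every record along its structure.

    Children whose tag is in left_tags go to the first fragment, all
    others to the second. Each fragment keeps the record element and
    its attributes (here the id) so the pieces can be matched up again.
    """
    left = ET.Element(root.tag, root.attrib)
    right = ET.Element(root.tag, root.attrib)
    for record in root:
        left_record = ET.SubElement(left, record.tag, record.attrib)
        right_record = ET.SubElement(right, record.tag, record.attrib)
        for field in record:
            target = left_record if field.tag in left_tags else right_record
            target.append(copy.deepcopy(field))
    return left, right


if __name__ == "__main__":
    root = ET.fromstring(CATALOG)
    for fragment in horizontal_fragments(root, 2):
        print(ET.tostring(fragment, encoding="unicode"))
    for fragment in vertical_fragments(root, {"title", "price"}):
        print(ET.tostring(fragment, encoding="unicode"))
```

Because both functions take and return ordinary element trees, a combined scheme can be sketched by applying one step to the output of the other, for example splitting vertically first and then distributing each vertical fragment horizontally.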
Then, a suite of distributed query evaluation strategies is proposed. These query evaluation techniques ensure locality between each fragment of the collection and the parts of the query corresponding to the data in this fragment. Special attention is paid to scalability and query performance, which is achieved by ensuring a high degree of parallelism during distributed query evaluation and by avoiding access to irrelevant portions of the data.
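The ideas of locality, parallelism, and skipping irrelevant data can be illustrated with a small Python sketch. It is only an assumption-laden illustration, not one of the evaluation strategies proposed in the thesis: threads stand in for sites, the site names and fragment contents are invented, each site is summarised by the set of element tags it stores, and only simple child-step paths without predicates are handled.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical placement: site name -> XML fragment stored at that site.
SITES = {
    "site_a": "<catalog><book id='1'><title>A</title><price>10</price></book></catalog>",
    "site_b": "<catalog><book id='2'><title>B</title><price>20</price></book></catalog>",
    "site_c": "<catalog><book id='1'><review>good</review></book>"
              "<book id='2'><review>fine</review></book></catalog>",
}


def tag_summary(xml_text):
    """Return the set of element tags that occur in a fragment."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


# One summary per site, kept by the coordinator to prune irrelevant sites.
SUMMARIES = {site: tag_summary(text) for site, text in SITES.items()}


def relevant_sites(path):
    """Sites whose fragment contains every tag named in the path."""
    needed = {step for step in path.split("/") if step not in ("", ".", "*")}
    return [site for site in sorted(SITES) if needed <= SUMMARIES[site]]


def evaluate_locally(site, path):
    """Evaluate the path against the fragment stored at one site."""
    root = ET.fromstring(SITES[site])
    return [element.text for element in root.findall(path)]


def evaluate_distributed(path):
    """Send the query only to relevant sites and run them in parallel."""
    sites = relevant_sites(path)
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        parts = list(pool.map(lambda site: evaluate_locally(site, path), sites))
    return sites, [value for part in parts for value in part]


if __name__ == "__main__":
    print(evaluate_distributed("./book/title"))
    print(evaluate_distributed("./book/review"))
```

In this sketch a query for titles never touches `site_c`, and a query for reviews touches only `site_c`; the remaining sites evaluate their part of the query concurrently and the coordinator concatenates the partial results.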
For maximum flexibility, the suite of distributed query evaluation techniques proposed in this thesis provides several alternative approaches for evaluating a given query over a given distributed collection. Thus, to achieve the best performance, it is necessary to predict and compare the expected performance of each of these alternatives. In this work, this is accomplished through a query optimization technique based on a distribution-aware cost model. The same cost model is also used to fine-tune the way a collection is fragmented to the demands of the query workload evaluated over this collection.
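Choosing among alternative plans with a cost model can be sketched as follows. The abstract does not specify the cost formula, so everything here is an assumption made for illustration: the plan names, the two cost constants, the example statistics, and the formula itself, which charges the slowest site's scan time (sites are assumed to work in parallel) plus the time to ship intermediate results.

```python
from dataclasses import dataclass

# Assumed constants; a real system would calibrate these.
SCAN_SECONDS_PER_NODE = 1e-6
TRANSFER_SECONDS_PER_BYTE = 1e-7


@dataclass(frozen=True)
class Plan:
    name: str
    nodes_scanned_per_site: tuple  # XML nodes each participating site scans
    bytes_shipped: int             # data sent over the network


def estimated_cost(plan):
    """Hypothetical cost: slowest site's scan time plus network transfer time."""
    local = max(plan.nodes_scanned_per_site, default=0) * SCAN_SECONDS_PER_NODE
    network = plan.bytes_shipped * TRANSFER_SECONDS_PER_BYTE
    return local + network


def choose_plan(plans):
    """Pick the alternative with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


# Invented statistics for three alternatives over the same query.
PLANS = [
    Plan("ship_fragments_to_coordinator", (3_000_000,), 300_000_000),
    Plan("query_all_fragments_in_parallel", (1_000_000, 1_000_000, 1_000_000), 50_000),
    Plan("pruned_parallel", (1_000_000,), 20_000),
]

if __name__ == "__main__":
    for plan in PLANS:
        print(plan.name, round(estimated_cost(plan), 3))
    print("chosen:", choose_plan(PLANS).name)
```

The same kind of estimate could in principle be summed over a whole query workload to compare candidate fragmentations, which is the spirit of the tuning step described above.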
To evaluate the performance impact of the distributed query evaluation techniques proposed in this thesis, the techniques were implemented within a production-quality XML database system. Based on this implementation, a thorough experimental evaluation was performed. The results of this evaluation confirm that the distributed query evaluation techniques introduced here lead to significant improvements in query performance and scalability, both when compared to centralized techniques and when compared to existing distributed query evaluation techniques.