Analysis and design of approximate queries over XML documents using statistical techniques

Abstract

In the last few years, several repositories for storing XML documents and languages for querying XML data have been studied and implemented. All the query languages proposed so far return exact answers, but when applied to large XML repositories or warehouses, such precise queries may incur high response times. To overcome this problem, traditional relational warehouses support fast approximate queries built on concise data statistics, such as histograms or samples. We believe that the current trend toward XML calls for extending such approaches to querying massive XML data sets. In our work we propose a novel approach that summarizes an XML document collection using concise data statistics (e.g., histograms) and thereby supports approximate queries on such data using the standard XQuery language.
