176 research outputs found

    Indexing Temporal XML documents

    Get PDF

    Evaluating XPath Expressions on Light Weight BitCube

    Get PDF
    XML has become a popular way of storing data and hence has also become a new standard for exchanging and representing data on internet. Many techniques have been proposed for indexing and retrieval of XML documents such as XTree, BitCube. In this paper a indexing structure known as Light Weight BitCube is proposed. LWBC is an extension to the earlier BitCube technique which overcomes the memory management problem of BitCube while maintaining the same query processing efficiency as that of BitCube. Many XPath expressions and BitCube operations are evaluated on this LWBC to show the query processing efficiency. The results are also compared with XQEngine, a well known XML Query Processing Engine. The results also show that, Light Weight BitCube manages memory much more efficiently than the BitCube, without compromising on the query processing time

    DescribeX: A Framework for Exploring and Querying XML Web Collections

    Full text link
    This thesis introduces DescribeX, a powerful framework that is capable of describing arbitrarily complex XML summaries of web collections, providing support for more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature. DescribeX supports the construction of heterogeneous summaries where different document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, or axis path regular expression (AxPREs). DescribeX can significantly help in the understanding of both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them. Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) with multi-gigabyte web collections. A comparative study suggests that using a DescribeX summary created from a given workload can produce query evaluation times orders of magnitude better than using existing summaries. DescribeX's light-weight approach of combining summaries with a file-at-a-time XPath processor can be a very competitive alternative, in terms of performance, to conventional fully-fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage.Comment: PhD thesis, University of Toronto, 2008, 163 page

    XIST: An XML Index Selection Tool

    Full text link

    VAMANA : A High Performance, Scalable and Cost Driven XPath Engine

    Get PDF
    Many applications are migrating or beginning to make use native XML data. We anticipate that queries will emerge that emphasize the structural semantics of XML query languages like XPath and XQuery. This brings a need for an efficient query engine and database management system tailored for XML data similar to traditional relational engines. While mapping large XML documents into relational database systems while possible, poses difficulty in mapping XML queries to the less powerful relational query language SQL and creates a data model mismatch between relational tables and semi-structured XML data. Hence native solutions to efficiently store and query XML data are being developed recently. However, most of these systems thus far fail to demonstrate scalability with large document sizes, to provide robust support for the XPath query language nor to adequately address costing with respect to query optimization. In this thesis, we propose a novel cost-driven XPath engine to support the scalable evaluation of ad-hoc XPath expressions called VAMANA. VAMANA makes use of an efficient XML repository for storing and indexing large XML documents called the Multi-Axis Storage Structure (MASS) developed at WPI. VAMANA extensively uses indexes for query evaluation by considering index-only plans. To the best of our knowledge, it is the only XML query engine that supports an index plan approach for large XML documents. Our index-oriented query plans allow queries to be evaluated while reading only a fraction of the data, as all tuples for a particular context node are clustered together. The pipelined query framework minimizes the cost of handing intermediate data during query processing. Unlike other native solutions, VAMANA provides support for all 13 XPath axes. Our schema independent cost model provides dynamically calculated statistics that are then used for intelligent cost-based transformations, further improving performance. Our optimization strategy for increasing execution time performance is affirmed through our experimental studies on XMark benchmark data. VAMANA query execution is significantly faster than leading available XML query engines

    botXminer: mining biomedical literature with a new web-based application

    Get PDF
    This paper outlines botXminer, a publicly available application to search XML-formatted MEDLINE(®) data in a complete, object-relational schema implemented in Oracle(®) XML DB. An advantage offered by botXminer is that it can generate quantitative results with certain queries that are not feasible through the Entrez-PubMed(®) interface. After retrieving citations associated with user-supplied search terms, MEDLINE fields (title, abstract, journal, MeSH(®) and chemical) and terms (MeSH qualifiers and descriptors, keywords, author, gene symbol and chemical), these citations are grouped and displayed as tabulated or graphic results. This work represents an extension of previous research for integrating these citations with relational systems. botXminer has a user-friendly, intuitive interface that can be freely accessed at
    • …
    corecore