89 research outputs found

    An XML Query Engine for Network-Bound Data

    Get PDF
    XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources –- namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output -- while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query’s input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability

    Overview of query optimization in XML database systems

    Get PDF

    XPath: Looking Forward

    Get PDF
    The location path language XPath is of particular importance for XML applications since it is a core component of many XML processing standards such as XSLT or XQuery. In this paper, based on axis symmetry of XPath, equivalences of XPath 1.0 location paths involving reverse axes, such as anc and prec, are established. These equivalences are used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones. Location paths without reverse axes, as generated by the presented rewriting algorithm, enable efficient SAX-like streamed data processing of XPath

    Pathfinder: relational XQuery over multi-gigabyte XML inputs in interactive time

    Get PDF
    Using a relational DBMS as back-end engine for an XQuery processing system leverages relational query optimization and scalable query processing strategies provided by mature DBMS engines in the XML domain. Though a lot of theoretical work has been done in this area and various solutions have been proposed, no complete systems have been made available so far to give the practical evidence that this is a viable approach. In this paper, we describe the ourely relational XQuery processor Pathfinder that has been built on top of the extensible RDBMS MonetDB. Performance results indicate that the system is capable of evaluating XQuery queries efficiently, even if the input XML documents become huge. We additionally present further contributions such as loop-lifted staircase join, techniques to derive order properties and to reduce sorting effort in the generated relational algebra plans, as well as methods for optimizing XQuery joins, which, taken together, enabled us to reach our performance and scalability goal

    XML Reconstruction View Selection in XML Databases: Complexity Analysis and Approximation Scheme

    Full text link
    Query evaluation in an XML database requires reconstructing XML subtrees rooted at nodes found by an XML query. Since XML subtree reconstruction can be expensive, one approach to improve query response time is to use reconstruction views - materialized XML subtrees of an XML document, whose nodes are frequently accessed by XML queries. For this approach to be efficient, the principal requirement is a framework for view selection. In this work, we are the first to formalize and study the problem of XML reconstruction view selection. The input is a tree TT, in which every node ii has a size cic_i and profit pip_i, and the size limitation CC. The target is to find a subset of subtrees rooted at nodes i1,⋯ ,iki_1,\cdots, i_k respectively such that ci1+⋯+cik≤Cc_{i_1}+\cdots +c_{i_k}\le C, and pi1+⋯+pikp_{i_1}+\cdots +p_{i_k} is maximal. Furthermore, there is no overlap between any two subtrees selected in the solution. We prove that this problem is NP-hard and present a fully polynomial-time approximation scheme (FPTAS) as a solution

    The XQueC Project: Compressing and Querying XML

    Get PDF

    The relational XQuery puzzle: a look-back on the pieces found so far

    Get PDF
    Given the tremendous versatility of relational database implementations toward awide range of database problems, it seems only natural to consider them as back-ends for XML data processing. Yet, the assumptions behind the language XQuery are considerably different to those in traditional RDBMSs. The underlying data model is a tree, data and results carry an intrinsic order, queries are described using explicit iteration and, after all, problems are everything else but regular. Solving the relational XQuery puzzle, therefore, has challenged anumber of research groups over the past years. The purpose of this article is to summarize and assess some of the results that have been obtained during this period to solve the puzzle. Our main focus is on the Pathfinder XQuery compiler, afull reference implementation of apurely relational XQuery processor. As we dissect its components, we relate them to other work in the field and also point to open problems and limitations in the context of relational XQuery processin

    Scalable XQuery type matching

    Get PDF
    XML Schema awareness has been an integral part of the XQuery language since its early design stages. Matching XML data against XML types is the main operation that backs up XQuery type expressions, such as typeswitch, instance of, or certain XPath operators. This interaction is particularly vital in data-centric XQuery applications, where data come with detailed type information from an XML Schema document. So far there has been little work on the optimization of those operations. This work presents an efficient implementation of the runtime aspects of XML Schema support. We propose type ranks as a novel and uniform way to implement all facets of type matching in the W3C XQuery Recommendation. As a concise encoding of the type hierarchy defined by an XML Schema document, type ranks minimize the cost of checking the runtime type of XQuery singleton items. By aggregating type ranks, we leverage the grouping capabilities of modern DBMS implementations to efficiently execute type matching on XQuery sequences. In addition, we improve the complexity bounds incurring with typeswitch expressions over existing approaches. Experiments on an off-the-shelf database system demonstrate the potential of our approach
    • …
    corecore