14 research outputs found

    Worst Case Optimal Joins on Relational and XML data

    In today's data management ecosystem, one of the greatest challenges is data variety. Data comes in multiple formats, such as relational and (semi-)structured data. Traditional databases handle a single data format, which limits their ability to deal with different types of data. To overcome this limitation, we propose a multi-model processing framework for relational and semi-structured data (i.e., XML) and design a worst-case optimal join algorithm. The salient feature of our algorithm is its guarantee that the intermediate results are no larger than the worst-case join results. Preliminary results show that our multi-model algorithm significantly outperforms the baseline join methods in terms of running time and intermediate result size.
    Peer reviewed
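
    To illustrate the general idea of bounding intermediate results, here is a minimal sketch of a variable-at-a-time join on the classic triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). It is not the multi-model relational/XML algorithm from the abstract above; the relation names, the `index` helper, and the `triangle_join` function are assumptions made purely for illustration. Instead of joining two relations first (which can produce a large intermediate result), it binds one variable at a time and intersects the candidate values allowed by every relation that mentions that variable.

```python
from collections import defaultdict


def index(pairs):
    """Map each first component to the set of second components paired with it."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx


def triangle_join(R, S, T):
    """Enumerate (a, b, c) with (a, b) in R, (b, c) in S and (a, c) in T.

    Hypothetical illustration: variables are bound one at a time, and each
    binding is checked against all relations that constrain it, so no
    pairwise intermediate join result is ever materialized.
    """
    r_idx, s_idx, t_idx = index(R), index(S), index(T)
    results = []
    # Candidates for a: values that occur as a first component in both R and T.
    for a in set(r_idx) & set(t_idx):
        # Candidates for b: allowed by R given a, and present as a key in S.
        for b in r_idx[a]:
            if b not in s_idx:
                continue
            # Candidates for c: must satisfy both S(b, c) and T(a, c).
            for c in s_idx[b] & t_idx[a]:
                results.append((a, b, c))
    return sorted(results)


if __name__ == "__main__":
    R = [(1, 2), (1, 3), (2, 3)]
    S = [(2, 3), (3, 1)]
    T = [(1, 3), (2, 1)]
    print(triangle_join(R, S, T))
```

    A pairwise plan would first compute R joined with S and only then filter with T; the sketch avoids that intermediate relation by consulting T while c is being bound.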

    TIMBER: A native XML database

    This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and query optimization techniques have also been developed. We present performance numbers to support some of our design decisions. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/42328/1/20110274.pd

    XML QUERY EVALUATION

    XML is now widely used, and management of XML data has become important. To this end, there has been work on the native management of XML data in a database, in order to utilize the capabilities of such a system, such as transaction management and indexing structures. At the heart of such a native XML database is the query evaluator, which provides access methods specifically tailored for XML data manipulation. The design of efficient access methods is the topic of this thesis. The most frequently used operation in an XML database is the structural join. Almost all XML queries contain at least one structural join. The structural join returns matches to a pattern from an XML document. We introduce a new efficient family of algorithms to address this task. These algorithms use a stack data structure that exploits the hierarchy of XML to improve performance. We then develop variants that permit the combination of other operators, including projection, set difference, and universal quantification, with the structural join operation for greater efficiency. An important value provided by XML is the seamless representation of text and structured data. Querying the text with regard to the structure yields fast and accurate results. However, standard database query paradigms are not suitable for querying text. We introduce the TIX algebra for this purpose, and develop new access methods capable of efficiently computing and combining scores associated with intermediate results. In such applications, one is typically interested in only a few results with the highest scores. We develop new access methods to find results that score within a margin of error from the actual top results. These new access methods outperform the retrieval of the actual top results by at least an order of magnitude.
    Ph.D.; Applied Sciences; Computer science; University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/124730/2/3163746.pd

    Multi-level operator combination in XML query processing


    Querying Structured Text in an XML Database

    XML databases often contain documents comprising structured text. Therefore, it is important to integrate "information retrieval style" query evaluation, which is well-suited for natural language text, with standard "database style" query evaluation, which handles structured queries efficiently. Relevance scoring is central to information retrieval. In the case of XML, this operation becomes more complex because the data required for scoring could reside not only directly in an element itself but also in its descendant elements.

    Structural Joins: A Primitive for Efficient XML Query Pattern Matching

    XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree-structured relationships. The primitive tree-structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing.
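
    As a rough illustration of what finding all occurrences of these relationships can look like, the sketch below joins two element lists with a stack. It assumes each element is encoded as a hypothetical `(start, end, level)` tuple taken from a single well-formed document (so regions are either nested or disjoint), that both input lists are sorted by `start`, and that containment means `a.start < d.start` and `d.end < a.end`. The function name `structural_join` and its `parent_child` flag are illustrative choices, not an interface taken from the abstract.

```python
def structural_join(ancestors, descendants, parent_child=False):
    """Return (a, d) pairs where element a contains element d.

    Elements are (start, end, level) tuples from one well-formed document;
    both lists must be sorted by start. With parent_child=True, only pairs
    whose levels differ by exactly one are kept.
    """
    out = []
    stack = []  # chain of nested ancestor candidates, outermost first
    i = 0
    for d in descendants:
        # Push every ancestor candidate that starts before d starts.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            # Entries that ended before this candidate starts can never
            # contain it or anything that follows, so discard them.
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)
            i += 1
        # Discard entries that ended before d starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        # Every entry still on the stack started before d and is still open,
        # so under the nesting assumption it contains d.
        for a in stack:
            if not parent_child or a[2] + 1 == d[2]:
                out.append((a, d))
    return out


if __name__ == "__main__":
    # book(1,12) > title(2,3), chapter(4,11) > title(5,6), section(7,10) > title(8,9)
    chapters = [(4, 11, 2)]
    titles = [(2, 3, 2), (5, 6, 3), (8, 9, 4)]
    print(structural_join(chapters, titles))                     # chapter//title
    print(structural_join(chapters, titles, parent_child=True))  # chapter/title
```

    Each input list is scanned once, and the stack only ever holds elements that are still open at the current position, which is why the sorted-by-start assumption matters.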

    The Michigan benchmark: Towards XML query performance diagnostics

    We propose a micro-benchmark for XML data management to aid engineers in designing improved XML processing engines. This benchmark is inherently different from application-level benchmarks, which are designed to help users choose between alternative products. We primarily attempt to capture the rich variety of data structures and distributions possible in XML, and to isolate their effects, without imitating any particular application. The benchmark specifies a single data set against which carefully specified queries can be used to evaluate system performance for XML data with various characteristics. We have used the benchmark to analyze the performance of three database systems: two native XML DBMSs, and a commercial ORDBMS. The benchmark reveals key strengths and weaknesses of these systems. We find that commercial relational techniques are effective for XML query processing in many cases, but are sensitive to query rewriting, and require better support for efficiently determining indirect structural containment.