
    Relational Approach to Logical Query Optimization of XPath

    To be able to handle the ever-growing volumes of XML documents, effective and efficient data management solutions are needed. Managing XML data in a relational DBMS has great potential. Recently, effective relational storage schemes and index structures have been proposed as well as special-purpose join operators to speed up querying of XML data using XPath/XQuery. In this paper, we address the topic of query plan construction and logical query optimization. The claim of this paper is that standard relational algebra extended with special-purpose join operators suffices for logical query optimization. We focus on the XPath accelerator storage scheme and associated staircase join operators, but the approach can be generalized easily.
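
    As an illustration of the storage scheme named in this abstract, the following minimal sketch assigns each element a preorder and postorder rank and evaluates the descendant axis as a plain range predicate over the resulting table. The function names `encode` and `descendants` and the sample document are assumptions made for this example. The selection is a naive scan, not the staircase join operator the paper discusses.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Assign (pre, post) ranks to every element in one depth-first walk."""
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        row = {"tag": node.tag, "pre": counters["pre"], "post": None}
        counters["pre"] += 1
        rows.append(row)
        for child in node:
            visit(child)
        # The postorder rank is known only after all children are done.
        row["post"] = counters["post"]
        counters["post"] += 1

    visit(root)
    return rows


def descendants(rows, context):
    """Descendant axis as a range predicate on the pre/post table."""
    return [
        r for r in rows
        if r["pre"] > context["pre"] and r["post"] < context["post"]
    ]


# Hypothetical sample document, used only for this sketch.
doc = ET.fromstring("<a><b><c/><d/></b><e/></a>")
table = encode(doc)
b = next(r for r in table if r["tag"] == "b")
print([r["tag"] for r in descendants(table, b)])
```

    Because the axis step becomes an ordinary selection over two integer columns, it can sit in a relational plan next to joins and projections, which is the setting in which logical optimization is discussed.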

    MonetDB/XQuery: a fast XQuery processor powered by a relational engine

    Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system are evaluated on the XMark benchmark up to data sizes of 11 GB. The performance section also provides an extensive benchmark comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met.

    Pattern tree-based XOLAP rollup operator for XML complex hierarchies

    With the rise of XML as a standard for representing business data, XML data warehousing appears as a suitable solution for decision-support applications. In this context, it is necessary to allow OLAP analyses on XML data cubes. Thus, XQuery extensions are needed. To define a formal framework and allow much-needed performance optimizations on analytical queries expressed in XQuery, defining an algebra is desirable. However, XML-OLAP (XOLAP) algebras from the literature still largely rely on the relational model. Hence, we propose in this paper a rollup operator based on a pattern tree in order to handle multidimensional XML data expressed within complex hierarchies.

    Utilizing Structural Knowledge for Information Retrieval in XML Databases

    In this paper we address the problem of immediate translation of eXtensible Markup Language (XML) information retrieval (IR) queries to relational database expressions and stress the benefits of using an intermediate XML-specific algebra over relational algebra. We show how adding an XML-specific algebra at the logical level of a DBMS enables a level of abstraction from both query languages for information retrieval in XML and the underlying physical storage and manipulation. We picked a region algebra as a basis for defining the structure-aware (SA) view on XML in which we can distinguish among different XML entities, such as element nodes, text nodes, and words, and determine their containment relation. Region algebras are already well established in semi-structured document processing, as shown in an extensive overview of region algebra approaches in this paper. Furthermore, we propose a variant of region algebra that can support ranking operators in an elegant way while staying algebraic. As relevance scores are computed for regions in our region algebra, we named it score region algebra (SRA). The benefits of introducing score region algebra are explained on a set of query examples. Besides abstracting from the query language used and the physical implementation, SRA enables a certain degree of abstraction from the retrieval model used and the opportunity to use query optimization at the logical level of a database. Various retrieval models can be instantiated at the physical level based on the abstract specification of SRA operators. We also discuss numerous region algebra operator properties that provide a firm ground for query rewriting and optimization at the SA level, which is an important premise for the existence of such a logical view on XML.
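
    To make the containment relation between regions concrete, here is a minimal sketch of two containment operators over regions given as start and end token positions. The `Region` type, the operator names `containing` and `contained_in`, and the sample positions are assumptions for illustration only. The sketch carries no relevance scores, so it shows a plain region algebra rather than the score region algebra (SRA) the abstract proposes.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int   # position of the first token in the region
    end: int     # position of the last token in the region
    label: str   # element name or word, used only for display


def contains(outer, inner):
    """True if `inner` lies within the bounds of a different region `outer`."""
    return (
        outer.start <= inner.start
        and inner.end <= outer.end
        and outer != inner
    )


def containing(a, b):
    """Regions of `a` that contain at least one region of `b`."""
    return [x for x in a if any(contains(x, y) for y in b)]


def contained_in(a, b):
    """Regions of `a` that are contained in at least one region of `b`."""
    return [x for x in a if any(contains(y, x) for y in b)]


# Hypothetical positions: two sections and one occurrence of the word "xml".
sections = [Region(1, 4, "sec"), Region(5, 8, "sec")]
words = [Region(2, 2, "xml")]

print(containing(sections, words))
print(contained_in(words, sections))
```

    Both operators take sets of regions and return sets of regions, so they compose, which is what allows rewriting rules to be stated over them at the logical level.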

    Generic model for application driven XML data processing

    XML technology has emerged during recent years as a popular choice for representing and exchanging semi-structured data on the Web. It integrates seamlessly with web-based applications. If data is stored and represented as XML documents, then it should be possible to query the contents of these documents in order to extract, synthesize and analyze their contents. This thesis, an experimental study of a Web architecture for data processing, is based on semantic mapping of XML Schema. The thesis involves complex methods and tools for specification, algorithmic transformation and online processing of semi-structured data over the Web in XML format with persistent storage into relational databases. The main focus of the research is preserving the structure of original data for data reconciliation during database updates and also to combine different technologies for XML data processing such as storing (SQL), transformation (XSL Processors), presenting (HTML), querying (XQuery) and transporting (Web services) using a common framework, which is both theoretically and technologically well grounded. The experimental implementation of the discussed architecture requires a Web server (Apache), Java container (Tomcat) and object-relational DBMS (Oracle 9) equipped with Java engine and corresponding libraries for parsing and transformation of XML data (Xerces and Xalan). Furthermore, the central idea behind the research is to use a single theoretical model of the data to be processed by the system (XML algebra) controlled by one standard metalanguage specification (XML Schema) for solving a class of problems (generic architecture). The proposed work combines theoretical novelty and technological advancement in the field of Internet computing. This thesis will introduce a generic approach since both our model (XML algebra) and our problem solver (the architecture of the integrated system) are XML Schema-driven. Starting with the XML Schema of the data, we first develop a domain-specific XML algebra suitable for data processing of the specific data and then use it for implementing the main offline components of the system for data processing.

    XML Vectorization: A Column-Based XML Storage Model

    The usual method for storing tables in a relational database is to store each tuple contiguously in secondary storage. A simple alternative is to store the columns contiguously, so that a table is represented as a set of vectors all of the same length. It has been shown that such a representation performs well on queries requiring few columns. This paper reviews the shredding scheme used in XMill, an XML compressor, which represents the document structure by using a set of files, consisting of a file describing the structure, and files describing the character data to be found on designated paths (corresponding to the column data). We consider such a shredding as a storage model, XML vectorization, by presenting an indexing scheme and a physical algebra associated with a detailed cost model. We study query processing on the XML vectorization, in particular the XML join queries. XML join queries are often translated into a few relational join operations in the relational-based XML storage systems. The use of columns enables us to develop a fast join algorithm for vectorized XML based on two hash-based join algorithms. The important feature of the join algorithm is that the disk access of the algorithm is mostly sequential and data that are not needed are not read from disk. Experimental results demonstrate the effectiveness of the join algorithm for vectorized XML.
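
    The column layout described in this abstract can be sketched as follows: a table becomes one equal-length vector per column, and a hash join touches only the two key vectors before any other column is read. The helpers `to_columns` and `hash_join` and the sample tables are assumptions made for this example. It is a generic in-memory hash join, not the paper's disk-based algorithm and not XMill's shredding of XML documents.

```python
from collections import defaultdict


def to_columns(rows, names):
    """Turn a list of tuples into one equal-length vector per column."""
    return {name: list(values) for name, values in zip(names, zip(*rows))}


def hash_join(left_col, right_col):
    """Join two key vectors; return pairs of matching row positions."""
    index = defaultdict(list)
    for pos, key in enumerate(left_col):      # build phase: one sequential scan
        index[key].append(pos)
    return [
        (l, r)
        for r, key in enumerate(right_col)    # probe phase: one sequential scan
        for l in index.get(key, [])
    ]


# Hypothetical sample tables, used only for this sketch.
people = to_columns(
    [(1, "ann", 10), (2, "bob", 20), (3, "cy", 10)],
    ("id", "name", "dept"),
)
depts = to_columns([(10, "db"), (20, "ir")], ("dept", "title"))

# Only the two "dept" vectors are scanned by the join itself.
pairs = hash_join(depts["dept"], people["dept"])

# The output columns are fetched afterwards, by position.
result = [(people["name"][r], depts["title"][l]) for l, r in pairs]
print(result)
```

    The `id` vector is never read in this query, which mirrors the abstract's point that a column layout lets a query skip data it does not need.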

    On Region Algebras, XML Databases, and Information Retrieval

    This paper describes some new ideas on developing a logical algebra for databases that manage textual data and support information retrieval functionality. We describe a first prototype of such a system.

    Pathfinder: XQuery - The Relational Way

    Relational query processors are probably the best understood (as well as the best engineered) query engines available today. Although carefully tuned to process instances of the relational model (tables of tuples), these processors can also provide a foundation for the evaluation of "alien" (non-relational) query languages: if a relational encoding of the alien data model and its associated query language is given, the RDBMS may act like a special-purpose processor for the new language.