
    Technical Contribution: Unleashing XQuery for Data-Independent Programming

    an SQL equivalent for XML data, but its roots in functional programming also make it a perfect choice for processing almost any kind of structured and semi-structured data. Beyond standard XML processing, however, advanced language features make it hard to implement the complete language efficiently for large data volumes. This work proposes a novel compilation strategy that provides both flexibility and efficiency to unleash XQuery’s potential as a data programming language. It combines the simplicity and versatility of a storage-independent data abstraction with the scalability advantages of set-oriented processing. Expensive iterative sections of a query are unrolled into a pipeline of relational-style operators, which is open to optimized join processing, index use, and parallelization. The remaining aspects of the language are processed in a standard fashion, yet can be compiled at any time into more efficient native operations of the actual runtime environment. This hybrid compilation mechanism yields an efficient and highly flexible query engine that can drive any computation, from simple XML transformations to complex data analysis, even on non-XML data. Experiments with our prototype and state-of-the-art competitors, covering classic XML query processing and business analytics over relational data, attest to the generality and efficiency of the design.
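To picture what unrolling an iterative section into "a pipeline of relational-style operators" can mean, the following Python sketch contrasts item-at-a-time evaluation of a two-variable `for ... where` loop with the same computation written as a chain of set-oriented operators (scan, hash join, project). It is a minimal illustration under assumed names (`scan`, `hash_join`, `project`, and the `cust`/`id` fields are hypothetical), not the compiler described in the paper.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

Row = dict

def nested_loop(orders: list[Row], customers: list[Row]) -> list[tuple]:
    # Literal, iterative evaluation of (hypothetical query):
    #   for $o in orders, $c in customers where $o/cust = $c/id
    #   return ($c/name, $o/total)
    out = []
    for o in orders:
        for c in customers:
            if o["cust"] == c["id"]:
                out.append((c["name"], o["total"]))
    return out

def scan(rows: Iterable[Row]) -> Iterator[Row]:
    yield from rows

def hash_join(left: Iterator[Row], right: Iterator[Row],
              lkey: str, rkey: str) -> Iterator[tuple[Row, Row]]:
    # Build a hash table on the right input, then probe it with the left input.
    # Probing in left order keeps the output in the loop's iteration order.
    table: dict = defaultdict(list)
    for r in right:
        table[r[rkey]].append(r)
    for l in left:
        for r in table.get(l[lkey], []):
            yield l, r

def project(pairs: Iterator[tuple[Row, Row]],
            f: Callable[[Row, Row], tuple]) -> Iterator[tuple]:
    for l, r in pairs:
        yield f(l, r)

def pipelined(orders: list[Row], customers: list[Row]) -> list[tuple]:
    # The same query as an operator pipeline: project(join(scan, scan)).
    plan = project(hash_join(scan(orders), scan(customers), "cust", "id"),
                   lambda o, c: (c["name"], o["total"]))
    return list(plan)
```

Both functions return the same sequence; the pipelined form replaces the quadratic inner loop with a single hash-table build, which is the kind of rewrite that set-oriented processing opens up.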

    Efficient Semi-structured Queries in Scala using XQuery Shipping

    This project proposes a new approach to interacting with database systems from programming languages. A formal query language can be integrated into modern programming languages, and semi-structured queries can be evaluated through automatic transformation and query shipping. The project focuses on XML queries and the Scala programming language; in particular, it optimizes Scala's XML-based expressions using XQuery transformation and shipping. In this work, Scala sequence comprehensions are extended to cover the full functionality of XQuery FLWOR expressions, and XQuery sequence comparisons are introduced in Scala to facilitate query generation. This report presents a formalization of the transformation rules between Scala and XQuery and describes a Scala implementation. Various use cases are provided to help readers understand and use this new Scala library.
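As a rough sketch of translating a sequence comprehension into an XQuery FLWOR expression, the Python code below models a comprehension as generators, filters, an optional ordering, and a yield expression, and prints the corresponding FLWOR text. The `Binding`/`Comprehension` structures and the `to_flwor` function are hypothetical; the project itself works on Scala code and defines its own formal transformation rules.

```python
from dataclasses import dataclass, field

@dataclass
class Binding:
    var: str     # bound variable, e.g. "b"
    source: str  # XQuery source expression, e.g. "doc('bib.xml')//book"

@dataclass
class Comprehension:
    # Toy model of `for (b <- src; if cond) yield expr`, plus an assumed
    # order-by extension so that the whole FLWOR shape can be expressed.
    generators: list[Binding]
    filters: list[str] = field(default_factory=list)
    yield_expr: str = ""
    order_by: list[str] = field(default_factory=list)

def to_flwor(c: Comprehension) -> str:
    # Each generator becomes a `for` clause, filters are conjoined into `where`,
    # and the yielded expression becomes the `return` clause.
    lines = [f"for ${g.var} in {g.source}" for g in c.generators]
    if c.filters:
        lines.append("where " + " and ".join(c.filters))
    if c.order_by:
        lines.append("order by " + ", ".join(c.order_by))
    lines.append(f"return {c.yield_expr}")
    return "\n".join(lines)
```

For example, a single generator over `doc('bib.xml')//book` with the filter `$b/@year > 2000` and yield `$b/title` produces a three-line `for`/`where`/`return` query that could then be shipped to an XQuery processor.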

    Updating XML Views

    Update operations over XML views are essential for applications that use XML views. In this dissertation, we provide scalable solutions for updating through XML views defined over relational databases. We focus especially on the update-public semantics, where updates are always public (made to the public database), and the update-local semantics, where update effects are first kept local and then made public as and when required. To this end, we propose the clean extended-source theory for determining whether a correct view update translation exists, which then serves as the theoretical foundation for designing practical XML view updating algorithms. Under the update-public semantics, state-of-the-art view updating work focuses on identifying the correct update translation purely from the data. We instead take a schema-centric approach, which uses the schema of the underlying source to prune updates that are guaranteed to be untranslatable and to pass updates that are guaranteed to be translatable directly to the SQL engine. Only updates that cannot be classified using schema knowledge are finally analyzed by examining the data. This data-level check is further optimized under schema guidance to prune the search space for finding a correct translation. As the first work addressing the update-local semantics, we propose a practical framework called LoGo. LoGo localizes the view update translation while preserving the properties that views are side-effect free and updates are always updatable. LoGo also supports on-demand merging of the subject view's local database into the public database (also called the global database), while still guaranteeing that the subject view remains free of side effects. LoGo provides a flexible synchronization service that, if desired, refreshes all other views defined over the same public database, i.e., synchronizes them with the publicly committed changes.
Further, given that XML is an ordered data model, we propose an order-sensitive solution named O-HUX to support XML view updating with order. We have implemented the algorithms along with the respective optimization techniques. Experimental results confirm the effectiveness of the proposed services and highlight their performance characteristics.

    Incremental maintenance of materialized XQuery views

    Keeping views fresh by maintaining the consistency between materialized views and their base data in the presence of base updates is a critical problem for many applications, including data warehousing and data integration. While heavily studied for traditional databases, the maintenance of XML views remains largely unexplored. Maintaining XML views is complex due to the richness of the XML data model and the powerful capabilities of XML query languages, such as XQuery. This dissertation proposes a comprehensive solution for the general problem of maintaining materialized XQuery views. Our solution is the first to enable the maintenance of a large class of XQuery views, including XPath expressions, FLWOR expressions, and element constructors. These views may contain arbitrary result construction and arbitrary grouping and join operations. Our solution also supports the unique order requirements of XQuery, including source document order and query order. Th
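The core idea of incremental view maintenance, applying only the effect of a base-data change to the materialized result instead of recomputing it, can be shown with a deliberately simple relational-style example. The sketch below maintains a grouped count view under inserts and deletes; the `(book_id, publisher)` schema and all names are hypothetical, and it is a generic illustration rather than the dissertation's XQuery algorithm, which must additionally handle element construction and document order.

```python
from collections import Counter

Row = tuple[str, str]  # (book_id, publisher): toy schema for illustration only

def recompute(base: list[Row]) -> Counter:
    # View definition: number of books per publisher (full recomputation).
    return Counter(pub for _book, pub in base)

class CountView:
    """Materialized 'books per publisher' view maintained from deltas."""

    def __init__(self, base: list[Row]) -> None:
        self.counts = recompute(base)  # initial materialization

    def apply_insert(self, row: Row) -> None:
        # An inserted base row can only affect its own group.
        self.counts[row[1]] += 1

    def apply_delete(self, row: Row) -> None:
        pub = row[1]
        self.counts[pub] -= 1
        if self.counts[pub] == 0:
            del self.counts[pub]  # drop empty groups, as recomputation would
```

Each delta costs constant work, while `recompute` rescans the whole base; correctness means the maintained view always equals a fresh recomputation.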

    TIMBER: A native XML database

    This paper describes the overall design and architecture of the Timber XML database system currently being implemented at the University of Michigan. The system is based upon a bulk algebra for manipulating trees, and natively stores XML. New access methods have been developed to evaluate queries in the XML context, and new cost estimation and query optimization techniques have also been developed. We present performance numbers to support some of our design decisions. We believe that the key intellectual contribution of this system is a comprehensive set-at-a-time query processing ability in a native XML store, with all the standard components of relational query processing, including algebraic rewriting and a cost-based optimizer. Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/42328/1/20110274.pd
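Set-at-a-time processing in a native XML store is commonly built on structural joins over a numbering scheme for tree nodes. The Python sketch below labels elements with a (start, end, level) region encoding, under which node a is an ancestor of node d exactly when a.start < d.start and d.end < a.end, and then finds all ancestor-descendant pairs with a stack-based merge. This is a standard technique from the XML query processing literature, shown only for illustration; it is not claimed to be TIMBER's actual access method, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

class Region(NamedTuple):
    start: int
    end: int
    level: int  # depth; useful for parent-child tests, unused in the join below
    tag: str

def encode(root: ET.Element) -> list[Region]:
    """Label every element with (start, end, level) in one depth-first pass."""
    regions: list[Region] = []
    counter = 0

    def visit(node: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in node:
            visit(child, level + 1)
        end = counter
        counter += 1
        regions.append(Region(start, end, level, node.tag))

    visit(root, 0)
    return sorted(regions)  # sorting by start position gives document order

def structural_join(ancs: list[Region], descs: list[Region]) -> list[tuple[Region, Region]]:
    """Stack-based ancestor/descendant join; both inputs must be sorted by start."""
    out: list[tuple[Region, Region]] = []
    stack: list[Region] = []  # chain of nested ancestor candidates
    i = j = 0
    while j < len(descs):
        # Take the next node in document order from either input.
        if i < len(ancs) and ancs[i].start < descs[j].start:
            node, is_anc = ancs[i], True
            i += 1
        else:
            node, is_anc = descs[j], False
            j += 1
        # Discard candidates whose region closed before this node begins.
        while stack and stack[-1].end < node.start:
            stack.pop()
        if is_anc:
            stack.append(node)
        else:
            # Every remaining stack entry contains `node`, i.e. is its ancestor.
            out.extend((a, node) for a in stack)
    return out
```

Because both inputs are scanned once in document order, the join processes whole sets of nodes at a time instead of navigating the tree per node.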

    Links: Web Programming Without Tiers

    Links is a programming language for web applications that generates code for all three tiers of a web application from a single source, compiling into JavaScript to run on the client and into SQL to run on the database. Links supports rich clients running in what has been dubbed ‘Ajax’ style, and supports concurrent processes with statically-typed message passing. Links is scalable in the sense that session state is preserved in the client rather than the server, in contrast to other approaches such as Java Servlets or PLT Scheme. Client-side concurrency in JavaScript and transfer of computation between client and server are both supported by translation into continuation-passing style.
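The general idea of continuation-passing style (CPS), in which every function receives an explicit continuation representing "the rest of the computation" instead of returning to its caller, can be sketched in a few lines of Python. This is a generic textbook illustration, not Links' actual translation; the `pause`/`pending` mechanism is a hypothetical stand-in for suspending a computation until, say, a reply from another tier arrives.

```python
from typing import Callable

Cont = Callable[[int], object]  # a continuation: what to do with the result

# Direct style: results are returned to the caller.
def square(x: int) -> int:
    return x * x

def sum_of_squares(a: int, b: int) -> int:
    return square(a) + square(b)

# CPS: every function receives its continuation k and passes its result to it.
def square_cps(x: int, k: Cont) -> object:
    return k(x * x)

def add_cps(x: int, y: int, k: Cont) -> object:
    return k(x + y)

def sum_of_squares_cps(a: int, b: int, k: Cont) -> object:
    # Evaluation order is explicit: square a, then square b, then add.
    return square_cps(a, lambda a2:
                      square_cps(b, lambda b2:
                                 add_cps(a2, b2, k)))

# Because the rest of the computation is an ordinary value, it can be parked
# and resumed later (hypothetical scheduler queue).
pending: list[Callable[[], object]] = []

def pause(value: int, k: Cont) -> None:
    pending.append(lambda: k(value))

def sum_of_squares_pausing(a: int, b: int, k: Cont) -> object:
    return square_cps(a, lambda a2:
                      pause(a2, lambda a2_resumed:
                            square_cps(b, lambda b2:
                                       add_cps(a2_resumed, b2, k))))
```

Calling `sum_of_squares_pausing(3, 4, results.append)` returns immediately with nothing appended; invoking the parked continuation later completes the computation and appends 25. Capturing control this way is what makes it possible to suspend work on one side and resume it elsewhere.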

    Compile-Time Query Optimization for Big Data Analytics

    Many emerging programming environments for large-scale data analysis, such as Map-Reduce, Spark, and Flink, provide Scala-based APIs that consist of powerful higher-order operations that ease the development of complex data analysis applications. However, despite the simplicity of these APIs, many programmers prefer to use declarative languages, such as Hive and Spark SQL, to code their distributed applications. Unfortunately, most current data analysis query languages are based on the relational model and cannot effectively capture the rich data types and computations required for complex data analysis applications. Furthermore, these query languages are not well integrated with the host programming language, as they are based on an incompatible data model. To address these shortcomings, we introduce a new query language for data-intensive scalable computing that is deeply embedded in Scala, called DIQL, and a query optimization framework that optimizes and translates DIQL queries to byte code at compile time. In contrast to other query languages, our query embedding eliminates the impedance mismatch, since any Scala code can be seamlessly mixed with SQL-like syntax without any special declarations. DIQL supports nested collections and hierarchical data and allows query nesting at any place in a query. With DIQL, programmers can express complex data analysis tasks, such as PageRank and matrix factorization, using SQL-like syntax exclusively. The DIQL query optimizer uses algebraic transformations to derive all possible joins in a query, including those hidden across deeply nested queries, thus unnesting nested queries of any form and any number of nesting levels. The optimizer also uses general transformations to push down predicates before joins and to prune unneeded data across operations.
DIQL has been implemented on three Big Data platforms (Apache Spark, Apache Flink, and Twitter's Cascading/Scalding) and has been shown to perform competitively with Spark DataFrames and Spark SQL on some complex queries. This paper extends our previous work on embedded data-intensive query languages by describing the complete details of the formal framework and the query translation and optimization processes, and by providing more experimental results that give further evidence of the performance of our system.
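The effect of unnesting, turning a query nested inside another query into a join, can be illustrated with plain Python comprehensions. The sketch below evaluates a correlated nested query literally and then as a group-by followed by a left outer join; the data layout and function names are hypothetical, and the code reproduces neither DIQL's syntax nor its algebraic rewrite rules.

```python
from collections import defaultdict

Dept = tuple[int, str]       # (dept_id, name)
Emp = tuple[str, int, int]   # (emp_name, dept_id, salary)

def nested(depts: list[Dept], emps: list[Emp]) -> list[tuple[str, int]]:
    # Correlated nested query, evaluated literally: the inner collection is
    # rescanned for every outer element (|depts| * |emps| steps).
    return [(name, sum(sal for _n, dep, sal in emps if dep == did))
            for did, name in depts]

def unnested(depts: list[Dept], emps: list[Emp]) -> list[tuple[str, int]]:
    # Step 1: evaluate the inner query once for all keys (group-by on dept_id).
    totals: dict[int, int] = defaultdict(int)
    for _n, dep, sal in emps:
        totals[dep] += sal
    # Step 2: left outer join on dept_id; a missing group yields 0, which is
    # what the nested sum over an empty sequence returns.
    return [(name, totals.get(did, 0)) for did, name in depts]
```

The outer-join step matters: a plain inner join would silently drop departments without employees, changing the result of the original nested query.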

    Automaton Meet Algebra: A Hybrid Paradigm for Efficiently Processing XQuery over XML Stream

    XML stream applications bring the challenge of efficiently processing queries on sequentially accessible, token-based data streams. The automaton paradigm is naturally suited to pattern retrieval on tokenized XML streams, but requires patches to implement the filtering and restructuring functionality common in XML query languages. In contrast, the algebraic paradigm is well established for processing self-contained tuples, but it does not traditionally support token inputs. This dissertation proposes a framework called Raindrop, which accommodates both the automaton and the algebra paradigms to take advantage of both. First, we propose an architecture for Raindrop. Raindrop is an algebraic framework that models queries at different abstraction levels: token-based automaton computations are represented as an algebraic subplan at the high level, while the automaton details are exposed at the low level. The algebraic subplan modeling the automaton computations can thus be integrated with the algebraic subplan modeling the non-automaton computations. Second, we explore a novel optimization opportunity. Other XML stream processing systems always retrieve all the patterns of a query in the automaton. In contrast, Raindrop allows a plan to perform some of the pattern retrieval in the automaton and some outside it. This opens up an automaton-in-or-out optimization opportunity. We study this optimization in two types of run-time environments, one with stable data characteristics and one with fluctuating data characteristics, and provide search strategies catering to each. We also describe how to migrate from a currently running plan to a new plan at run time. Third, we optimize the automaton computations using schema knowledge. A set of criteria is established to decide which schema constraints are useful for a given query, and optimization rules that exploit different types of schema constraints are proposed based on these criteria.
We design a rule application algorithm that ensures both completeness (i.e., no optimization is missed) and minimality (i.e., no redundant optimization is introduced). Experiments on both real and synthetic data show that these techniques bring significant performance improvements with little overhead.
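The automaton side of such a design can be pictured as a matcher that consumes start/end tokens and recognizes a path pattern while the tokens arrive. The sketch below uses Python's `xml.etree.ElementTree.XMLPullParser` to turn byte chunks into tokens and runs a stack of automaton states for an absolute child-axis path such as `/bib/book/title`. It is a hypothetical simplification: it covers neither descendant axes nor predicates, and nothing of Raindrop's algebraic integration or its automaton-in-or-out optimization.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator

Token = tuple[str, str, str]  # (event, tag, text)

def tokens(chunks: Iterable[bytes]) -> Iterator[Token]:
    """Turn arriving byte chunks into start/end tokens as soon as they parse."""
    parser = ET.XMLPullParser(events=("start", "end"))

    def drain() -> Iterator[Token]:
        for event, elem in parser.read_events():
            # Element text is only complete once the end tag has been seen.
            text = (elem.text or "").strip() if event == "end" else ""
            yield event, elem.tag, text

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()
    yield from drain()

def match_path(stream: Iterable[Token], path: list[str]) -> list[str]:
    """Run a linear automaton for an absolute child path such as /bib/book/title."""
    states = [0]  # state i = first i steps of `path` matched; -1 = dead state
    results: list[str] = []
    for event, tag, text in stream:
        if event == "start":
            cur = states[-1]
            nxt = cur + 1 if 0 <= cur < len(path) and path[cur] == tag else -1
            states.append(nxt)
        else:  # "end": report a match, then return to the parent's state
            if states[-1] == len(path):
                results.append(text)
            states.pop()
    return results
```

Keeping one state per open element lets the automaton backtrack correctly when an element closes, so chunk boundaries can fall anywhere in the input, even in the middle of a tag.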

    Methods for Semantic Interoperability in AutomationML-based Engineering

    Industrial engineering is an interdisciplinary activity that involves human experts from various technical backgrounds working with different engineering tools. In the era of digitization, the engineering process generates vast amounts of data. To store and exchange such data, dedicated international standards have been developed, including the XML-based data format AutomationML (AML). While AML provides a harmonized syntax among engineering tools, the semantics of engineering data remain highly heterogeneous. More specifically, AML models of the same domain or entity can vary dramatically among different tools, which gives rise to the so-called semantic interoperability problem. In practice, correct data interpretation often requires manual implementation, which usually has limited reusability. Efforts have been made to tackle the semantic interoperability problem. One mainstream research direction has focused on the semantic lifting of engineering data using Semantic Web technologies. However, current results in this field lack a study of building complex domain knowledge, which requires a profound understanding of the domain and sufficient skill in ontology building. This thesis contributes to this research field in two respects. First, machine learning algorithms are developed for deriving complex ontological concepts from engineering data. The induced concepts encode the relations between primitive ones and bridge the semantic gap between engineering tools. Second, to involve domain experts more tightly in the process of ontology building, this thesis proposes the AML concept model (ACM) for representing ontological concepts in a native AML syntax, i.e., providing an AML front end for the formal ontological semantics. ACM supports a bidirectional information flow between the user and the learner, on the basis of which the interactive machine learning framework AMLLEARNER is developed.
Another rapidly growing research field is devoted to developing methods and systems that facilitate data access and exchange based on database theories and techniques. In particular, the so-called Query By Example (QBE) allows the user to construct queries using data examples. This thesis adopts the idea of QBE in AML-based engineering by introducing the AML Query Template (AQT). The design of AQT focuses on a native AML syntax, which allows queries to be constructed with conventional AML tools. This thesis studies the theoretical foundation of AQT and presents algorithms for the automated generation of query programs. A comprehensive requirement analysis shows that the proposed approach can solve the problem of semantic interoperability in AutomationML-based engineering to a great extent.
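The Query-By-Example idea, writing a query as a partial example in the data's own syntax and deriving an executable query from it, can be sketched as a translation from an XML template fragment into an XPath expression. The element names `InternalElement` and `Attribute` are used here only because they look AutomationML-like. The translation rule is an assumption for this sketch and not the AQT algorithm from the thesis: every attribute in the template becomes an equality predicate, and every child element becomes a nested existence predicate.

```python
import xml.etree.ElementTree as ET

def template_to_xpath(tmpl: ET.Element, root: bool = True) -> str:
    """Translate an example fragment into an XPath pattern (hypothetical rules)."""
    step = tmpl.tag
    # Each attribute given in the example becomes an equality predicate
    # (sorted for deterministic output; apostrophes in values are not escaped).
    for name, value in sorted(tmpl.attrib.items()):
        step += f"[@{name}='{value}']"
    # Each child element in the example becomes a nested existence predicate.
    for child in tmpl:
        step += f"[{template_to_xpath(child, root=False)}]"
    # Only the outermost element is searched anywhere in the document.
    return "//" + step if root else step
```

For a template `<InternalElement Name="Robot"><Attribute Name="Payload"/></InternalElement>`, the sketch yields `//InternalElement[@Name='Robot'][Attribute[@Name='Payload']]`, so a user can state the query with ordinary AML-style markup and let the generated XPath do the matching.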