51 research outputs found

    From XML schema to relations: a cost-based approach to XML storage

    Get PDF
    Journal ArticleAs Web applications manipulate an increasing amount of XML, there is a growing interest in storing XML data in relational databases. Due to the mismatch between the complexity of XML's tree structure and the simplicity of flat relational tables, there are many ways to store the same document in an RDBMS, and a number of heuristic techniques have been proposed. These techniques typically define fixed mappings and do not take application characteristics into account. However, a fixed mapping is unlikely to work well for all possible applications. In contrast, LegoDB is a cost-based XML storage mapping engine that explores a space of possible XML-to-relational mappings and selects the best mapping for a given application. LegoDB leverages current XML and relational technologies: 1) it models the target application with an XML Schema, XML data statistics, and an XQuery workload; 2) the space of configurations is generated through XML-Schema rewritings; and 3) the best among the derived configurations is selected using cost estimates obtained through a standard relational optimizer. In this paper, we describe the LegoDB storage engine and provide experimental results that demonstrate the effectiveness of this approach

    XQuery Optimization Based on Program Slicing *

    Get PDF
    ABSTRACT XQuery has become the standard query language for XML. The efforts put on this language have produced mature and efficient implementations of XQuery processors. However, in practice the efficiency of XQuery programs is strongly dependent on the ability of the programmer to combine different queries which often affect several XML sources that in turn can be distributed in different branches of the organization. Therefore, techniques to reduce the amount of data loaded and also to reduce the intermediate structures computed by queries is a necessity. In this work we propose a novel technique that allows the programmer to automatically optimize a query in such a way that unnecessary intermediate computations are avoided, and, in addition, it identifies the paths in the source XML documents that are really required to resolve the query

    Programming errors in traversal programs over structured data

    Get PDF
    Traversal strategies \'a la Stratego (also \'a la Strafunski and 'Scrap Your Boilerplate') provide an exceptionally versatile and uniform means of querying and transforming deeply nested and heterogeneously structured data including terms in functional programming and rewriting, objects in OO programming, and XML documents in XML programming. However, the resulting traversal programs are prone to programming errors. We are specifically concerned with errors that go beyond conservative type errors; examples we examine include divergent traversals, prematurely terminated traversals, and traversals with dead code. Based on an inventory of possible programming errors we explore options of static typing and static analysis so that some categories of errors can be avoided. This exploration generates suggestions for improvements to strategy libraries as well as their underlying programming languages. Haskell is used for illustrations and specifications with sufficient explanations to make the presentation comprehensible to the non-specialist. The overall ideas are language-agnostic and they are summarized accordingly

    State-of-the-art on evolution and reactivity

    Get PDF
    This report starts by, in Chapter 1, outlining aspects of querying and updating resources on the Web and on the Semantic Web, including the development of query and update languages to be carried out within the Rewerse project. From this outline, it becomes clear that several existing research areas and topics are of interest for this work in Rewerse. In the remainder of this report we further present state of the art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs; in Chapter 4 event-condition-action rules, both in the context of active database systems and in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agents frameworks

    Reflective Model Driven Engineering

    Get PDF
    In many large organizations, the model transformations allowing the engineers to more or less automatically go from platformindependent models (PIM) to platform-specific models (PSM) are increasingly seen as vital assets. As tools evolve, it is critical that these transformations are not prisoners of a given CASE tool. Considering in this paper that a CASE tool can be seen as a platform for processing a model transformation, we propose to reflectively apply the MDA to itself. We propos

    State-of-the-art on evolution and reactivity

    Get PDF
    This report starts by, in Chapter 1, outlining aspects of querying and updating resources on the Web and on the Semantic Web, including the development of query and update languages to be carried out within the Rewerse project. From this outline, it becomes clear that several existing research areas and topics are of interest for this work in Rewerse. In the remainder of this report we further present state of the art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs; in Chapter 4 event-condition-action rules, both in the context of active database systems and in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agents frameworks

    Building Efficient Query Engines using High-Level Languages

    Get PDF
    We are currently witnessing a shift towards the use of high-level programming languages for systems development. These approaches collide with the traditional wisdom which calls for using low-level languages for building efficient software systems. This shift is necessary as billions of dollars are spent annually on the maintenance and debugging of performance-critical software. High-level languages promise faster development of higher-quality software; by offering advanced software features, they help to reduce the number of software errors of the systems and facilitate their verification. Despite these benefits, database systems development seems to be lagging behind as DBMSes are still written in low-level languages. The reason is that the increased productivity offered by high-level languages comes at the cost of a pronounced negative performance impact. In this thesis, we argue that it is now time for a radical rethinking of how database systems are designed. We show that, by using high-level languages, it is indeed possible to build databases that allow for both productivity and high performance. More concretely, in this thesis we follow this abstraction without regret vision and use high-level languages to address the following two problems of database development. First, the introduction of a new storage or memory technology typically requires the development of new versions of most out-of-core algorithms employed by the database system. Given the increasing popularity of hardware specialization, this leads to an arms race for the developers. To make things even worse, there exists no clear methodology for creating such algorithms and we must rely on significant creative effort to serve our need for out-of-core algorithms. To address this issue, we present the OCAS framework for the automatic synthesis of efficient out-of-core algorithms. The developer provides two independent inputs: 1) a memory-hierarchy-oblivious algorithm, expressed using a high-level specification language; and 2) a description of the target memory hierarchy. Using these specifications, our system is then able to automatically synthesize memory-hierarchy and storage-device-aware algorithms for tasks such as joins and sorting. The framework is extensible and quickly synthesizes custom out-of-core algorithms as new storage technologies become available. Second, from a software engineering point of view, years of performance-driven DBMS development have led to complicated, monolithic, low-level code bases, which are hard to maintain and extend. In particular, the introduction of new innovative approaches can be a very time-consuming task. To overcome such limitations, we present LegoBase, a query engine written in the high-level language, Scala. LegoBase realizes the abstraction without regret vision in the domain of analytical query processing. We show how by offering sufficiently powerful abstractions our system allows to easily implement a broad spectrum of optimizations which are difficult to achieve with existing approaches. Then, the key technique to regain efficiency is to apply generative programming and source-to-source compile the entire high-level Scala code to specialized, low-level C code. Our architecture significantly outperforms a commercial in-memory database system and an existing query compiler. LegoBase is the first step towards providing a full DBMS written in a high-level language

    Bidirectional data transformation by calculation

    Get PDF
    MAPi Doctoral Programme in Computer ScienceThe advent of bidirectional programming, in recent years, has led to the development of a vast number of approaches from various computer science disciplines. These are often based on domain-specific languages in which a program can be read both as a forward and a backward transformation that satisfy some desirable consistency properties. Despite the high demand and recognized potential of intrinsically bidirectional languages, they have still not matured to the point of mainstream adoption. This dissertation contemplates some usually disregarded features of bidirectional transformation languages that are vital for deployment at a larger scale. The first concerns efficiency. Most of these languages provide a rich set of primitive combinators that can be composed to build more sophisticated transformations. Although convenient, such compositional languages are plagued by inefficiency and their optimization is mandatory for a serious application. The second relates to configurability. As update translation is inherently ambiguous, users shall be allowed to control the choice of a suitable strategy. The third regards genericity. Writing a bidirectional transformation typically implies describing the concrete steps that convert values in a source schema to values a target schema, making it impractical to express very complex transformations, and practical tools shall support concise and generic coding patterns. We first define a point-free language of bidirectional transformations (called lenses), characterized by a powerful set of algebraic laws. Then, we tailor it to consider additional parameters that describe updates, and use them to refine the behavior of intricate lenses between arbitrary data structures. On top, we propose the Multifocal framework for the evolution of XML schemas. A Multifocal program describes a generic schema-level transformation, and has a value-level semantics defined using the point-free lens language. Its optimization employs the novel algebraic lens calculus.O advento da programação bidirecional, nos últimos anos, fez surgir inúmeras abordagens em diversas disciplinas de ciências da computação, geralmente baseadas em linguagens de domínio específico em que um programa representa uma transformação para a frente ou para trás, satisfazendo certas propriedades de consistência desejáveis. Apesar do elevado potencial de linguagens intrinsicamente bidirecionais, estas ainda não amadureceram o suficiente para serem correntemente utilizadas. Esta dissertação contempla algumas características de linguagens bidirecionais usualmente negligenciadas, mas vitais para um desenvolvimento em mais larga escala. A primeira refere-se à eficiência. A maioria destas linguagens fornece um conjunto rico de combinadores primitivos que podem ser utilizados para construir transformações mais sofisticadas que, embora convenientes, são cronicamente ineficientes, exigindo ser otimizadas para uma aplicação séria. A segunda diz respeito à configurabilidade. Sendo a tradução de modificações inerentemente ambígua, os utilizadores devem poder controlar a escolha de uma estratégia adequada. A terceira prende-se com a genericidade. Escrever uma transformação bidirecional implica tipicamente descrever os passos que convertem um modelo noutro diferente, enquanto que ferramentas práticas devem suportar padrões concisos e genéricos de forma a poderem expressar transformações muito complexas. Primeiro, definimos uma linguagem de transformações bidirecionais (intituladas de lentes), livre de variáveis, caracterizada por um poderoso conjunto de leis algébricas. De seguida, adaptamo-la para receber parâmetros que descrevem modificações, e usamo-los para refinar lentes intrincadas entre estruturas de dados arbitrárias. Por cima, propomos a plataforma Multifocal para a evolução de modelos XML. Um programa Multifocal descreve uma transformação genérica de modelos, cuja semântica ao nível dos valores e consequente otimização é definida em função da linguagem de lentes
    • …
    corecore