919 research outputs found

    Storing XML Documents in Databases

    Get PDF
    The authors introduce concepts for loading large amounts of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive in multi-tier systems as possible and at the same time provide as many services defined by the XML standards as possible. The ubiquity of XML has sparked great interest in deploying concepts known from Relational Database Management Systems such as declarative query languages, transactions, indexes and integrity constraints. This chapter presents now bulkloading is done in Monet XML, a main memory XML database system, and evaluates the cost of bulkloading and bulk deletion with respect to strategies which base on insertion and deletion of individual nodes. Additionally, we survey the applicability of the techniques to a wider class of XML storage schemas

    Automatic Verification of Transactions on an Object-Oriented Database

    Get PDF
    In the context of the object-oriented data model, a compiletime approach is given that provides for a significant reduction of the amount of run-time transaction overhead due to integrity constraint checking. The higher-order logic Isabelle theorem prover is used to automatically prove which constraints might, or might not be violated by a given transaction in a manner analogous to the one used by Sheard and Stemple (1989) for the relational data model. A prototype transaction verification tool has been implemented, which automates the semantic mappings and generates proof goals for Isabelle. Test results are discussed to illustrate the effectiveness of our approach

    Reify Your Collection Queries for Modularity and Speed!

    Full text link
    Modularity and efficiency are often contradicting requirements, such that programers have to trade one for the other. We analyze this dilemma in the context of programs operating on collections. Performance-critical code using collections need often to be hand-optimized, leading to non-modular, brittle, and redundant code. In principle, this dilemma could be avoided by automatic collection-specific optimizations, such as fusion of collection traversals, usage of indexing, or reordering of filters. Unfortunately, it is not obvious how to encode such optimizations in terms of ordinary collection APIs, because the program operating on the collections is not reified and hence cannot be analyzed. We propose SQuOpt, the Scala Query Optimizer--a deep embedding of the Scala collections API that allows such analyses and optimizations to be defined and executed within Scala, without relying on external tools or compiler extensions. SQuOpt provides the same "look and feel" (syntax and static typing guarantees) as the standard collections API. We evaluate SQuOpt by re-implementing several code analyses of the Findbugs tool using SQuOpt, show average speedups of 12x with a maximum of 12800x and hence demonstrate that SQuOpt can reconcile modularity and efficiency in real-world applications.Comment: 20 page

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
    corecore