181 research outputs found

    Using Actors to Build a Parallel DBMS

    Get PDF
    In this paper, we present the design and the architecture of a parallel main memory database management system. We focus on concurrency control scheme and recovery. Our prototype is based on the concept of “database actors”, an object-oriented data model well suited for parallelmanipulations. The storage sub system is built upon distributed Ram-files using SDDS (Scalable Distributed Data Structures) techniques. A nested transaction model is proposed and used to handle concurrency access and recovery. We have also proposed novel approach, based on wait-die, to implement a distributed deadlock prevention technique for our model of nested transactions

    The SkipTrie: low-depth concurrent search without rebalancing

    Get PDF
    To date, all concurrent search structures that can support predecessor queries have had depth logarithmic in m, the number of elements. This paper introduces the SkipTrie, a new concurrent search structure supporting predecessor queries in amortized expected O(log log u + c) steps, insertions and deletions in O(c log log u), and using O(m) space, where u is the size of the key space and c is the contention during the recent past. The SkipTrie is a probabilistically-balanced version of a y-fast trie consisting of a very shallow skiplist from which randomly chosen elements are inserted into a hash-table based x-fast trie. By inserting keys into the x-fast-trie probabilistically, we eliminate the need for rebalancing, and can provide a lock-free linearizable implementation. To the best of our knowledge, our proof of the amortized expected performance of the SkipTrie is the first such proof for a tree-based data structure.National Science Foundation (U.S.) (Grant CCF-1217921)United States. Dept. of Energy. Office of Advanced Scientific Computing Research (Grant ER26116/DE-SC0008923)Oracle CorporationIntel Corporatio

    Persistent Data Structures for Incremental Join Indices

    Get PDF
    Join indices are used in relational databases to make join operations faster. Join indices essentially materialise the results of join operations and so accrue maintenance cost, which makes them more suitable for use cases where modifications are rare and joins are performed frequently. To make the maintenance cost lower incrementally updating existing indices is to be preferred. The usage of persistent data structures for the join indices were explored. Motivation for this research was the ability of persistent data structures to construct multiple partially different versions of the same data structure memory efficiently. This is useful, because there can exist different versions of join indices simultaneously due to usage of multi-version concurrency control (MVCC) in a database. The techniques used in Relaxed Radix Balanced Trees (RRB-Trees) persistent data structure were found promising, but none of the popular implementations were found directly suitable for the use case. This exploration was done from the context of a particular proprietary embedded in-memory columnar multidimensional database called FastormDB developed by RELEX Solutions. This focused the research into Java Virtual Machine (JVM) based data structures as the implementation of FastormDB is in Java. Multiple persistent data-structures made for the thesis and ones from Scala, Clojure and Paguro were evaluated with Java Microbenchmark Harness (JMH) and Java Object Layout (JOL) based benchmarks and their results analysed via visualisations

    Observations on Factors Affecting Performance of MapReduce based Apriori on Hadoop Cluster

    Full text link
    Designing fast and scalable algorithm for mining frequent itemsets is always being a most eminent and promising problem of data mining. Apriori is one of the most broadly used and popular algorithm of frequent itemset mining. Designing efficient algorithms on MapReduce framework to process and analyze big datasets is contemporary research nowadays. In this paper, we have focused on the performance of MapReduce based Apriori on homogeneous as well as on heterogeneous Hadoop cluster. We have investigated a number of factors that significantly affects the execution time of MapReduce based Apriori running on homogeneous and heterogeneous Hadoop Cluster. Factors are specific to both algorithmic and non-algorithmic improvements. Considered factors specific to algorithmic improvements are filtered transactions and data structures. Experimental results show that how an appropriate data structure and filtered transactions technique drastically reduce the execution time. The non-algorithmic factors include speculative execution, nodes with poor performance, data locality & distribution of data blocks, and parallelism control with input split size. We have applied strategies against these factors and fine tuned the relevant parameters in our particular application. Experimental results show that if cluster specific parameters are taken care of then there is a significant reduction in execution time. Also we have discussed the issues regarding MapReduce implementation of Apriori which may significantly influence the performance.Comment: 8 pages, 8 figures, International Conference on Computing, Communication and Automation (ICCCA2016
    corecore