    The Lock-free k-LSM Relaxed Priority Queue

    Priority queues are data structures that store keys in an ordered fashion to allow efficient access to the minimal (or maximal) key. They are essential for many applications, e.g., Dijkstra's single-source shortest-path algorithm, branch-and-bound algorithms, and prioritized schedulers. Efficient multiprocessor computing requires implementations of basic data structures that can be used concurrently and scale to large numbers of threads and cores. Lock-free data structures promise superior scalability by avoiding blocking synchronization primitives, but the delete-min operation is an inherent scalability bottleneck in concurrent priority queues. Recent work has focused on alleviating this obstacle either by batching operations or by relaxing the requirements on the delete-min operation. We present a new, lock-free priority queue that relaxes the delete-min operation so that it may delete any of the ρ+1 smallest keys, where ρ is a runtime-configurable parameter. Additionally, the behavior is identical to a non-relaxed priority queue for items added and removed by the same thread. The priority queue is built from a logarithmic number of sorted arrays, in a way similar to log-structured merge-trees. We experimentally compare our priority queue to recent state-of-the-art lock-free priority queues, both with relaxed and non-relaxed semantics, showing high performance and good scalability of our approach. (A short version appeared as an ACM PPoPP'15 poster.)
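    As a rough illustration of the relaxed semantics only (not the paper's lock-free k-LSM structure), the Python sketch below models a delete-min that may return any of the ρ+1 smallest keys; the class and method names are illustrative assumptions:

```python
import heapq
import random

class RelaxedPriorityQueue:
    """Sequential model of relaxed delete-min semantics.
    The real k-LSM is lock-free and built from sorted arrays;
    this sketch only captures the permitted return values."""

    def __init__(self, rho):
        self.rho = rho   # delete_min may return any of the rho+1 smallest keys
        self.heap = []

    def insert(self, key):
        heapq.heappush(self.heap, key)

    def delete_min(self):
        if not self.heap:
            raise IndexError("delete_min from an empty queue")
        # Pop the (up to) rho+1 smallest keys, return one at random,
        # and push the others back.
        candidates = [heapq.heappop(self.heap)
                      for _ in range(min(self.rho + 1, len(self.heap)))]
        chosen = random.choice(candidates)
        candidates.remove(chosen)
        for key in candidates:
            heapq.heappush(self.heap, key)
        return chosen
```

    With rho=0 this degenerates to an exact priority queue, which matches the paper's framing of ρ as a runtime-configurable quality knob.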

    Practical Target-Based Synchronization Strategies for Immutable Time-Series Data Tables

    As the Internet of Things and industrial monitoring of utilities grow, efficiently synchronizing immutable time-series data streams between databases becomes a pressing issue. Extracting data from critical production databases demands careful consideration of the stress imposed on the machines, so synchronization strategies are required to minimize both the transfer of duplicate data and the load imposed on remote sources. The existing literature on the synchronization problem addresses arbitrary tables and does not consider the characteristics of time-series data streams, so research was required to investigate methods for quickly synchronizing source and target time-series data tables. This thesis examines immutable time-series scenarios and synchronization strategies to answer the following question: given several scenarios, which target-based immutable time-series synchronization strategies best optimize run-time, bandwidth, and accuracy? The strategies explored in this research are implemented in the Meerschaum system, a project intended to leverage these time-series concepts for production deployments. As a practical demonstration, these strategies are used to continuously cache Clemson University's utilities data.
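    One plausible target-based strategy for immutable, append-only tables is to ask the target for its newest timestamp and fetch only newer rows from the source, avoiding duplicate transfer entirely. The sketch below illustrates this with SQLite; the table and column names are assumptions, and this is not Meerschaum's actual API:

```python
import sqlite3

def sync_new_rows(source: sqlite3.Connection, target: sqlite3.Connection,
                  table: str = "events") -> int:
    """Copy only rows newer than the target's newest timestamp.
    Valid for immutable, append-only time-series tables; 'events',
    'ts', and 'value' are illustrative names."""
    (last_ts,) = target.execute(f"SELECT MAX(ts) FROM {table}").fetchone()
    if last_ts is None:   # target is empty: full copy
        rows = source.execute(f"SELECT ts, value FROM {table}").fetchall()
    else:                 # incremental: fetch strictly newer rows only
        rows = source.execute(
            f"SELECT ts, value FROM {table} WHERE ts > ?", (last_ts,)
        ).fetchall()
    target.executemany(f"INSERT INTO {table} (ts, value) VALUES (?, ?)", rows)
    target.commit()
    return len(rows)
```

    Because the data is immutable, rows older than the target's maximum timestamp can never change, which is what makes this single-predicate strategy safe.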

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
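    To make the hashing motif concrete, the sketch below counts k-mers (length-k substrings of a read or genome) with a hash table, a core step in profiling and assembly. It is a sequential toy rather than the parallel formulations the paper studies, and k=21 is a common but arbitrary choice:

```python
from collections import Counter

def count_kmers(sequence: str, k: int = 21) -> Counter:
    """Hashing motif: tally every k-length substring of a sequence.
    At scale this becomes a distributed hash table with the
    irregular, asynchronous updates the paper highlights."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Example: count_kmers("ACGTACGA", k=3)
# -> Counter({'ACG': 2, 'CGT': 1, 'GTA': 1, 'TAC': 1, 'CGA': 1})
```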

    Key-value storage system synchronization in peer-to-peer environments

    Data synchronization is the problem of bringing multiple versions of the same data on different remote devices up to the most recent version. This thesis looks into the particular problem of synchronizing key-value storage systems between mobile devices in a peer-to-peer environment. In this research, we describe, implement, and evaluate a new key-value storage system synchronization algorithm using a 2-phase approach, combining approximate synchronization in the first phase with exact synchronization in the second phase. The 2-phase architecture helps the algorithm achieve a considerable boost in performance on all three major criteria for a data synchronization algorithm, namely synchronization time, processing time, and communication cost, while remaining suitable for a peer-to-peer environment. The performance increase makes it feasible to employ database synchronization techniques in a wider range of mobile applications, especially those operating on slow peer-to-peer networks.
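    The abstract does not spell out the algorithm's internals, but one generic way to realize an approximate first phase is a Bloom-filter exchange: keys that clearly miss the peer's filter must be transferred, while possible false positives are deferred to the exact second phase (e.g., comparing per-key hashes). A minimal sketch of that pattern, with all names hypothetical:

```python
import hashlib

def bloom_positions(key: str, m: int, k: int = 3) -> list[int]:
    # k hash positions for a simple Bloom filter of m bits
    return [int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16) % m
            for i in range(k)]

def build_bloom(keys, m: int) -> list[bool]:
    bits = [False] * m
    for key in keys:
        for pos in bloom_positions(key, m):
            bits[pos] = True
    return bits

def approximate_phase(local_keys, peer_bloom: list[bool], m: int):
    """Phase 1 sketch: keys absent from the peer's Bloom filter
    definitely need transfer; the rest ('maybe present', due to
    false positives) are left for the exact second phase."""
    missing, unresolved = [], []
    for key in local_keys:
        if all(peer_bloom[pos] for pos in bloom_positions(key, m)):
            unresolved.append(key)
        else:
            missing.append(key)
    return missing, unresolved
```

    The appeal of this split is that the cheap probabilistic phase transfers most of the missing data with one compact filter exchange, so the expensive exact phase only has to reconcile the small unresolved remainder.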