
    Optimized BZIP2 Compression Using Lock-Free Queues

    Since the general trend nowadays is to have more and more processors (cores) available in each computer, the scalability of the data structures used in parallel programs must be carefully considered in order to guarantee that they take full advantage of the available processors. Because of increased contention, lock-based data structures usually do not scale proportionally as the number of processors increases. The use of well-designed lock-free data structures, such as first-in, first-out (FIFO) queues, can boost the performance of a parallel program when many processors are available. In this work, a parallel version of bzip2, a popular compression and decompression program, is designed and implemented by using lock-free queues instead of lock-based ones and applying a two-output-buffer strategy. The performance of the lock-free implementation is measured against lock-based implementations. Compression time was measured with different numbers of processors and different block sizes. Consistent with our hypothesis, the results show that our parallel lock-free implementation outperforms the other implementations.
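
    To make the queue-based design concrete, the following is a minimal sketch of a Michael-Scott style lock-free FIFO queue in C++, the kind of structure the reader, compressor, and writer threads could use to pass data blocks without locks. It is illustrative only and not the paper's implementation: memory ordering is left at the sequentially consistent default, and retired nodes are simply leaked, whereas a production queue would need hazard pointers or epoch-based reclamation.

        #include <atomic>
        #include <optional>
        #include <utility>

        // Minimal Michael-Scott lock-free FIFO queue (sketch only).
        template <typename T>
        class LockFreeQueue {
            struct Node {
                T value{};
                std::atomic<Node*> next{nullptr};
            };
            std::atomic<Node*> head;  // always points to a dummy node
            std::atomic<Node*> tail;

        public:
            LockFreeQueue() {
                Node* dummy = new Node();
                head.store(dummy);
                tail.store(dummy);
            }

            void enqueue(T v) {
                Node* n = new Node();
                n->value = std::move(v);
                while (true) {
                    Node* last = tail.load();
                    Node* next = last->next.load();
                    if (last != tail.load()) continue;       // snapshot changed, retry
                    if (next == nullptr) {
                        // Link the new node at the end of the list.
                        if (last->next.compare_exchange_weak(next, n)) {
                            tail.compare_exchange_weak(last, n);  // swing the tail
                            return;
                        }
                    } else {
                        tail.compare_exchange_weak(last, next);   // help advance the tail
                    }
                }
            }

            std::optional<T> dequeue() {
                while (true) {
                    Node* first = head.load();
                    Node* last  = tail.load();
                    Node* next  = first->next.load();
                    if (first != head.load()) continue;      // snapshot changed, retry
                    if (first == last) {
                        if (next == nullptr) return std::nullopt;  // queue is empty
                        tail.compare_exchange_weak(last, next);    // help advance the tail
                    } else {
                        T v = next->value;
                        if (head.compare_exchange_weak(first, next))
                            return v;  // 'first' is leaked in this sketch
                    }
                }
            }
        };

    In the pipeline described above, a reader thread would enqueue uncompressed blocks, worker threads would dequeue, compress, and enqueue results, and a writer thread would drain the output side, with the two output buffers decoupling compression from in-order writing.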

    Practical Dynamic Transactional Data Structures

    Multicore programming presents the challenge of synchronizing multiple threads. Traditionally, mutual exclusion locks are used to limit access to a shared resource to a single thread at a time. Whether this lock is applied to an entire data structure or only a single element, the pitfalls of lock-based programming persist. Deadlock, livelock, starvation, and priority inversion are some of the hazards of lock-based programming that can be avoided by using non-blocking techniques. Non-blocking data structures allow scalable and thread-safe access to shared data by guaranteeing, at least, system-wide progress. In this work, we present the first wait-free hash map, which allows a large number of threads to concurrently insert, get, and remove information. Wait-freedom means that all threads make progress in a finite amount of time, an attribute that can be critical in real-time environments. We only use atomic operations that are provided by the hardware; therefore, our hash map can be utilized by a variety of data-intensive applications, including those within the domains of embedded systems and supercomputers. The challenges of providing this guarantee make the design and implementation of wait-free objects difficult. As such, there are few wait-free data structures described in the literature; in particular, there are no wait-free hash maps. It often becomes necessary to sacrifice performance in order to achieve wait-freedom. However, our experimental evaluation shows that our hash map design is, on average, 7 times faster than a traditional blocking design. Our solution outperforms the best available alternative non-blocking designs in a large majority of cases, typically by a factor of 15 or higher. The main drawback of non-blocking data structures is that only one linearizable operation can be executed by each thread at any one time. To overcome this limitation, we present a framework for developing dynamic transactional data containers. Transactional containers are those that execute a sequence of operations atomically and in such a way that concurrent transactions appear to take effect in some sequential order. We take an existing algorithm that transforms non-blocking sets into static transactional versions (LFTT), and we modify it to support maps. We implement a non-blocking transactional hash map using this new approach. We continue to build on LFTT by implementing a lock-free vector, using a methodology that allows LFTT to be compatible with non-linked data structures. A static transaction requires all operands and operations to be specified at compile time, and no code may be executed between transactions. These limitations render static transactions impractical for most use cases. We modify LFTT to support dynamic transactions, and we enhance it with additional features. Dynamic transactions allow operands to be specified at runtime rather than at compile time, and threads can execute code between the data structure operations of a transaction. We build a framework for transforming non-blocking containers into dynamic transactional data structures, called Dynamic Transactional Transformation (DTT), and provide a library of novel transactional containers. Our library provides the wait-free progress guarantee and supports transactions among multiple data structures, whereas previous work on data structure transactions has been limited to operating on a single container. Our approach is 3 times faster than software transactional memory, and its performance matches that of its lock-free transactional counterpart.
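
    The descriptor-based idea behind LFTT-style transactions can be sketched roughly as follows: a transaction publishes a descriptor listing its operations, nodes touched by the transaction logically point to that descriptor, and the whole transaction commits or aborts with a single atomic transition of the descriptor's status word. The code below is a loose C++ illustration under those assumptions, not the actual LFTT/DTT API; the names and the use of std::function are invented for brevity, and helper idempotence is glossed over.

        #include <atomic>
        #include <functional>
        #include <vector>

        enum class TxStatus { Active, Committed, Aborted };

        // A transaction descriptor: in a dynamic transaction the operation
        // list is built at runtime rather than fixed at compile time.
        struct TxDescriptor {
            std::atomic<TxStatus> status{TxStatus::Active};
            std::vector<std::function<bool()>> operations;  // illustrative
        };

        // Execute the pending operations, then try to commit with one CAS on
        // the status word; an operation reporting a semantic conflict aborts
        // the whole transaction.
        bool execute(TxDescriptor& tx) {
            for (auto& op : tx.operations) {
                if (!op()) {
                    TxStatus expected = TxStatus::Active;
                    tx.status.compare_exchange_strong(expected, TxStatus::Aborted);
                    return false;
                }
            }
            TxStatus expected = TxStatus::Active;
            return tx.status.compare_exchange_strong(expected, TxStatus::Committed);
        }

    Because the commit point is a single atomic status change, other threads that encounter a node owned by an active transaction can either help complete its operation list or interpret the node according to the descriptor's final status.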

    A Lock-Free Priority Queue Design Based On Multi-Dimensional Linked Lists

    The throughput of concurrent priority queues is pivotal to multiprocessor applications such as discrete event simulation, best-first search, and task scheduling. Existing lock-free priority queues are mostly based on skiplists, which probabilistically create shortcuts in an ordered list for fast insertion of elements. The use of skiplists eliminates the need for global rebalancing in balanced search trees and ensures logarithmic sequential search time on average, but the worst-case performance is linear with respect to the input size. In this paper, we propose a quiescently consistent lock-free priority queue based on a multi-dimensional list that guarantees a worst-case search time of O(log N) for a key universe of size N. The novel multi-dimensional list (MDList) is composed of nodes that contain multiple links to child nodes arranged by their dimensionality. The insertion operation works by first injectively mapping the scalar key to a high-dimensional vector, then uniquely locating the target position by using the vector as coordinates. Nodes in the MDList are ordered by their coordinate prefixes, and the ordering property of the data structure is readily maintained during insertion without rebalancing or randomization. In our experimental evaluation using a micro-benchmark, our priority queue achieves an average 50 percent speedup over state-of-the-art approaches under high concurrency.
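
    The key-to-coordinate mapping at the heart of the MDList can be illustrated with a short sketch: a scalar key from a universe of size N is split into D base-B digits (with B chosen so that D digits cover the universe), and those digits serve as the node's coordinates. The constants and function names below are illustrative choices, not the paper's code; the key is assumed to be smaller than the universe size.

        #include <array>
        #include <cmath>
        #include <cstdint>

        constexpr int D = 8;  // number of dimensions (illustrative choice)

        // Injectively map a key (assumed < universeSize) to D base-B digits,
        // most significant digit first.
        std::array<uint32_t, D> keyToCoordinates(uint64_t key, uint64_t universeSize) {
            const uint32_t base = static_cast<uint32_t>(
                std::ceil(std::pow(static_cast<double>(universeSize), 1.0 / D)));
            std::array<uint32_t, D> coord{};
            for (int d = D - 1; d >= 0; --d) {
                coord[d] = static_cast<uint32_t>(key % base);
                key /= base;
            }
            return coord;
        }

    Ordering nodes lexicographically by these coordinate prefixes lets an insertion locate its unique position deterministically, without rebalancing or randomization, and underlies the O(log N) worst-case search bound quoted above.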

    A Comparison of Concurrent Correctness Criteria for Shared Memory Based Data Structure

    Developing concurrent algorithms requires safety and liveness to be defined in order to understand their proper behavior. Safety refers to the correctness criteria, while liveness is the progress guarantee. Nowadays there are a variety of correctness conditions for concurrent objects. The way these correctness conditions differ and the various trade-offs they present with respect to performance, usability, and progress guarantees are poorly understood. This presents a daunting task for the developers and users of such concurrent algorithms who are trying to better understand the correctness of their code and the various trade-offs associated with their design choices and use. The purpose of this study is to explore the set of known correctness conditions for concurrent objects, find their correlations, categorize them, and provide insights regarding their implications with respect to performance and usability. In this thesis, a comparative study of Linearizability, Sequential Consistency, Quiescent Consistency, and Quasi Linearizability is presented using data structures such as FIFO queues, stacks, and priority queues, together with a case study of the performance of these implementations under the different correctness criteria.

    Quantifiability: Concurrent Correctness from First Principles

    Architectural imperatives due to the slowing of Moore's Law, the broad acceptance of relaxed semantics, and the O(n!) worst-case verification complexity of sequential histories motivate a new approach to concurrent correctness. Desiderata for a new correctness condition are that it be independent of sequential histories, compositional over objects, flexible as to timing, modular as to semantics, and free of inherent locking or waiting. This dissertation proposes Quantifiability, a novel correctness condition based on intuitive first principles. Quantifiability is formally defined along with its system model. Useful properties of quantifiability, such as compositionality, measurability, and observational refinement, are demonstrated. Quantifiability models a system in vector space to launch a new mathematical analysis of concurrency. The vector space model is suitable for a wide range of concurrent systems and their associated data structures. Proof of correctness is facilitated with linear algebra, which is better supported and of more efficient time complexity than traditional combinatorial methods. Experimental results are presented showing that quantifiable data structures are highly scalable due to their use of relaxed semantics, an implementation trade-off that is explicitly permitted by quantifiability. The attainable speedups are analyzed theoretically. Because previous work lacked a metric for evaluating such trade-offs, a new measure is proposed here that applies communication theory to the disordered results of concurrent data structures. This entropy measure opens the way to analyzing degrees of concurrent correctness across implementations, to engineer system scalability and evaluate data structure quality under different workloads. With all its innovation, quantifiability is presented in the context of previous work and existing correctness conditions.

    Correctness and Progress Verification of Non-Blocking Programs

    The progression of multi-core processors has inspired the development of concurrency libraries that guarantee safety and liveness properties of multiprocessor applications. The difficulty of reasoning about safety and liveness properties in a concurrent environment has led to the development of tools to verify that a concurrent data structure meets a correctness condition or progress guarantee. However, these tools possess shortcomings regarding the ability to verify a composition of data structure operations. Additionally, verification techniques for transactional memory evaluate correctness based on low-level read/write histories, which is not applicable to transactional data structures that use high-level semantic conflict detection. In my dissertation, I present tools for checking the correctness of multiprocessor programs that overcome the limitations of previous correctness verification techniques. Correctness Condition Specification (CCSpec) is the first tool that automatically checks the correctness of a composition of concurrent multi-container operations performed in a non-atomic manner. Transactional Correctness tool for Abstract Data Types (TxC-ADT) is the first tool that can check the correctness of transactional data structures. TxC-ADT elevates the standard definitions of transactional correctness to be in terms of an abstract data type, an essential aspect for checking the correctness of transactions that synchronize only for high-level semantic conflicts. Many practical concurrent data structures, transactional data structures, and algorithms that facilitate non-blocking programming incorporate helping schemes to ensure that an operation comprising multiple atomic steps is completed according to the progress guarantee. The helping scheme introduces additional interference by the active threads in the system to achieve the designed progress guarantee. Previous progress verification techniques do not accommodate loops whose termination is dependent on complex behaviors of the interfering threads, making these approaches unsuitable. My dissertation presents the first progress verification technique for non-blocking algorithms that depend on descriptor-based helping mechanisms.
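
    As a rough illustration of the descriptor-based helping pattern these verification techniques must reason about, the sketch below shows a thread installing an operation descriptor and first finishing any interfering operation it encounters; the retry loop terminates only because of what other threads do, which is exactly the interference-dependent progress argument referred to above. The types and names are invented for the example, and a real scheme would make helping idempotent by using CAS on the data itself.

        #include <atomic>

        struct OpDesc {
            std::atomic<bool> done{false};
            int newValue{0};
        };

        struct Slot {
            std::atomic<OpDesc*> pending{nullptr};  // currently installed descriptor
            std::atomic<int> value{0};
        };

        // Finish the multi-step operation described by 'd' (simplified; duplicate
        // helping is tolerated here rather than made strictly idempotent).
        void help(Slot& s, OpDesc* d) {
            if (!d->done.load()) {
                s.value.store(d->newValue);
                d->done.store(true);
            }
            OpDesc* expected = d;
            s.pending.compare_exchange_strong(expected, nullptr);  // uninstall
        }

        // Install our descriptor, helping whatever interfering operation is
        // already installed; the loop ends only once other threads' descriptors
        // have been completed and cleared.
        void update(Slot& s, OpDesc* mine) {
            while (true) {
                OpDesc* cur = nullptr;
                if (s.pending.compare_exchange_strong(cur, mine)) break;
                help(s, cur);
            }
            help(s, mine);
        }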

    Infrastructure for Performance Monitoring and Analysis of Systems and Applications

    The growth of High Performance Computing (HPC) systems increases the complexity of understanding resource utilization, system management, and performance issues. HPC performance monitoring tools need to collect information at both the application and system levels to yield a complete performance picture. Existing approaches limit users' ability to perform meaningful analysis on an actionable timescale. Efficient infrastructure is required to support large-scale system performance data analysis in both run-time troubleshooting and post-run processing modes. In this dissertation, we present methods to fill these gaps in the infrastructure for HPC performance monitoring and analysis. First, we enhance the architecture of a monitoring system to integrate streaming analysis capabilities at arbitrary locations within its data collection, transport, and aggregation facilities. Next, we present an approach to streaming collection of application performance data. We integrate these methods with a monitoring system used on large-scale computational platforms. Finally, we present a new approach for constructing durable transactional linked data structures that takes advantage of byte-addressable non-volatile memory technologies. Transactional data structures are building blocks of in-memory databases that are used by HPC monitoring systems to store and retrieve data efficiently. We evaluate the presented approaches on a series of case studies. The experimental results demonstrate the impact of our tools while keeping the overhead within an acceptable margin.
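
    The durability side of the linked data structures mentioned above rests on an explicit write-back-and-fence step after each update to byte-addressable non-volatile memory. The sketch below shows that persist pattern for linking a new list node, using x86 CLWB intrinsics; it assumes a CPU with CLWB support, compilation with -mclwb, a single writer, and NVM-resident allocations, and it is not the dissertation's actual code.

        #include <immintrin.h>
        #include <cstdint>

        struct Node {
            uint64_t key;
            Node*    next;
        };

        // Write the cache line containing 'addr' back to persistent memory and
        // order it before subsequent stores.
        inline void persist(const void* addr) {
            _mm_clwb(const_cast<void*>(addr));
            _mm_sfence();
        }

        // Durably link a new node after 'pred' (simplified, single-writer view).
        void durable_insert_after(Node* pred, Node* n) {
            persist(n);            // 1. make the new node's contents durable
            pred->next = n;        // 2. publish it in the list
            persist(&pred->next);  // 3. make the link itself durable
        }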