638 research outputs found

    Lock-free Concurrent Data Structures

    Full text link
    Concurrent data structures are the data sharing side of parallel programming. Data structures give the means to the program to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes more crucial because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide a sufficient background and intuition to help the interested reader to navigate in the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computin

    Performance Analysis and Modelling of Concurrent Multi-access Data Structures

    Get PDF
    The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access-points, leading to thread serialisation, hindering parallelism. Aiming to address this challenge, significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures especially in a shared memory setting. In this paper, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns: shared (Remote) and exclusive (Local) access, and the behaviour of the two most commonly used atomic primitives for designing lock-free data structures: Compare and Swap, and, Fetch and Add. We model the concurrent multi-accesses by splitting the thread execution procedure into five logical sessions: i) side-work, ii) access-point search iii) access-point acquisition, iv) access-point data acquisition and v) access-point data operation. We model the acquisition of an access-point, as a system of closed queuing networks with parallel servers, and data acquisition in terms of where the data is located within the memory system. We evaluate our model on a set of concurrent data structure designs including a counter, a stack and a FIFO queue. The evaluation is carried out on two state of the art multi-core processors: Intel Xeon Phi CPU 7290 with 72 physical cores and Intel Xeon E5-2695 with 14 physical cores. Our model is able to predict the throughput performance of the given concurrent data structures with 80% to 100% accuracy on both architectures

    Online machine learning approach to multicore data structures

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student submitted PDF version of thesis.Includes bibliographical references (p. 175-180).As multicores become prevalent, the complexity of programming is skyrocketing. One major difficulty is eciently orchestrating collaboration among threads through shared data structures. Unfortunately, choosing and hand-tuning data structure algorithms to get good performance across a variety of machines and inputs is a herculean task to add to the fundamental difficulty of getting a parallel program correct. To help mitigate these complexities, this work develops a new class of parallel data structures called Smart Data Structures that leverage online machine learning to adapt themselves automatically. We prototype and evaluate an open source library of Smart Data Structures for common parallel programming needs and demonstrate signicant improvements over the best existing algorithms under a variety of conditions. Our results indicate that learning is a promising technique for balancing and adapting to complex, time-varying tradeoffs and achieving the best performance available.by Jonathan M. Eastep.Ph.D

    High-Performance Composable Transactional Data Structures

    Get PDF
    Exploiting the parallelism in multiprocessor systems is a major challenge in the post ``power wall\u27\u27 era. Programming for multicore demands a change in the way we design and use fundamental data structures. Concurrent data structures allow scalable and thread-safe accesses to shared data. They provide operations that appear to take effect atomically when invoked individually. A main obstacle to the practical use of concurrent data structures is their inability to support composable operations, i.e., to execute multiple operations atomically in a transactional manner. The problem stems from the inability of concurrent data structure to ensure atomicity of transactions composed from operations on a single or multiple data structures instances. This greatly hinders software reuse because users can only invoke data structure operations in a limited number of ways. Existing solutions, such as software transactional memory (STM) and transactional boosting, manage transaction synchronization in an external layer separated from the data structure\u27s own thread-level concurrency control. Although this reduces programming effort, it leads to significant overhead associated with additional synchronization and the need to rollback aborted transactions. In this dissertation, I address the practicality and efficiency concerns by designing, implementing, and evaluating high-performance transactional data structures that facilitate the development of future highly concurrent software systems. Firstly, I present two methodologies for implementing high-performance transactional data structures based on existing concurrent data structures using either lock-based or lock-free synchronizations. For lock-based data structures, the idea is to treat data accessed by multiple operations as resources. The challenge is for each thread to acquire exclusive access to desired resources while preventing deadlock or starvation. Existing locking strategies, like two-phase locking and resource hierarchy, suffer from performance degradation under heavy contention, while lacking a desirable fairness guarantee. To overcome these issues, I introduce a scalable lock algorithm for shared-memory multiprocessors addressing the resource allocation problem. It is the first multi-resource lock algorithm that guarantees the strongest first-in, first-out (FIFO) fairness. For lock-free data structures, I present a methodology for transforming them into high-performance lock-free transactional data structures without revamping the data structures\u27 original synchronization design. My approach leverages the semantic knowledge of the data structure to eliminate the overhead of false conflicts and rollbacks. Secondly, I apply the proposed methodologies and present a suite of novel transactional search data structures in the form of an open source library. This is interesting not only because the fundamental importance of search data structures in computer science and their wide use in real world programs, but also because it demonstrate the implementation issues that arise when using the methodologies I have developed. This library is not only a compilation of a large number of fundamental data structures for multiprocessor applications, but also a framework for enabling composable transactions, and moreover, an infrastructure for continuous integration of new data structures. By taking such a top-down approach, I am able to identify and consider the interplay of data structure interface operations as a whole, which allows for scrutinizing their commutativity rules and hence opens up possibilities for design optimizations. Lastly, I evaluate the throughput of the proposed data structures using transactions with randomly generated operations on two difference hardware systems. To ensure the strongest possible competition, I chose the best performing alternatives from state-of-the-art locking protocols and transactional memory systems in the literature. The results show that it is straightforward to build efficient transactional data structures when using my multi-resource lock as a drop-in replacement for transactional boosted data structures. Furthermore, this work shows that it is possible to build efficient lock-free transactional data structures with all perceived benefits of lock-freedom and with performance far better than generic transactional memory systems

    On packet switch design

    Get PDF

    Revisiting Actor Programming in C++

    Full text link
    The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that subsumes full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additives to the original model. In this paper, we report on designing and building CAF, the "C++ Actor Framework". CAF targets at providing a concurrent and distributed native environment for scaling up to very large, high-performance applications, and equally well down to small constrained systems. We present the key specifications and design concepts---in particular a message-transparent architecture, type-safe message interfaces, and pattern matching facilities---that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first for elastic, upscaling environments, second for including heterogeneous hardware like GPGPUs, and third for distributed runtime systems. Extensive performance evaluations indicate ideal runtime behaviour for up to 64 cores at very low memory footprint, or in the presence of GPUs. In these tests, CAF continuously outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the OpenMPI.Comment: 33 page

    White-box methodologies, programming abstractions and libraries

    Get PDF
    EXCESS deliverable D2.2. More information at http://www.excess-project.eu/This deliverable reports the results of white-box methodologies and early results ofthe first prototype of libraries and programming abstractions as available by projectmonth 18 by Work Package 2 (WP2). It reports i) the latest results of Task 2.2on white-box methodologies, programming abstractions and libraries for developingenergy-efficient data structures and algorithms and ii) the improved results of Task2.1 on investigating and modeling the trade-off between energy and performance ofconcurrent data structures and algorithms. The work has been conducted on two mainEXCESS platforms: Intel platforms with recent Intel multicore CPUs and MovidiusMyriad1 platform

    The parallel event loop model and runtime: a parallel programming model and runtime system for safe event-based parallel programming

    Get PDF
    Recent trends in programming models for server-side development have shown an increasing popularity of event-based single- threaded programming models based on the combination of dynamic languages such as JavaScript and event-based runtime systems for asynchronous I/O management such as Node.JS. Reasons for the success of such models are the simplicity of the single-threaded event-based programming model as well as the growing popularity of the Cloud as a deployment platform for Web applications. Unfortunately, the popularity of single-threaded models comes at the price of performance and scalability, as single-threaded event-based models present limitations when parallel processing is needed, and traditional approaches to concurrency such as threads and locks don't play well with event-based systems. This dissertation proposes a programming model and a runtime system to overcome such limitations by enabling single-threaded event-based applications with support for speculative parallel execution. The model, called Parallel Event Loop, has the goal of bringing parallel execution to the domain of single-threaded event-based programming without relaxing the main characteristics of the single-threaded model, and therefore providing developers with the impression of a safe, single-threaded, runtime. Rather than supporting only pure single-threaded programming, however, the parallel event loop can also be used to derive safe, high-level, parallel programming models characterized by a strong compatibility with single-threaded runtimes. We describe three distinct implementations of speculative runtimes enabling the parallel execution of event-based applications. The first implementation we describe is a pessimistic runtime system based on locks to implement speculative parallelization. The second and the third implementations are based on two distinct optimistic runtimes using software transactional memory. Each of the implementations supports the parallelization of applications written using an asynchronous single-threaded programming style, and each of them enables applications to benefit from parallel execution
    corecore