
    BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs

    We present BifurKTM, the first read-optimized Distributed Transactional Memory system for GPU clusters. The BifurKTM design includes: GPU KoSTM, a new software transactional memory conflict detection scheme that exploits relaxed consistency to increase throughput; and KoDTM, a Distributed Transactional Memory model that combines the Data- and Control-flow models to greatly reduce communication overheads. Despite the allure of huge speedups, GPUs see limited use because of their programmability challenges and extreme sensitivity to workload characteristics. These become daunting concerns for a distributed GPU cluster, where a programmer must design algorithms to hide communication latency by exploiting data regularity, high compute intensity, and the like. The BifurKTM design allows GPU programmers to exploit a new workload characteristic: the percentage of the workload that is Read-Only (i.e., transactions that read but do not modify shared memory), even when this percentage is not known in advance. Programmers designate transactions that are suitable for Approximate Consistency, in which transactions "appear" to execute at the most convenient time for preventing conflicts. By leveraging Approximate Consistency for Read-Only transactions, the BifurKTM runtime system offers improved performance, application flexibility, and programmability without introducing any errors into shared memory. Our experiments show that Approximate Consistency can improve BifurKTM (BkTM) performance by up to 34x in applications with moderate network communication utilization and a read-intensive workload. Using Approximate Consistency, BkTM can reduce GPU-to-GPU network communication by 99%, reduce the number of aborts by up to 100%, and achieve an average speedup of 18x over a similarly sized CPU cluster, all while requiring minimal effort from the programmer.
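
    To make the asymmetry concrete, the following is a minimal Java sketch of a commit protocol in this spirit. All names (ApproxConsistentRuntime, Txn, ReadEntry) are our own illustration, not BifurKTM's API, and the distributed validation round-trip is collapsed into a local version check:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative sketch: read-only transactions commit against the
    // snapshot they observed, skipping validation entirely; update
    // transactions keep precise semantics.
    final class ApproxConsistentRuntime {

        // Version observed for each object a transaction read.
        record ReadEntry(long objectId, long versionSeen) {}

        // A transaction's footprint: its read set and pending writes.
        record Txn(boolean readOnly, List<ReadEntry> readSet,
                   Map<Long, Object> writeSet) {}

        // Committed version and value of every shared object.
        private final Map<Long, Long> versions = new ConcurrentHashMap<>();
        private final Map<Long, Object> store = new ConcurrentHashMap<>();

        boolean commit(Txn tx) {
            if (tx.readOnly()) {
                // Approximate consistency: the snapshot was legal when it
                // was taken, so the transaction "appears" to serialize at
                // that point. No round-trip, no abort.
                return true;
            }
            synchronized (this) {
                // Update transactions revalidate their read set; in the
                // distributed setting this check is the GPU-to-GPU
                // communication that read-only transactions avoid.
                for (ReadEntry e : tx.readSet()) {
                    if (versions.getOrDefault(e.objectId(), 0L) != e.versionSeen()) {
                        return false; // conflict: abort and retry
                    }
                }
                tx.writeSet().forEach((id, value) -> {
                    store.put(id, value);
                    versions.merge(id, 1L, Long::sum); // publish, bump version
                });
                return true;
            }
        }
    }

    The asymmetry is the entire point: read-only transactions pay neither the communication cost nor the abort cost, which is where the reported reductions in network traffic and aborts come from.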

    Using Nesting to Push the Limits of Transactional Data Structure Libraries

    Transactional data structure libraries (TDSL) combine the ease of programming of transactions with the high performance and scalability of custom-tailored concurrent data structures. They can be very efficient thanks to their ability to exploit data structure semantics in order to reduce overhead, aborts, and wasted work compared to general-purpose software transactional memory. However, TDSLs were not previously used for complex use cases involving long transactions and a variety of data structures. In this paper, we boost the performance and usability of a TDSL so that it can support complex applications. A key idea is nesting. Nested transactions create checkpoints within a longer transaction, so as to limit the scope of abort, without changing the semantics of the original transaction. We build a Java TDSL with built-in support for nested transactions over a number of data structures. We conduct a case study of a complex network intrusion detection system that invests a significant amount of work to process each packet. Our study shows that our library outperforms publicly available STMs twofold without nesting, and by up to 16x when nesting is used.
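
    Since the paper's library is in Java, the checkpoint semantics can be sketched in a few lines. The names here (atomicNested, AbortException) are invented for illustration and are not the library's API; a real TDSL would snapshot the child's read and write sets rather than simply retrying:

    import java.util.concurrent.ThreadLocalRandom;

    final class NestingSketch {

        // Thrown to abort only the innermost enclosing transaction.
        static final class AbortException extends RuntimeException {}

        interface Block { void run(); }

        // Runs body as a closed nested transaction: on abort, only the
        // child's effects are rolled back and retried; the parent's work
        // up to this checkpoint survives.
        static void atomicNested(Block body) {
            while (true) {
                try {
                    body.run();   // child commits into the parent's sets
                    return;
                } catch (AbortException e) {
                    // roll back and retry the child only
                }
            }
        }

        public static void main(String[] args) {
            // A long transaction's heavy per-packet preprocessing stays
            // valid even if this contended update has to retry.
            atomicNested(() -> {
                if (ThreadLocalRandom.current().nextInt(4) == 0)
                    throw new AbortException(); // simulated conflict
                System.out.println("index updated");
            });
        }
    }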

    Fairness and Approximation in Multi-version Transactional Memory

    Shared memory multi-core systems benefit from transactional memory implementations due to the inherent avoidance of deadlocks and progress guarantees. In this research, we examine how system performance is affected by transaction fairness in scheduling and by the precision of consistency. We first explore the fairness aspect using a Lazy Snapshot (multi-version) Algorithm. Fairness in transaction scheduling aims to balance the load between read-only and update transactions. We implement a fairness mechanism based on machine learning techniques that improves fairness decisions according to the transaction execution history. Experimental analysis shows that the throughput of the Lazy Snapshot Algorithm is improved with machine learning support. We also explore the performance impact of consistency relaxation. In transactional memory, correctness is typically proven with opacity, a precise consistency property that requires a legal serialization of an execution such that transactions do not overlap (atomicity) and read instructions always return the most recent value (legality). In real systems there are situations where system delays do not allow precise consistency, such as in large-scale applications, due to network or other time delays. Thus, we introduce here the notion of approximate consistency in transactional memory. We define K-opacity as a relaxed consistency property where transactions' read operations may return one of the K most recent written values. In multi-version transactional memory, this allows saving a new object version only once every K object updates, which has two benefits: (i) it reduces space requirements by a factor of K, and (ii) it reduces the number of aborts, since there is a smaller chance of conflicts. We apply the concept of K-opacity to regular read/write, counter, and queue objects, which are common objects in typical concurrent programs. We provide formal correctness proofs and demonstrate the performance benefits of our approach with experimental analysis. We compare the performance of precisely consistent execution (1-opaque) with different consistency values of K using micro-benchmarks. The results show that increased relaxation of opacity gives higher throughput and decreases the abort rate significantly.
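
    The version-saving idea behind K-opacity can be sketched for a single register. This Java fragment is our own minimal illustration (coarse synchronization, invented names), not the authors' implementation:

    import java.util.concurrent.ThreadLocalRandom;

    // A K-opaque register: a new version is saved only once every K
    // updates, shrinking the multi-version store by a factor of K.
    final class KOpaqueRegister<T> {
        private final int k;
        private T current;              // latest written value
        private T lastSaved;            // most recently saved version
        private int updatesSinceSave = 0;

        KOpaqueRegister(int k, T initial) {
            this.k = k;
            this.current = initial;
            this.lastSaved = initial;
        }

        synchronized void write(T value) {
            current = value;
            if (++updatesSinceSave == k) { // version only every K-th update
                lastSaved = value;
                updatesSinceSave = 0;
            }
        }

        // Legally returns one of the K most recent written values:
        // lastSaved trails current by at most K-1 writes, so a reader
        // pinned to the older snapshot still satisfies K-opacity.
        synchronized T read() {
            return ThreadLocalRandom.current().nextBoolean() ? current : lastSaved;
        }
    }

    Accepting the stale version is what reduces aborts: a reader that is served lastSaved need not conflict with writers that have already advanced current in the meantime.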

    The parallel event loop model and runtime: a parallel programming model and runtime system for safe event-based parallel programming

    Recent trends in programming models for server-side development have shown an increasing popularity of event-based single-threaded programming models based on the combination of dynamic languages such as JavaScript and event-based runtime systems for asynchronous I/O management such as Node.js. Reasons for the success of such models are the simplicity of the single-threaded event-based programming model as well as the growing popularity of the Cloud as a deployment platform for Web applications. Unfortunately, the popularity of single-threaded models comes at the price of performance and scalability, as single-threaded event-based models present limitations when parallel processing is needed, and traditional approaches to concurrency such as threads and locks do not combine well with event-based systems. This dissertation proposes a programming model and a runtime system that overcome these limitations by giving single-threaded event-based applications support for speculative parallel execution. The model, called Parallel Event Loop, has the goal of bringing parallel execution to the domain of single-threaded event-based programming without relaxing the main characteristics of the single-threaded model, therefore providing developers with the impression of a safe, single-threaded runtime. Rather than supporting only pure single-threaded programming, however, the parallel event loop can also be used to derive safe, high-level, parallel programming models characterized by strong compatibility with single-threaded runtimes. We describe three distinct implementations of speculative runtimes enabling the parallel execution of event-based applications. The first is a pessimistic runtime system that uses locks to implement speculative parallelization. The second and third are two distinct optimistic runtimes using software transactional memory. Each implementation supports the parallelization of applications written in an asynchronous single-threaded programming style, and each enables applications to benefit from parallel execution.
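
    A rough sketch of the structure shared by all three runtimes, written here in Java rather than JavaScript and with invented names: handlers execute speculatively in parallel, and their effects are applied strictly in event order, which is what preserves the single-threaded illusion:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    final class SpeculativeEventLoop {

        // A handler computes speculatively, returning its deferred writes.
        interface Handler { Runnable speculate(); }

        private final ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        void drain(List<Handler> queue) throws Exception {
            // Phase 1: run all pending handlers speculatively in parallel.
            List<Future<Runnable>> pending = queue.stream()
                    .map(h -> pool.submit(h::speculate))
                    .toList();

            // Phase 2: apply effects one handler at a time, in arrival
            // order, exactly as a single-threaded event loop would have.
            // A full runtime validates each handler's reads here (the
            // optimistic STM variants) or locks them up front (the
            // pessimistic variant) and re-executes on conflict.
            for (Future<Runnable> effects : pending) {
                effects.get().run();
            }
            pool.shutdown();
        }
    }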
