738 research outputs found

    Insights into the Fallback Path of Best-Effort Hardware Transactional Memory Systems

    Get PDF
    DOI 10.1007/978-3-319-43659-3Current industry proposals for Hardware Transactional Memory (HTM) focus on best-effort solutions (BE-HTM) where hardware limits are imposed on transactions. These designs may show a significant performance degradation due to high contention scenarios and different hardware and operating system limitations that abort transactions, e.g. cache overflows, hardware and software exceptions, etc. To deal with these events and to ensure forward progress, BE-HTM systems usually provide a software fallback path to execute a lock-based version of the code. In this paper, we propose a hardware implementation of an irrevocability mechanism as an alternative to the software fallback path to gain insight into the hardware improvements that could enhance the execution of such a fallback. Our mechanism anticipates the abort that causes the transaction serialization, and stalls other transactions in the system so that transactional work loss is mini- mized. In addition, we evaluate the main software fallback path approaches and propose the use of ticket locks that hold precise information of the number of transactions waiting to enter the fallback. Thus, the separation of transactional and fallback execution can be achieved in a precise manner. The evaluation is carried out using the Simics/GEMS simulator and the complete range of STAMP transactional suite benchmarks. We obtain significant performance benefits of around twice the speedup and an abort reduction of 50% over the software fallback path for a number of benchmarks.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

    Full text link
    Embedded systems are becoming increasingly common in everyday life and like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095 as well the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation

    Design and Implementation of Real-Time Transactional Memory

    Get PDF
    Abstract—Transactional memory is a promising, optimistic synchronization mechanism for chip-multiprocessor systems. The simplicity of atomic sections, instead of using explicit locks, is also appealing for real-time systems. In this paper an implementation of real-time transactional memory (RTTM) in the context of a real-time Java chip-multiprocessor (CMP) is presented. To provide a predictable and analyzable solution of transactional memory, the transaction buffer is organized fully associative. Evaluation in an FPGA shows that an associativity of up to 64-way is possible without degrading the overall system performance. The paper presents synthesis results for different RTTM configurations and different number of processor cores in the CMP system. A CMP system with up to 8 processor cores with RTTM support is feasible in an Altera Cyclone-II FPGA
    • …
    corecore