Search CORE

29,160 research outputs found

Mixing Hardware and Software Reversibility for Speculative Parallel Discrete Event Simulation

Author: Cingolani Davide
Ianni Mauro
Pellegrini Alessandro
Quaglia Francesco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Speculative parallel discrete event simulation requires support for reversing processed events, also called state recovery, when causal inconsistencies are revealed. In this article we present an approach where state recovery relies on a mix of hardware- and software-based techniques. We exploit the Hardware Transactional Memory (HTM) support offered by Intel Haswell CPUs to process events as in-memory transactions, which can be committed only after their causal consistency is verified. At the same time, we exploit an innovative software-based reversibility technique, fully relying on transparent software instrumentation targeting x86/ELF objects, which enables undoing the side effects of events with no actual backward re-computation. Each thread within our speculative processing engine dynamically (on a per-event basis) selects which recovery mode to rely on (hardware vs. software) depending on varying runtime dynamics. These dynamics are captured by a lightweight analytic model indicating to what extent the HTM support (which pays no instrumentation cost) is efficient, and beyond what level of event parallelism its performance starts degrading, e.g., due to excessive data conflicts while manipulating causality meta-data within HTM-based transactions. We released our implementation as open source software and provide experimental results for an assessment of its effectiveness. © Springer International Publishing Switzerland 2016
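
The idea of undoing an event's side effects without backward re-computation can be illustrated with a plain undo log. The sketch below is only an analogy, not the authors' x86/ELF instrumentation: the `UndoLog` class, the dict-based simulation state, and the `process_event` helper are all assumptions made for illustration.

```python
class UndoLog:
    """Records overwritten values so an event's writes can be reverted."""

    def __init__(self, state):
        self.state = state   # simulation state as a plain dict (assumption)
        self.entries = []    # (key, had_key, old_value), in write order

    def write(self, key, value):
        # Save the previous value before overwriting it.
        had_key = key in self.state
        self.entries.append((key, had_key, self.state.get(key)))
        self.state[key] = value

    def rollback(self):
        # Restore saved values in reverse order; nothing is recomputed.
        while self.entries:
            key, had_key, old = self.entries.pop()
            if had_key:
                self.state[key] = old
            else:
                del self.state[key]

    def commit(self):
        # The event proved causally consistent: the log is no longer needed.
        self.entries.clear()


def process_event(state, event):
    """Hypothetical event handler: applies (key, value) writes via the log."""
    log = UndoLog(state)
    for key, value in event:
        log.write(key, value)
    return log


state = {"x": 1, "y": 2}
log = process_event(state, [("x", 10), ("z", 5), ("x", 20)])
after_event = dict(state)   # snapshot taken only to show the effect
log.rollback()              # causal inconsistency detected: undo the event
```

After `rollback()` the state dict holds its pre-event contents again; calling `commit()` instead would simply discard the log.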

ART

Archivio della ricerca- Università di Roma La Sapienza

Cost-effective compiler directed memory prefetching and bypassing

Author: Ayguadé Parra Eduard
Baer Jean-Loup
Ortega Fernández Daniel
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002

Ever-increasing memory latencies and deeper pipelines push memory farther from the processor. Prefetching techniques aim to bridge these two gaps by fetching data in advance into both the L1 cache and the register file. Our main contribution in this paper is a hybrid approach to the prefetching problem that combines software and hardware prefetching in a cost-effective way, needing very little hardware support and minimally impacting the design of the processor pipeline. The prefetcher is built on top of a static memory instruction bypassing mechanism, which is in charge of bringing prefetched values into the register file. In this paper we also present a thorough analysis of the limits of both prefetching and memory instruction bypassing. We also compare our prefetching technique with a prior speculative proposal that attacked the same problem, and we show that, at much lower cost, our hybrid solution is better than a realistic implementation of speculative prefetching and bypassing. On average, our hybrid implementation achieves a 13% speed-up over a version with software prefetching in a subset of numerical applications and an average of 43% over a version with no software prefetching (up to 102% for specific benchmarks). Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

HeTM: Transactional Memory for Heterogeneous Systems

Author: Castro Daniel
Ilic Aleksandar
Khan Amin M.
Romano Paolo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/09/2019

Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows for easily integrating alternative TM implementations on the CPU and GPU sides, giving the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the application's workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a port of a popular object caching system. Comment: The current work was accepted in the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT'19)

arXiv.org e-Print Archive

Crossref

TMbarrier: speculative barriers using hardware transactional memory

Author: Gutierrez-Carrasco Eladio Damian
Pedrero Luque Manuel
Plata-Gonzalez Oscar Guillermo
Publication date: 15/11/2018

The barrier is a very common synchronization method in parallel programming. Barriers are typically used to enforce a partial thread execution order, since there may be dependences between code sections before and after the barrier. This work proposes TMbarrier, a new barrier design intended for use in transactional applications. TMbarrier allows threads to continue executing speculatively after the barrier, assuming that there are no dependences with safe threads that have not yet reached the barrier. Our design leverages transactional memory (TM) (specifically, the implementation offered by the IBM POWER8 processor) to hold the speculative updates and to detect possible conflicts between speculative and safe threads. Despite the limitations of the best-effort hardware TM implementation present in current processors, experiments show a reduction in wasted time due to synchronization compared to standard barriers. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
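
The interplay between speculative and safe threads described in this abstract can be pictured with a small, single-threaded toy model. It is a sketch under stated assumptions, not the TMbarrier design: on real hardware the TM implementation buffers updates and detects conflicts itself, whereas here the `SpeculativeThread` class, the `safe_write` helper, and the dict-based memory are hypothetical stand-ins.

```python
class SpeculativeThread:
    """Models a thread that passed the barrier early and runs speculatively."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()    # addresses read while speculating
        self.write_buffer = {}   # speculative updates held back from memory
        self.aborted = False

    def read(self, addr):
        self.read_set.add(addr)
        # Prefer the thread's own buffered value over shared memory.
        return self.write_buffer.get(addr, self.memory.get(addr))

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def conflicts_with(self, addr):
        return addr in self.read_set or addr in self.write_buffer

    def commit(self):
        # Called once every thread has reached the barrier.
        if self.aborted:
            return False
        self.memory.update(self.write_buffer)
        return True


def safe_write(memory, addr, value, speculative_threads):
    """A pre-barrier (safe) thread writes directly; conflicting speculation aborts."""
    memory[addr] = value
    for thread in speculative_threads:
        if thread.conflicts_with(addr):
            thread.aborted = True
            thread.write_buffer.clear()


memory = {"a": 0, "b": 0}
t1 = SpeculativeThread(memory)
t2 = SpeculativeThread(memory)
t1.write("b", t1.read("a") + 1)       # t1 depends on "a"
t2.write("c", 1)                      # t2 touches unrelated data
safe_write(memory, "a", 7, [t1, t2])  # a safe thread updates "a" before the barrier
t1_committed = t1.commit()
t2_committed = t2.commit()
```

In this model `t1` must re-execute after the barrier because it read a value that a safe thread later changed, while `t2` keeps the work it did speculatively.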

Crossref

Repositorio Institucional Universidad de Málaga

Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

Author: Bahar R Iris
Capodanno Giuseppe
Herlihy Maurice
Moreshet Tali
Papagiannopoulou Dimitra
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2015

Embedded systems are becoming increasingly common in everyday life and, like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured. This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095 as well as the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation)
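
One common shape for a retry policy in lock speculation is "speculate a bounded number of times, then fall back to the lock". The sketch below illustrates that general pattern in software only; Embedded-Spec itself is a hardware mechanism, and the `run_speculative` function, the `try_speculate` hook, and the `make_flaky_speculator` test double are hypothetical names introduced here.

```python
import threading


def run_speculative(critical_section, try_speculate, lock, max_retries=3):
    """Attempt speculation up to max_retries times, then take the lock.

    try_speculate(critical_section) is a hypothetical hook returning
    (committed, result); real hardware would detect conflicts itself.
    """
    for attempt in range(1, max_retries + 1):
        committed, result = try_speculate(critical_section)
        if committed:
            return result, "speculative", attempt
    # Fallback path: speculation kept aborting, so serialize on the lock.
    with lock:
        return critical_section(), "locked", max_retries


def make_flaky_speculator(aborts):
    """Test double: aborts the first `aborts` attempts, then commits."""
    remaining = [aborts]

    def try_speculate(section):
        if remaining[0] > 0:
            remaining[0] -= 1
            return False, None
        return True, section()

    return try_speculate


lock = threading.Lock()
outcome_ok = run_speculative(lambda: 42, make_flaky_speculator(2), lock)
outcome_fallback = run_speculative(lambda: 42, make_flaky_speculator(5), lock)
```

Varying `max_retries` is the simplest knob in such a policy: a low value falls back to the lock quickly under contention, while a high value repeats aborted work more often.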

Boston University Institutional Repository (OpenBU)

ReduxSTM: Optimizing STM designs for Irregular Applications

Author: Gutierrez-Carrasco Eladio Damian
Pedrero Luque Manuel
Plata-Gonzalez Oscar Guillermo
Romero-Montiel Sergio
Publication venue: 'Elsevier BV'
Publication date: 15/11/2018

Repositorio Institucional Universidad de Málaga