21,462 research outputs found

    BifurKTM: Approximately Consistent Distributed Transactional Memory for GPUs

    We present BifurKTM, the first read-optimized Distributed Transactional Memory system for GPU clusters. The BifurKTM design includes: GPU KoSTM, a new software transactional memory conflict detection scheme that exploits relaxed consistency to increase throughput; and KoDTM, a Distributed Transactional Memory model that combines the Data- and Control-flow models to greatly reduce communication overheads. Despite the allure of huge speedups, GPUs see limited use due to their programmability challenges and extreme sensitivity to workload characteristics. These become daunting concerns when considering a distributed GPU cluster, wherein a programmer must design algorithms to hide communication latency by exploiting data regularity, high compute intensity, etc. The BifurKTM design allows GPU programmers to exploit a new workload characteristic: the percentage of the workload that is Read-Only (i.e., that reads but does not modify shared memory), even when this percentage is not known in advance. Programmers designate transactions that are suitable for Approximate Consistency, in which transactions "appear" to execute at the most convenient time for preventing conflicts. By leveraging Approximate Consistency for Read-Only transactions, the BifurKTM runtime system offers improved performance, application flexibility, and programmability without introducing any errors into shared memory. Our experiments show that Approximate Consistency can improve BifurKTM (BkTM) performance by up to 34x in applications with moderate network communication utilization and a read-intensive workload. Using Approximate Consistency, BkTM can reduce GPU-to-GPU network communication by 99%, reduce the number of aborts by up to 100%, and achieve an average speedup of 18x over a similarly sized CPU cluster, while requiring minimal effort from the programmer.
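
    To make the mechanism concrete, the following is a minimal, hedged C++ sketch of one way a runtime could serve read-only transactions from an immutable snapshot so that they never abort, while updaters publish new snapshots atomically. It illustrates the general idea only, not the actual BifurKTM/KoSTM/KoDTM design; all names in it are invented.

        // Hedged sketch (invented names): serve read-only transactions from an
        // immutable snapshot so they never validate and never abort, while
        // updating transactions publish a new snapshot atomically. This is an
        // illustration of the general idea, not the BifurKTM/KoSTM protocol.
        #include <atomic>
        #include <map>
        #include <memory>
        #include <mutex>

        using Snapshot = std::map<int, int>;              // key -> value

        std::shared_ptr<const Snapshot> g_snapshot =      // latest committed state
            std::make_shared<const Snapshot>();
        std::mutex g_commit_lock;                         // serializes updaters (illustrative only)

        // Read-only transaction: pin one snapshot and read from it. It "appears"
        // to execute atomically at the instant that snapshot was published.
        int read_only_sum(int k1, int k2) {
            auto snap = std::atomic_load(&g_snapshot);    // no validation, no aborts
            int v1 = snap->count(k1) ? snap->at(k1) : 0;
            int v2 = snap->count(k2) ? snap->at(k2) : 0;
            return v1 + v2;
        }

        // Updating transaction: copy, modify, and atomically publish the next snapshot.
        void update(int key, int value) {
            std::lock_guard<std::mutex> guard(g_commit_lock);
            auto next = std::make_shared<Snapshot>(*std::atomic_load(&g_snapshot));
            (*next)[key] = value;
            std::atomic_store(&g_snapshot,
                              std::shared_ptr<const Snapshot>(std::move(next)));
        }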

    TM2C: A software transactional memory for many-cores

    Transactional memory is an appealing paradigm for concurrent programming. Many software implementations of the paradigm have been proposed over the last decades for both shared-memory multi-core systems and clusters of distributed machines. However, chip manufacturers have started producing many-core architectures, with low network-on-chip communication latency and limited support for cache coherence, rendering existing transactional memory implementations inapplicable. This paper presents TM2C, the first software Transactional Memory protocol for Many-Core systems. TM2C exploits network-on-chip communication to obtain access to shared data through efficient message passing. In particular, it allows visible read accesses and hence effective distributed contention management with eager conflict detection. We also propose FairCM, a companion contention manager that ensures starvation-freedom, which we believe is an important property in many-core systems, as well as an implementation of elastic transactions in these settings. Our evaluation on four benchmarks, i.e., linked-list and hash-table data structures as well as bank and MapReduce-like applications, indicates better scalability than locks and up to 20-fold speedup (relative to bare sequential code) when running 24 application cores. © 2012 ACM
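
    As an illustration of access granting with visible reads and eager conflict detection, the sketch below shows, in plain C++, what a core that owns a data partition might do for each incoming request. It is a simplified stand-in, not TM2C's actual protocol or API; the types and names are invented.

        // Hedged sketch (invented types and names): what a core owning a data
        // partition might do per request, with visible reads so that conflicts
        // are detected eagerly at request time. Not TM2C's actual protocol.
        #include <cstdint>
        #include <optional>
        #include <set>
        #include <unordered_map>

        enum class Op { Read, Write };
        enum class Reply { Granted, Abort };

        struct ObjectMeta {
            std::set<uint32_t> readers;        // visible readers (transaction ids)
            std::optional<uint32_t> writer;    // exclusive writer, if any
        };

        class OwnerCore {                      // services requests for its partition
            std::unordered_map<uint64_t, ObjectMeta> table_;
        public:
            Reply handle(uint32_t tx, uint64_t addr, Op op) {
                ObjectMeta& m = table_[addr];
                if (op == Op::Read) {
                    if (m.writer && *m.writer != tx) return Reply::Abort;  // eager conflict
                    m.readers.insert(tx);                                  // the read is visible
                    return Reply::Granted;
                }
                if (m.writer && *m.writer != tx) return Reply::Abort;      // write/write conflict
                for (uint32_t r : m.readers)                               // write/read conflict;
                    if (r != tx) return Reply::Abort;                      // a contention manager
                m.writer = tx;                                             // could decide otherwise
                return Reply::Granted;
            }
            void release(uint32_t tx, uint64_t addr) {                     // at commit or abort
                ObjectMeta& m = table_[addr];
                m.readers.erase(tx);
                if (m.writer == tx) m.writer.reset();
            }
        };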

    Exploiting distributed software transactional memory

    Over the past years, research and development in computer architecture has shifted from uni-processor systems to multi-core architectures. This transition has created new incentives in software development, because software now has to be highly parallel in order to scale. Traditional synchronization primitives based on mutual exclusion locking are challenging to use and therefore are efficiently employed only by a minority of expert programmers. Transactional Memory (TM) is an alternative parallel programming model aiming to alleviate the problems that arise from the use of explicit synchronization mechanisms. In TM, lock-guarded code is replaced by memory transactions which comply with the ACI (atomicity, consistency, isolation) principles. The simplicity of the programming model that TM proposes has led to major research efforts by academia and industry to produce high-performance TM implementations. The majority of these TM systems, however, focus on shared-memory Chip MultiProcessors (CMPs), leaving the area of distributed systems unexplored. This thesis explores Transactional Memory in the distributed systems domain, and more specifically on small-scale clusters. A variety of novel distributed transactional coherence protocols are proposed and evaluated, against complex TM-oriented benchmarks, in the context of distributed Java Virtual Machines (JVMs), an area that has received much attention over the last decade due to its applicability to the enterprise domain. The implemented Distributed Software Transactional Memory (DiSTM) system, proposed in this thesis, is a JVM clustering solution that employs software transactional memory as its synchronization mechanism. Due to its modular design and ease of programming, it allows new protocols to be added in a fairly easy manner. Finally, DiSTM is highly portable as it runs on top of off-the-shelf JVMs and requires no changes to existing Java source code.
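
    The modular-protocol idea can be pictured as a small set of transaction life-cycle hooks that each coherence protocol implements. The interface below is a hypothetical C++ rendering (DiSTM itself targets the JVM); it is meant only to illustrate how protocols could be swapped behind a fixed runtime driver, not to reproduce DiSTM's real API.

        // Hedged sketch (hypothetical interface, in C++ rather than DiSTM's Java):
        // a pluggable coherence protocol reduced to a few transaction life-cycle
        // hooks that the distributed STM runtime drives.
        #include <cstdint>
        #include <vector>

        struct ReadEntry  { uint64_t obj; uint64_t version; };
        struct WriteEntry { uint64_t obj; std::vector<char> new_value; };

        class CoherenceProtocol {
        public:
            virtual ~CoherenceProtocol() = default;
            virtual void onBegin(uint32_t tx) = 0;
            // Return false if cluster-wide validation detects a conflict.
            virtual bool validate(uint32_t tx, const std::vector<ReadEntry>& reads) = 0;
            // Make the write set visible cluster-wide (broadcast, leases, ...).
            virtual void commit(uint32_t tx, const std::vector<WriteEntry>& writes) = 0;
            virtual void abort(uint32_t tx) = 0;
        };

        // The runtime is protocol-agnostic: adding a protocol means implementing
        // the interface above, not changing the transaction engine.
        bool tryCommit(CoherenceProtocol& proto, uint32_t tx,
                       const std::vector<ReadEntry>& reads,
                       const std::vector<WriteEntry>& writes) {
            if (!proto.validate(tx, reads)) { proto.abort(tx); return false; }
            proto.commit(tx, writes);
            return true;
        }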

    Transparent support for partial rollback in software transactional memories

    The Software Transactional Memory (STM) paradigm has gained momentum thanks to its ability to provide synchronization transparency in concurrent applications. With this paradigm, accesses to data structures that are shared among multiple threads are carried out within transactions, which are properly handled by the STM layer with no intervention by the application code. In this article we propose an enhancement of typical STM architectures that supports partial rollback of active transactions, as opposed to the typical case where a rollback of a transaction entails squashing all the already-performed work. Our partial rollback scheme is still transparent to the application programmer and has been implemented for x86-64 architectures and for the ELF format, thus being largely usable on POSIX-compliant systems hosted on top of off-the-shelf architectures. We integrated it within the TinySTM open-source library and present experimental results for the STAMP STM benchmark run on top of a 32-core HP ProLiant server. © 2013 Springer-Verlag
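
    The sketch below illustrates the core data structure commonly used for partial rollback: an undo log with savepoints, so that a conflict unwinds only the suffix of the work instead of the whole transaction. It is a simplified, explicit-savepoint illustration in C++, not the transparent, instrumentation-based scheme integrated into TinySTM by the authors.

        // Hedged sketch (explicit savepoints for clarity; the scheme described
        // above keeps this transparent): an undo log plus savepoints lets a
        // conflict unwind only the work performed after the chosen point.
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        struct UndoEntry { uint64_t* addr; uint64_t old_value; };

        class Transaction {
            std::vector<UndoEntry>  undo_log_;
            std::vector<std::size_t> savepoints_;     // indices into undo_log_
        public:
            void write(uint64_t* addr, uint64_t value) {
                undo_log_.push_back({addr, *addr});   // log the old value, then update
                *addr = value;
            }
            std::size_t savepoint() {                 // mark a partial-rollback point
                savepoints_.push_back(undo_log_.size());
                return savepoints_.size() - 1;
            }
            void rollback_to(std::size_t sp) {        // undo only the log suffix
                std::size_t keep = savepoints_.at(sp);
                while (undo_log_.size() > keep) {
                    UndoEntry e = undo_log_.back();
                    *e.addr = e.old_value;
                    undo_log_.pop_back();
                }
                savepoints_.resize(sp + 1);
            }
            void rollback_all() {                     // the classic "squash everything"
                while (!undo_log_.empty()) {
                    UndoEntry e = undo_log_.back();
                    *e.addr = e.old_value;
                    undo_log_.pop_back();
                }
                savepoints_.clear();
            }
        };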

    Adaptive Transactional Memories: Performance and Energy Consumption Tradeoffs

    Energy efficiency is becoming a pressing issue, especially in large data centers, where energy consumption entails, at the same time, a non-negligible management cost, an increased hardware fault probability, and a significant environmental footprint. In this paper, we study how Software Transactional Memories (STM) can provide benefits in terms of both power saving and overall application execution performance. This is related to the fact that encapsulating shared-data accesses within transactions gives the STM middleware the freedom both to ensure consistency and to reduce the actual data contention, the latter having been shown to affect the overall power needed to complete the application's execution. We have selected a set of self-adaptive extensions to existing STM middlewares (namely, TinySTM and R-STM) to show how self-adapting computation can better capture the actual degree of parallelism and/or logical contention on shared data, further enhancing the intrinsic benefits provided by STM. Of course, this benefit comes at a cost, namely the execution time required by the proposed approaches to precisely tune the execution parameters for reducing power consumption and enhancing execution performance. Nevertheless, the results provided here show that adaptivity is a strictly necessary requirement to reduce energy consumption in STM systems: without it, it is not possible to reach any acceptable level of energy efficiency at all.
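
    A minimal sketch of the kind of feedback loop such self-adaptive extensions rely on is given below: observe commits and aborts over a sampling window and adjust the number of active threads accordingly. The controller and its thresholds are invented for illustration and do not reproduce the actual tuning logic of the TinySTM/R-STM extensions.

        // Hedged sketch (invented thresholds, not the actual tuning logic of the
        // TinySTM/R-STM extensions): a feedback loop that shrinks the number of
        // active threads when the abort rate is high, reducing wasted work, and
        // grows it again when contention is low.
        #include <algorithm>
        #include <cstdint>

        class ConcurrencyController {
            int active_threads_;
            int max_threads_;
        public:
            explicit ConcurrencyController(int max_threads)
                : active_threads_(max_threads), max_threads_(max_threads) {}

            // Called periodically with counters from the last sampling window;
            // threads whose index exceeds the returned value pause themselves.
            int adapt(uint64_t commits, uint64_t aborts) {
                uint64_t total = commits + aborts;
                double abort_rate = total ? double(aborts) / double(total) : 0.0;
                if (abort_rate > 0.5)            // high contention: back off
                    active_threads_ = std::max(1, active_threads_ - 1);
                else if (abort_rate < 0.1)       // low contention: exploit parallelism
                    active_threads_ = std::min(max_threads_, active_threads_ + 1);
                return active_threads_;
            }
        };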

    Maintaining consistency in distributed systems

    In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs, and is controlled using mutual exclusion constructs such as semaphores and monitors. Where data is persistent and/or sets of operations are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group-oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems, often within the same application. This leads us to propose an integrated approach that permits applications using virtual synchrony to interoperate with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability.

    HeTM: Transactional Memory for Heterogeneous Systems

    Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, though, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows alternative TM implementations to be easily integrated on the CPU and GPU sides, providing the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the application's workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a porting of a popular object caching system. Comment: This work was accepted at the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT'19).
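
    The programming model can be pictured as follows. The sketch uses invented names (hetm::begin, hetm::read, hetm::write, hetm::commit) that are not the real HeTM/SHeTM API, and a trivial host-only lock as a stand-in for the runtime, just to show what a single transactional memory region shared by CPU and GPU code looks like to the programmer.

        // Hedged sketch: hetm::begin/read/write/commit are invented names, NOT
        // the real HeTM/SHeTM API, and the "runtime" is a trivial host-only lock.
        // The point is the programming model only: one logical memory region,
        // accessed transactionally, that CPU and GPU code could share.
        #include <cstdint>
        #include <mutex>

        namespace hetm {
        std::mutex g_lock;                              // stand-in for the real runtime
        struct tx { std::unique_lock<std::mutex> guard{g_lock}; };

        inline tx begin() { return tx{}; }
        inline uint64_t read(tx&, const uint64_t* addr) { return *addr; }
        inline void write(tx&, uint64_t* addr, uint64_t v) { *addr = v; }
        inline bool commit(tx& t) { t.guard.unlock(); return true; }  // never aborts here
        }  // namespace hetm

        // A CPU-side transaction over data that, under HeTM, GPU kernels could
        // also access through the same abstraction.
        void transfer(uint64_t* accounts, int from, int to, uint64_t amount) {
            for (;;) {
                auto t = hetm::begin();
                uint64_t src = hetm::read(t, &accounts[from]);
                hetm::write(t, &accounts[from], src - amount);
                uint64_t dst = hetm::read(t, &accounts[to]);
                hetm::write(t, &accounts[to], dst + amount);
                if (hetm::commit(t)) break;             // a real commit may abort and retry
            }
        }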

    Model-Based Proactive Read-Validation in Transaction Processing Systems

    Concurrency control protocols based on read-validation schemes allow transactions that are doomed to abort to keep running until a subsequent validation check reveals them as invalid. These late aborts do not favor the reduction of wasted computation and can penalize performance. To counteract this problem, we present an analytical model that predicts the abort probability of transactions handled via read-validation schemes. Our goal is to determine the suitable points, along a transaction's lifetime, at which to carry out a validation check. This may allow doomed transactions to be aborted early, thus saving CPU time. We show how to exploit the abort probability predictions returned by the model in combination with a threshold-based scheme to trigger read-validations. We also show how this approach can markedly improve performance, leading to up to 14% better turnaround, as demonstrated by experiments carried out with a port of the TPC-C benchmark to Software Transactional Memory.
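
    The control scheme can be sketched as follows: after each read, an estimator predicts the probability that the transaction is already doomed, and a read-validation is triggered once that prediction exceeds a threshold. The estimator below is a deliberately simplistic, invented stand-in, not the analytical model of the paper.

        // Hedged sketch (deliberately simplistic estimator, not the paper's
        // analytical model): predict how likely the transaction is to be doomed
        // and trigger an early read-validation once a threshold is crossed.
        #include <cmath>
        #include <cstddef>

        // Probability that at least one of `reads` entries has been invalidated,
        // assuming independent conflicts arriving at `per_read_conflict_rate`
        // per time unit over `elapsed` time.
        double predicted_abort_probability(std::size_t reads,
                                           double per_read_conflict_rate,
                                           double elapsed) {
            double p_read_still_valid = std::exp(-per_read_conflict_rate * elapsed);
            return 1.0 - std::pow(p_read_still_valid, static_cast<double>(reads));
        }

        // Consulted after each new read: validate now only when the predicted
        // abort probability exceeds the chosen threshold.
        bool should_validate_now(std::size_t reads, double conflict_rate,
                                 double elapsed, double threshold = 0.2) {
            return predicted_abort_probability(reads, conflict_rate, elapsed) > threshold;
        }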

    Open Transactions on Shared Memory

    Transactional memory has arisen as a good way of solving many of the issues of lock-based programming. However, most implementations admit isolated transactions only, which are not adequate when we have to coordinate communicating processes. To this end, in this paper we present OCTM, a Haskell-like language with open transactions over shared transactional memory: processes can join transactions at runtime just by accessing shared variables. Thus a transaction can cooperate with the environment through shared variables, but if it is rolled back, all its effects on the environment are retracted as well. To prove the expressive power of OCTM, we give an implementation of TCCS, a CCS-like calculus with open transactions.
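
    Since OCTM is a Haskell-like language, the following C++ stand-in is only a loose, hedged illustration of what "open" means here: several threads share one transaction whose tentative effects are visible to its participants, and an abort retracts those effects for everyone. All names are invented, and OCTM's actual semantics (e.g. joining by merely accessing a shared variable, isolation from non-participants) is not faithfully modeled.

        // Hedged sketch (loose C++ stand-in for a Haskell-like language feature):
        // an "open" transaction that several threads share. Tentative writes are
        // visible to all participants through read(), commit() publishes them,
        // and abort() retracts every effect for everyone.
        #include <map>
        #include <mutex>

        class OpenTx {
            std::mutex m_;
            std::map<int*, int> tentative_;    // uncommitted writes, shared by joiners
        public:
            void write(int* addr, int value) { // tentative effect
                std::lock_guard<std::mutex> g(m_);
                tentative_[addr] = value;
            }
            int read(int* addr) {              // participants see each other's writes
                std::lock_guard<std::mutex> g(m_);
                auto it = tentative_.find(addr);
                return it != tentative_.end() ? it->second : *addr;
            }
            void commit() {                    // publish all tentative effects
                std::lock_guard<std::mutex> g(m_);
                for (auto& [addr, v] : tentative_) *addr = v;
                tentative_.clear();
            }
            void abort() {                     // retract everything, for all joiners
                std::lock_guard<std::mutex> g(m_);
                tentative_.clear();
            }
        };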