Search CORE

35 research outputs found

Is SC+ILP=RC?

Author: Falsafi Babak
Guiady Chris
Vijaykumar T. N.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/04/2009
Field of study

Sequential consistency (SC) is the simplest programming interface for shared-memory systems but imposes program order among all memory operations, possibly precluding high performance implementations. Release consistency (RC), however, enables the highest performance implementations but puts the burden on the programmer to specify which memory operations need to be atomic and in program order. This paper shows, for the first time, that SC implementations can perform as well as RC implementations if the hardware provides enough support for speculation. Both SC and RC implementations rely on reordering and overlapping memory operations for high performance. To enforce order when necessary, an RC implementation uses software guarantees, whereas an SC implementation relies on hardware speculation. Our SC implementation, called SC++, closes the performance gap because: (1) the hardware allows not just loads, as some current SC implementations do, but also stores to bypass each other speculatively to hide remote latencies, (2) the hardware provides large speculative state for not just processor, as previously proposed, but also memory to allow out-of- order memory operations, (3) the support for hardware speculation does not add excessive overheads to processor pipeline critical paths, and (4) well- behaved applications incur infrequent rollbacks of speculative execution. Using simulation, we show that SC++ achieves an RC implementation's performance in all the six applications we studie

Infoscience - École polytechnique fédérale de Lausanne

Improved Sequence-Based Speculation Techniques for Implementing Memory Consistency

Author: Blundell Colin
Martin Milo M.K.
Wenisch Tom
Publication venue: ScholarlyCommons
Publication date: 27/05/2008
Field of study

This work presents BMW, a new design for speculative implementations of memory consistency models in shared-memory multiprocessors. BMW obtains the same performance as prior proposals, but achieves this performance while avoiding several undesirable attributes of prior proposals: non-scalable structures, per-word valid bits in the data cache, modifications to the cache coherence protocol, and global arbitration. BMW uses a read and write bit per cache block and a standard invalidation-based cache coherence protocol to perform conflict detection while speculating. While speculating, stores to block not in the cache are placed into a coalescing store buffer until those misses return. Stores are written speculatively to the primary cache, and non-speculative state is maintained by cleaning dirty blocks before being written speculatively. Speculative blocks are invalidated on abort and marked as non-speculative on commit. This organization allows for fast, local commits while avoiding a non-scalable store queue

ScholarlyCommons@Penn

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA

Author: Collier William W.
Gharachorloo Kourosh
International SPARC
J.
Jackson Daniel
Petri Gustavo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

Memory consistency models (MCMs) which govern inter-module interactions in a shared memory system, are a significant, yet often under-appreciated, aspect of system design. MCMs are defined at the various layers of the hardware-software stack, requiring thoroughly verified specifications, compilers, and implementations at the interfaces between layers. Current verification techniques evaluate segments of the system stack in isolation, such as proving compiler mappings from a high-level language (HLL) to an ISA or proving validity of a microarchitectural implementation of an ISA. This paper makes a case for full-stack MCM verification and provides a toolflow, TriCheck, capable of verifying that the HLL, compiler, ISA, and implementation collectively uphold MCM requirements. The work showcases TriCheck's ability to evaluate a proposed ISA MCM in order to ensure that each layer and each mapping is correct and complete. Specifically, we apply TriCheck to the open source RISC-V ISA, seeking to verify accurate, efficient, and legal compilations from C11. We uncover under-specifications and potential inefficiencies in the current RISC-V ISA documentation and identify possible solutions for each. As an example, we find that a RISC-V-compliant microarchitecture allows 144 outcomes forbidden by C11 to be observed out of 1,701 litmus tests examined. Overall, this paper demonstrates the necessity of full-stack verification for detecting MCM-related bugs in the hardware-software stack.Comment: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating System

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Recommended from our members

A Consistency Checking Optimization Algorithm for Memory-Intensive Transactions ; CU-CS-1049-08

Author: Connors Daniel
Gottschlich Justin
Siek Jeremy
Publication venue: CU Scholar
Publication date: 01/05/2008
Field of study

CU Scholar Institutional Repository

Fingerprinting: Bounding the Soft-Error Detection Latency and Bandwidth

Author: Falsafi Babak
Gold Brian T.
Hoe James C.
Kim Jangwoo
Nowatzyk Andreas G.
Smolens Jared C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/10/2007
Field of study

Fingerprinting summarizes the history of internal processor state updates into a cryptographic signature. The processors in a dual modular redundant pair periodically exchange and compare fingerprints to corroborate each other's correctness. relative to other techniques, fingerprinting offers superior error coverage and significantly reduces the error-detection latency and bandwidth

Infoscience - École polytechnique fédérale de Lausanne

Implicit transactional memory in chip multiprocessors

Author: Beivide Palacio Ramon
Cristal Kestelman Adrián
Galluzzi Marco
Smith James E.
Stenström Per
Valero Cortés Mateo
Vallejo Enrique
Vallejo Fernando
Publication venue
Publication date: 01/01/2007
Field of study

Chip Multiprocessors (CMPs) are an efficient way of designing and use the huge amount of transistors on a chip. Different cores on a chip can compose a shared memory system with a very low-latency interconnect at a very low cost. Unfortunately, consistency models and synchronization styles of popular programming models for multiprocessors impose severe performance losses. Known architectural approaches to combat these losses are too complex, too specialized, or not transparent to the software. In this article, we introduce “implicit transactional memory” as a generalized architectural concept to remove such performance losses. We show how the concept of implicit transactions can be implemented at a low complexity by leveraging the multi-checkpoint mechanism of the Kilo-Instruction Processor. By relying on a general speculation substrate, it supports even the strictest consistency model – sequential consistency – potentially as effectively as weaker models and it allows multiple threads to speculatively execute critical sections, beyond barriers and event synchronizations.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Implicit transactional memory in kilo-instruction multiprocessors

Author: Beivide Palacio Julio Ramon
Cristal Kestelman Adrián
Galluzzi Marco
Smith James E.
Stenström Per
Valero Cortés Mateo
Vallejo Enrique
Vallejo Fernando
Publication venue
Publication date: 01/01/2007
Field of study

Although they have been the main server technology for many years, multiprocessors are undergoing a renaissance due to multi-core chips and the attractive scalability properties of combining a number of such multi-core chips into a system. The widespread use of multiprocessor systems will make performance losses due to consistency models and synchronization styles of popular programming models even more evident than they already are. Known architectural approaches to combat these losses are generally too complex, too specialized, or not transparent to software. In this article, we introduce implicit transactional memory as a generalized architectural concept to remove unnecessary performance losses caused by consistency models and synchronization styles. We show how the concept of implicit transactions can be implemented with low complexity by leveraging the multi-checkpoint mechanism of the Kilo-Instruction Processor. By relying on a general speculation substrate, this method supports even the strictest consistency model – sequential consistency – potentially as effectively as weaker models and it allows multiple threads to speculatively execute critical sections, beyond barriers and event synchronizations.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

The Silently Shifting Semicolon

Author: Marino Daniel
Millstein Todd
Musuvathi Madanlal
Narayanasamy Satish
Singh Abhayendra
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 1st Summit on Advances in Programming Languages (SNAPL 2015)
Publication date: 01/01/2015
Field of study

Memory consistency models for modern concurrent languages have largely been designed from a system-centric point of view that protects, at all costs, optimizations that were originally designed for sequential programs. The result is a situation that, when viewed from a programmer\u27s standpoint, borders on absurd. We illustrate this unfortunate situation with a brief fable and then examine the opportunities to right our path

Dagstuhl Research Online Publication Server

Solving multiprocessor drawbacks with kilo-instruction processors

Author: Beivide Palacio Ramon
Cristal Kestelman Adrián
Galluzzi Marco
Smith James E.
Stenström Per
Valero Cortés Mateo
Vallejo Enrique
Vallejo Fernando
Publication venue
Publication date: 01/01/2005
Field of study

Nowadays, a good multiprocessor system design has to deal with many drawbacks in order to achieve a good tradeoff between complexity and performance. For example, while solving problems like coherence and consistency is essential for correctness the way to solve processor stalls due to critical sections and synchronization points is desirable for performance. And none of these drawbacks has a straightforward solution. We show in our paper how the multi-checkpointing mechanism of the Kilo-Instruction Processors can be correctly leveraged in order to achieve a good complexity-effective multiprocessor design. Specifically, we describe a Kilo-Instruction Multiprocessor that transparently, i.e. without any software support, uses transaction-based memory updates. Our model simplifies the coherence and consistency hardware and gives the potential for easily applying different desirable speculative mechanisms to enhance performance when facing some synchronization constructs of current parallel applications.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC