1,839 research outputs found
Bridging the Gap between Programming Languages and Hardware Weak Memory Models
We develop a new intermediate weak memory model, IMM, as a way of
modularizing the proofs of correctness of compilation from concurrent
programming languages with weak memory consistency semantics to mainstream
multi-core architectures, such as POWER and ARM. We use IMM to prove the
correctness of compilation from the promising semantics of Kang et al. to POWER
(thereby correcting and improving their result) and ARMv7, as well as to the
recently revised ARMv8 model. Our results are mechanized in Coq, and to the
best of our knowledge, these are the first machine-verified compilation
correctness results for models that are weaker than x86-TSO
Synchronising C/C++ and POWER
Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM®, POWER®, ARM, or C/C++, it remains surprisingly unclear exactly what the programmer can depend on.
This paper studies relaxed-memory synchronisation. On the hardware side, we give a clear semantic characterisation of the load-reserve/store-conditional primitives as provided by POWER multiprocessors, for the first time since they were introduced 20 years ago; we cover their interaction with relaxed loads, stores, barriers, and dependencies. Our model, while not officially sanctioned by the vendor, is validated by extensive testing, comparing actual implementation behaviour against an oracle generated from the model, and by detailed discussion with IBM staff. We believe the ARM semantics to be similar.
On the software side, we prove sound a proposed compilation scheme of the C/C++ synchronisation constructs to POWER, including C/C++ spinlock mutexes, fences, and read-modify-write operations, together with the simpler atomic operations for which soundness is already known from our previous work; this is a first step in verifying concurrent algorithms that use load-reserve/store-conditional with respect to a realistic semantics. We also build confidence in the C/C++ model in its own terms, fixing some omissions and contributing to the C standards committee adoption of the C++11 concurrency model
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
Memory consistency models (MCMs) which govern inter-module interactions in a
shared memory system, are a significant, yet often under-appreciated, aspect of
system design. MCMs are defined at the various layers of the hardware-software
stack, requiring thoroughly verified specifications, compilers, and
implementations at the interfaces between layers. Current verification
techniques evaluate segments of the system stack in isolation, such as proving
compiler mappings from a high-level language (HLL) to an ISA or proving
validity of a microarchitectural implementation of an ISA.
This paper makes a case for full-stack MCM verification and provides a
toolflow, TriCheck, capable of verifying that the HLL, compiler, ISA, and
implementation collectively uphold MCM requirements. The work showcases
TriCheck's ability to evaluate a proposed ISA MCM in order to ensure that each
layer and each mapping is correct and complete. Specifically, we apply TriCheck
to the open source RISC-V ISA, seeking to verify accurate, efficient, and legal
compilations from C11. We uncover under-specifications and potential
inefficiencies in the current RISC-V ISA documentation and identify possible
solutions for each. As an example, we find that a RISC-V-compliant
microarchitecture allows 144 outcomes forbidden by C11 to be observed out of
1,701 litmus tests examined. Overall, this paper demonstrates the necessity of
full-stack verification for detecting MCM-related bugs in the hardware-software
stack.Comment: Proceedings of the Twenty-Second International Conference on
Architectural Support for Programming Languages and Operating System
Clarifying and compiling C/C++ concurrency: from C++11 to POWER
The upcoming C and C++ revised standards add concurrency to the languages, for the first time, in the form of a subtle *relaxed memory model* (the *C++11 model*). This aims to permit compiler optimisation and to accommodate the differing relaxed-memory behaviours of mainstream multiprocessors, combining simple semantics for most code with high-performance *low-level atomics* for concurrency libraries. In this paper, we first establish two simpler but provably equivalent models for C++11, one for the full language and another for the subset without consume operations. Subsetting further to the fragment without low-level atomics, we identify a subtlety arising from atomic initialisation and prove that, under an additional condition, the model is equivalent to sequential consistency for race-free programs
CheckFence: Checking Consistency of Concurrent Data Types on Relaxed Memory Models
Concurrency libraries can facilitate the development of multithreaded programs by providing concurrent implementations of familiar data types such as queues or sets. There exist many optimized algorithms that can achieve superior performance on multiprocessors by allowing concurrent data accesses without using locks. Unfortunately, such algorithms can harbor subtle concurrency bugs. Moreover, they require memory ordering fences to function correctly on relaxed memory models. To address these difficulties, we propose a verification approach that can exhaustively check all concurrent executions of a given test program on a relaxed memory model and can verify that they are observationally equivalent to a sequential execution. Our Check- Fence prototype automatically translates the C implementation code and the test program into a SAT formula, hands the latter to a standard SAT solver, and constructs counterexample traces if there exist incorrect executions. Applying CheckFence to five previously published algorithms, we were able to (1) find several bugs (some not previously known), and (2) determine how to place memory ordering fences for relaxed memory models
A wide-spectrum language for verification of programs on weak memory models
Modern processors deploy a variety of weak memory models, which for
efficiency reasons may (appear to) execute instructions in an order different
to that specified by the program text. The consequences of instruction
reordering can be complex and subtle, and can impact on ensuring correctness.
Previous work on the semantics of weak memory models has focussed on the
behaviour of assembler-level programs. In this paper we utilise that work to
extract some general principles underlying instruction reordering, and apply
those principles to a wide-spectrum language encompassing abstract data types
as well as low-level assembler code. The goal is to support reasoning about
implementations of data structures for modern processors with respect to an
abstract specification.
Specifically, we define an operational semantics, from which we derive some
properties of program refinement, and encode the semantics in the rewriting
engine Maude as a model-checking tool. The tool is used to validate the
semantics against the behaviour of a set of litmus tests (small assembler
programs) run on hardware, and also to model check implementations of data
structures from the literature against their abstract specifications
Lasagne : a static binary translator for weak memory model architectures
Funding: This work was supported by a UK RISE Grant.The emergence of new architectures create a recurring challenge to ensure that existing programs still work on them. Manually porting legacy code is often impractical. Static binary translation (SBT) is a process where a program’s binary is automatically translated from one architecture to another, while preserving their original semantics. However, these SBT tools have limited support to various advanced architectural features. Importantly, they are currently unable to translate concurrent binaries. The main challenge arises from the mismatches of the memory consistency model specified by the different architectures, especially when porting existing binaries to a weak memory model architecture. In this paper, we propose Lasagne, an end-to-end static binary translator with precise translation rules between x86 and Arm concurrency semantics. First, we propose a concurrency model for Lasagne’s intermediate representation (IR) and formally proved mappings between the IR and the two architectures. The memory ordering is preserved by introducing fences in the translated code. Finally, we propose optimizations focused on raising the level of abstraction of memory address calculations and reducing the number offences. Our evaluation shows that Lasagne reduces the number of fences by up to about 65%, with an average reduction of 45.5%, significantly reducing their runtime overhead.Postprin
A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines
We present a multi-word atomic (1,N) register for multi-core machines
exploiting Read-Modify-Write (RMW) instructions to coordinate the writer and
the readers in a wait-free manner. Our proposal, called Anonymous Readers
Counting (ARC), enables large-scale data sharing by admitting up to
concurrent readers on off-the-shelf 64-bits machines, as opposed to the most
advanced RMW-based approach which is limited to 58 readers. Further, ARC avoids
multiple copies of the register content when accessing it---this affects
classical register's algorithms based on atomic read/write operations on single
words. Thus it allows for higher scalability with respect to the register size.
Moreover, ARC explicitly reduces improves performance via a proper limitation
of RMW instructions in case of read operations, and by supporting constant time
for read operations and amortized constant time for write operations. A proof
of correctness of our register algorithm is also provided, together with
experimental data for a comparison with literature proposals. Beyond assessing
ARC on physical platforms, we carry out as well an experimentation on
virtualized infrastructures, which shows the resilience of wait-free
synchronization as provided by ARC with respect to CPU-steal times, proper of
more modern paradigms such as cloud computing.Comment: non
- …