Search CORE

79 research outputs found

Clarifying and compiling C/C++ concurrency: from C++11 to POWER

Author: Batty Mark
Memarian Kayvan
Owens Scott
Sarkar Susmit
Sewell Peter
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

The upcoming C and C++ revised standards add concurrency to the languages, for the first time, in the form of a subtle *relaxed memory model* (the *C++11 model*). This aims to permit compiler optimisation and to accommodate the differing relaxed-memory behaviours of mainstream multiprocessors, combining simple semantics for most code with high-performance *low-level atomics* for concurrency libraries. In this paper, we first establish two simpler but provably equivalent models for C++11, one for the full language and another for the subset without consume operations. Subsetting further to the fragment without low-level atomics, we identify a subtlety arising from atomic initialisation and prove that, under an additional condition, the model is equivalent to sequential consistency for race-free programs

Crossref

Kent Academic Repository

University of St. Andrews - Pure

Polynomial-Time Fence Insertion for Structured Programs

Author: Lesani Mohsen
Pourdamghani Arash
Taheri Mohammad
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Distributed Computing (DISC 2019)
Publication date: 01/01/2019
Field of study

To enhance performance, common processors feature relaxed memory models that reorder instructions. However, the correctness of concurrent programs is often dependent on the preservation of the program order of certain instructions. Thus, the instruction set architectures offer memory fences. Using fences is a subtle task with performance and correctness implications: using too few can compromise correctness and using too many can hinder performance. Thus, fence insertion algorithms that given the required program orders can automatically find the optimum fencing can enhance the ease of programming, reliability, and performance of concurrent programs. In this paper, we consider the class of programs with structured branch and loop statements and present a greedy and polynomial-time optimum fence insertion algorithm. The algorithm incrementally reduces fence insertion for a control-flow graph to fence insertion for a set of paths. In addition, we show that the minimum fence insertion problem with multiple types of fence instructions is NP-hard even for straight-line programs

Dagstuhl Research Online Publication Server

Compositional Verification of Compiler Optimisations on Relaxed Memory

Author: D Distefano
D Jackson
Daniel Poetzl
J Alglave
J Ševčík
J Ševčík
M Batty
MP Herlihy
S Brookes
Viktor Vafeiadis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/02/2018
Field of study

This paper is about verifying program transformations on an axiomatic relaxed memory model of the kind used in C/C++ and Java. Relaxed models present particular challenges for verifying program transformations, because they generate many additional modes of interaction between code and context. For a block of code being transformed, we define a denotation from its behaviour in a set of representative contexts. Our denotation summarises interactions of the code block with the rest of the program both through local and global variables, and through subtle synchronisation effects due to relaxed memory. We can then prove that a transformation does not introduce new program behaviours by comparing the denotations of the code block before and after. Our approach is compositional: by examining only representative contexts, transformations are verified for any context. It is also fully abstract, meaning any valid transformation can be verified. We cover several tricky aspects of C/C++-style memory models, including release-acquire operations, sequentially consistent fences, and non-atomics. We also define a variant of our denotation that is finite at the cost of losing full abstraction. Based on this variant, we have implemented a prototype verification tool and ap

arXiv.org e-Print Archive

Crossref

Kent Academic Repository

Recommended from our members

A no-thin-air memory model for programming languages

Author: Pichon-Pharabod Jean Yves Alexis
Publication venue: University of Cambridge
Publication date: 28/03/2018
Field of study

Many hardware and compiler optimisations introduced to speed up single-threaded programs also introduce additional, sometimes surprising, behaviours for concurrent programs with shared mutable state. How many of these extra behaviours occur in practice depends on the combination of the hardware, compiler, runtime, etc. that make up the platform. A memory model, which prescribes what values each read of a concurrent program can read, allows programmers to determine whether a program behaves as expected without having to worry about the details of the platform. However, capturing these behaviours in a memory model without also including undesirable "out-of-thin-air" behaviours that do not occur in practice has proved elusive. The memory model of C and C++ allows out-of-thin-air behaviour, while the Java memory model fails to capture some behaviours that are introduced in practice by compiler optimisations. In this thesis, we propose a memory model that forbids out-of-thin-air behaviour, yet allows the behaviours that do occur. Our memory model follows operational intuitions of how the hardware and compilers operate. We illustrate that it behaves as desired on a series of litmus tests. We show that it captures at least some of the expected behaviours, that it forms an envelope around some common compiler optimisations, and that it is implementable on common hardware using the expected compilation schemes. We also show that it supports some established programming idioms.EPSRC Programme Grant REMS: Rigorous Engineering for Mainstream Systems, EP/K008628/1 Computer Laboratory Premium Research Studentshi

Apollo (Cambridge)

Lasagne : a static binary translator for weak memory model architectures

Author: Bhatotia Pramod
Chakraborty Soham
Fink Martin
Gouicem Redha
Rocha Rodrigo C. O.
Spink Tom
Sprokholt Dennis
Publication venue: ACM
Publication date: 01/01/2022
Field of study

Funding: This work was supported by a UK RISE Grant.The emergence of new architectures create a recurring challenge to ensure that existing programs still work on them. Manually porting legacy code is often impractical. Static binary translation (SBT) is a process where a program’s binary is automatically translated from one architecture to another, while preserving their original semantics. However, these SBT tools have limited support to various advanced architectural features. Importantly, they are currently unable to translate concurrent binaries. The main challenge arises from the mismatches of the memory consistency model specified by the different architectures, especially when porting existing binaries to a weak memory model architecture. In this paper, we propose Lasagne, an end-to-end static binary translator with precise translation rules between x86 and Arm concurrency semantics. First, we propose a concurrency model for Lasagne’s intermediate representation (IR) and formally proved mappings between the IR and the two architectures. The memory ordering is preserved by introducing fences in the translated code. Finally, we propose optimizations focused on raising the level of abstraction of memory address calculations and reducing the number offences. Our evaluation shows that Lasagne reduces the number of fences by up to about 65%, with an average reduction of 45.5%, significantly reducing their runtime overhead.Postprin

TU Delft Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

University of St. Andrews - Pure

St Andrews Research Repository

Overhauling SC atomics in C11 and OpenCL

Author: Alastair F. Donaldson
IEC.
IEC.
IEC.
John Wickerson
Mark Batty
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/10/2015
Field of study

Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A consequence of our simplification is that the SC operations in an execution no longer need to be totally ordered. This relaxation enables, for the first time, efficient and exhaustive simulation of litmus tests that use SC atomics. We extend our improved C11 model to obtain the first rigorous memory model formalisation for OpenCL (which extends C11 with support for heterogeneous many-core programming). In the OpenCL setting, we refine the SC axioms still further to give a sensible semantics to SC operations that employ a ‘memory scope’ to restrict their visibility to specific threads. Our overhaul requires slight strengthenings of both the C11 and the OpenCL memory models, causing some behaviours to become disallowed. We argue that these strengthenings are natural, and that all of the formalised C11 and OpenCL compilation schemes of which we are aware (Power and x86 CPUs for C11, AMD GPUs for OpenCL) remain valid in our revised models. Using the HERD memory model simulator, we show that our overhaul leads to an exponential improvement in simulation time for C11 litmus tests compared with the original model, making exhaustive simulation competitive, time-wise, with the non-exhaustive CDSChecker tool

arXiv.org e-Print Archive

Crossref

Kent Academic Repository

Spiral - Imperial College Digital Repository

Concurrency-aware thread scheduling for high-level synthesis

Author: Constantinides G
Ramanathan N
Wickerson JP
Publication venue
Publication date: 05/03/2018
Field of study

When mapping C programs to hardware, high-level synthesis (HLS) tools seek to reorder instructions so they can be packed into as few clock cycles as possible. However, when synthesising multi-threaded C, instruction reordering is inhibited by the presence of atomic operations (‘atomics’), such as compare- and-swap. Atomics, the fundamental concurrency primitive in C, are the basis of more abstract concurrency mechanisms such as locks, and also of efficient lock-free data structures. Whether a particular atomic can be legally reordered within a thread can depend on the memory access patterns of other threads. Existing HLS tools that support atomics typically sched- ule each thread independently, and so must be conservative when optimising around atomics. Yet HLS tools are distinguished from conventional compilers by having the entire program available. Can this information be exploited to allow more reorderings within each thread, and hence to obtain more efficient schedules? In this work, we propose a global analysis that determines, for each thread, which pairs of instructions must not be reordered. Our analysis is sensitive to the C consistency mode of the atomics involved (e.g. relaxed, release, acquire, and sequentially- consistent). We have used the Alloy model checker to validate our analysis against the C language standard, and have implemented it in the LegUp HLS tool. An evaluation on several lock-free data structure benchmarks indicates that our analysis leads to a 1.6 × average global speedup

Crossref

Spiral - Imperial College Digital Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Automatically Comparing Memory Consistency Models

Author: Abrial Jean-Raymond
Adve Sarita V.
Aho Alfred V.
Alglave Jade
Audemard Gilles
Biere Armin
Bornholt James
Collier William W.
Eén Niklas
Gharachorloo Kourosh
Higham Lisa
IEC.
Jackson Daniel
John
Lahav Ori
Möhring Rolf H.
Sezgin Ali
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/10/2016
Field of study

A memory consistency model (MCM) is the part of a programming language or computer architecture specification that defines which values can legally be read from shared memory locations. Because MCMs take into account various optimisations employed by archi- tectures and compilers, they are often complex and counterintu- itive, which makes them challenging to design and to understand. We identify four tasks involved in designing and understanding MCMs: generating conformance tests, distinguishing two MCMs, checking compiler optimisations, and checking compiler mappings. We show that all four tasks are instances of a general constraint-satisfaction problem to which the solution is either a program or a pair of programs. Although this problem is intractable for automatic solvers when phrased over programs directly, we show how to solve analogous constraints over program executions, and then construct programs that satisfy the original constraints. Our technique, which is implemented in the Alloy modelling framework, is illustrated on several software- and architecture-level MCMs, both axiomatically and operationally defined. We automatically recreate several known results, often in a simpler form, including: distinctions between variants of the C11 MCM; a failure of the ‘SC-DRF guarantee’ in an early C11 draft; that x86 is ‘multi-copy atomic’ and Power is not; bugs in common C11 compiler optimisations; and bugs in a compiler mapping from OpenCL to AMD-style GPUs. We also use our technique to develop and validate a new MCM for NVIDIA GPUs that supports a natural mapping from OpenCL

Crossref

Kent Academic Repository

Spiral - Imperial College Digital Repository

Locality and Singularity for Store-Atomic Memory Models

Author: A Bouajjani
A Bouajjani
A Bouajjani
A Dan
D Shasha
E Derevenetc
J Alglave
J Alglave
J Alglave
J Alglave
J Alglave
J Alglave
J Burnim
J Nieplocha
L Lamport
L Lamport
M Kuperstein
MF Atig
MF Atig
P Sewell
PA Abdulla
PA Abdulla
R Machado
S Burckhardt
S Burckhardt
V Vafeiadis
Y Meshman
Publication venue
Publication date: 14/03/2017
Field of study

Robustness is a correctness notion for concurrent programs running under relaxed consistency models. The task is to check that the relaxed behavior coincides (up to traces) with sequential consistency (SC). Although computationally simple on paper (robustness has been shown to be PSPACE-complete for TSO, PGAS, and Power), building a practical robustness checker remains a challenge. The problem is that the various relaxations lead to a dramatic number of computations, only few of which violate robustness. In the present paper, we set out to reduce the search space for robustness checkers. We focus on store-atomic consistency models and establish two completeness results. The first result, called locality, states that a non-robust program always contains a violating computation where only one thread delays commands. The second result, called singularity, is even stronger but restricted to programs without lightweight fences. It states that there is a violating computation where a single store is delayed. As an application of the results, we derive a linear-size source-to-source translation of robustness to SC-reachability. It applies to general programs, regardless of the data domain and potentially with an unbounded number of threads and with unbounded buffers. We have implemented the translation and verified, for the first time, PGAS algorithms in a fully automated fashion. For TSO, our analysis outperforms existing tools

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints