14 research outputs found
TransForm: Formally Specifying Transistency Models and Synthesizing Enhanced Litmus Tests
Memory consistency models (MCMs) specify the legal ordering and visibility of
shared memory accesses in a parallel program. Traditionally, instruction set
architecture (ISA) MCMs assume that relevant program-visible memory ordering
behaviors only result from shared memory interactions that take place between
user-level program instructions. This assumption fails to account for virtual
memory (VM) implementations that may result in additional shared memory
interactions between user-level program instructions and both 1) system-level
operations (e.g., address remappings and translation lookaside buffer
invalidations initiated by system calls) and 2) hardware-level operations
(e.g., hardware page table walks and dirty bit updates) during a user-level
program's execution. These additional shared memory interactions can impact the
observable memory ordering behaviors of user-level programs. Thus, memory
transistency models (MTMs) have been coined as a superset of MCMs to
additionally articulate VM-aware consistency rules. However, no prior work has
enabled formal MTM specifications, nor methods to support their automated
analysis.
To fill the above gap, this paper presents the TransForm framework. First,
TransForm features an axiomatic vocabulary for formally specifying MTMs.
Second, TransForm includes a synthesis engine to support the automated
generation of litmus tests enhanced with MTM features (i.e., enhanced litmus
tests, or ELTs) when supplied with a TransForm MTM specification. As a case
study, we formally define an estimated MTM for Intel x86 processors, called
x86t_elt, that is based on observations made by an ELT-based evaluation of an
Intel x86 MTM implementation from prior work and available public
documentation. Given x86t_elt and a synthesis bound as input, TransForm's
synthesis engine successfully produces a set of ELTs including relevant ELTs
from prior work.Comment: *This is an updated version of the TransForm paper that features
updated results reflecting performance optimizations and software bug fixes.
14 pages, 11 figures, Proceedings of the 47th Annual International Symposium
on Computer Architecture (ISCA
Recommended from our members
Weakening WebAssembly
WebAssembly (Wasm) is a safe, portable virtual instruction set that can be hosted in a wide range of environments, such as a Web browser. It is a low-level language whose instructions are intended to compile directly to
bare hardware. While the initial version of Wasm focussed on single-threaded computation, a recent proposal extends it with low-level support for multiple threads and atomic instructions for synchronised access to
shared memory. To support the correct compilation of concurrent programs, it is necessary to give a suitable specification of its memory model.
Wasm’s language definition is based on a fully formalised specification that carefully avoids undefined behaviour. We present a substantial extension to this semantics, incorporating a relaxed memory model, along
with a few proposed operational extensions. Wasm’s memory model is unique in that its linear address space can be dynamically grown during execution, while all accesses are bounds-checked. This leads to the novel
problem of specifying how observations about the size of the memory can propagate between threads. We argue that, considering desirable compilation schemes, we cannot give a sequentially consistent semantics to memory growth.
We show that our model guarantees Sequential Consistency of Data-Race-Free programs (SC-DRF). However, because Wasm is to run on the Web, we must also consider interoperability of its model with that of JavaScript.
We show, by counter-example, that JavaScript’s memory model is not SC-DRF, in contrast to what is claimed in its specification. We propose two axiomatic conditions that should be added to the JavaScript model to
correct this difference.
We also describe a prototype SMT-based litmus tool which acts as an oracle for our axiomatic model, visualising its behaviours, including memory resizing
A Unified, Machine-Checked Formalisation of Java and the Java Memory Model
We present a machine-checked formalisation of the Java memory model and connect it to an operational semantics for Java source code and bytecode. This provides the link between sequential semantics and the memory model that has been missing in the literature. Our model extends previous formalisations by dynamic memory allocation, thread spawns and joins, infinite executions, the wait-notify mechanism and thread interruption. We prove the Java data race freedom guarantee for the complete formalisation in a modular way. This work makes the assumptions about the sequential semantics explicit and shows how to discharge them
The Leaky Semicolon
Program logics and semantics tell a pleasant story about sequential composition: when executing (S1;S2), we first execute S1 then S2. To improve performance, however, processors execute instructions out of order, and compilers reorder programs even more dramatically. By design, single-threaded systems cannot observe these reorderings; however, multiple-threaded systems can, making the story considerably less pleasant. A formal attempt to understand the resulting mess is known as a “relaxed memory model.” Prior models either fail to address sequential composition directly, or overly restrict processors and compilers, or permit nonsense thin-air behaviors which are unobservable in practice.
To support sequential composition while targeting modern hardware, we enrich the standard event-based approach with preconditions and families of predicate transformers. When calculating the meaning of (S1; S2), the predicate transformer applied to the precondition of an event e from S2 is chosen based on the set of events in S1 upon which e depends. We apply this approach to two existing memory models
A Safety-First Approach to Memory Models.
Sequential consistency (SC) is arguably the most intuitive behavior for a shared-memory multithreaded program. It is widely accepted that language-level SC could significantly improve programmability of a multiprocessor system. However, efficiently supporting end-to-end SC remains a challenge as it requires that both compiler and hardware optimizations preserve SC semantics.
Current concurrent languages support a relaxed memory model that requires programmers to explicitly annotate all memory accesses that can participate in a data-race ("unsafe" accesses). This requirement allows compiler and hardware to aggressively optimize unannotated accesses, which are assumed to be data-race-free ("safe" accesses), while still preserving SC semantics. However, unannotated data races are easy for programmers to accidentally introduce and are difficult to detect, and in such cases the safety and correctness of programs are significantly compromised.
This dissertation argues instead for a safety-first approach, whereby every memory operation is treated as potentially unsafe by the compiler and hardware unless it is proven otherwise.
The first solution, DRFx memory model, allows many common compiler and hardware optimizations (potentially SC-violating) on unsafe accesses and uses a runtime support to detect potential SC violations arising from reordering of unsafe accesses. On detecting a potential SC violation, execution is halted before the safety property is compromised.
The second solution takes a different approach and preserves SC in both compiler and hardware. Both SC-preserving compiler and hardware are also built on the safety-first approach. All memory accesses are treated as potentially unsafe by the compiler and hardware. SC-preserving hardware relies on different static and dynamic techniques to identify safe accesses. Our results indicate that supporting SC at the language level is not expensive in terms of performance and hardware complexity.
The dissertation also explores an extension of this safety-first approach for data-parallel accelerators such as Graphics Processing Units (GPUs). Significant microarchitectural differences between CPU and GPU require rethinking of efficient solutions for preserving SC in GPUs. The proposed solution based on our SC-preserving approach performs nearly on par with the baseline GPU that implements a data-race-free-0 memory model.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/120794/1/ansingh_1.pd