12,621 research outputs found

    Locality and Singularity for Store-Atomic Memory Models

    Full text link
    Robustness is a correctness notion for concurrent programs running under relaxed consistency models. The task is to check that the relaxed behavior coincides (up to traces) with sequential consistency (SC). Although computationally simple on paper (robustness has been shown to be PSPACE-complete for TSO, PGAS, and Power), building a practical robustness checker remains a challenge. The problem is that the various relaxations lead to a dramatic number of computations, only few of which violate robustness. In the present paper, we set out to reduce the search space for robustness checkers. We focus on store-atomic consistency models and establish two completeness results. The first result, called locality, states that a non-robust program always contains a violating computation where only one thread delays commands. The second result, called singularity, is even stronger but restricted to programs without lightweight fences. It states that there is a violating computation where a single store is delayed. As an application of the results, we derive a linear-size source-to-source translation of robustness to SC-reachability. It applies to general programs, regardless of the data domain and potentially with an unbounded number of threads and with unbounded buffers. We have implemented the translation and verified, for the first time, PGAS algorithms in a fully automated fashion. For TSO, our analysis outperforms existing tools

    A Theory of Partitioned Global Address Spaces

    Get PDF
    Partitioned global address space (PGAS) is a parallel programming model for the development of applications on clusters. It provides a global address space partitioned among the cluster nodes, and is supported in programming languages like C, C++, and Fortran by means of APIs. In this paper we provide a formal model for the semantics of single instruction, multiple data programs using PGAS APIs. Our model reflects the main features of popular real-world APIs such as SHMEM, ARMCI, GASNet, GPI, and GASPI. A key feature of PGAS is the support for one-sided communication: a node may directly read and write the memory located at a remote node, without explicit synchronization with the processes running on the remote side. One-sided communication increases performance by decoupling process synchronization from data transfer, but requires the programmer to reason about appropriate synchronizations between reads and writes. As a second contribution, we propose and investigate robustness, a criterion for correct synchronization of PGAS programs. Robustness corresponds to acyclicity of a suitable happens-before relation defined on PGAS computations. The requirement is finer than the classical data race freedom and rules out most false error reports. Our main result is an algorithm for checking robustness of PGAS programs. The algorithm makes use of two insights. Using combinatorial arguments we first show that, if a PGAS program is not robust, then there are computations in a certain normal form that violate happens-before acyclicity. Intuitively, normal-form computations delay remote accesses in an ordered way. We then devise an algorithm that checks for cyclic normal-form computations. Essentially, the algorithm is an emptiness check for a novel automaton model that accepts normal-form computations in streaming fashion. Altogether, we prove the robustness problem is PSpace-complete

    Robustness against Power is PSPACE-complete

    Full text link
    Power is a RISC architecture developed by IBM, Freescale, and several other companies and implemented in a series of POWER processors. The architecture features a relaxed memory model providing very weak guarantees with respect to the ordering and atomicity of memory accesses. Due to these weaknesses, some programs that are correct under sequential consistency (SC) show undesirable effects when run under Power. We call these programs not robust against the Power memory model. Formally, a program is robust if every computation under Power has the same data and control dependencies as some SC computation. Our contribution is a decision procedure for robustness of concurrent programs against the Power memory model. It is based on three ideas. First, we reformulate robustness in terms of the acyclicity of a happens-before relation. Second, we prove that among the computations with cyclic happens-before relation there is one in a certain normal form. Finally, we reduce the existence of such a normal-form computation to a language emptiness problem. Altogether, this yields a PSPACE algorithm for checking robustness against Power. We complement it by a matching lower bound to show PSPACE-completeness

    Synchronising C/C++ and POWER

    Get PDF
    Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM®, POWER®, ARM, or C/C++, it remains surprisingly unclear exactly what the programmer can depend on. This paper studies relaxed-memory synchronisation. On the hardware side, we give a clear semantic characterisation of the load-reserve/store-conditional primitives as provided by POWER multiprocessors, for the first time since they were introduced 20 years ago; we cover their interaction with relaxed loads, stores, barriers, and dependencies. Our model, while not officially sanctioned by the vendor, is validated by extensive testing, comparing actual implementation behaviour against an oracle generated from the model, and by detailed discussion with IBM staff. We believe the ARM semantics to be similar. On the software side, we prove sound a proposed compilation scheme of the C/C++ synchronisation constructs to POWER, including C/C++ spinlock mutexes, fences, and read-modify-write operations, together with the simpler atomic operations for which soundness is already known from our previous work; this is a first step in verifying concurrent algorithms that use load-reserve/store-conditional with respect to a realistic semantics. We also build confidence in the C/C++ model in its own terms, fixing some omissions and contributing to the C standards committee adoption of the C++11 concurrency model

    Generating Litmus Tests for Contrasting Memory Consistency Models - Extended Version

    Get PDF
    Well-defined memory consistency models are necessary for writing correct parallel software. Developing and understanding formal specifications of hardware memory models is a challenge due to the subtle differences in allowed reorderings and different specification styles. To facilitate exploration of memory model specifications, we have developed a technique for systematically comparing hardware memory models specified using both operational and axiomatic styles. Given two specifications, our approach generates all possible multi-threaded programs up to a specified bound, and for each such program, checks if one of the models can lead to an observable behavior not possible in the other model. When the models differs, the tool finds a minimal “litmus test” program that demonstrates the difference. A number of optimizations reduce the number of programs that need to be examined. Our prototype implementation has successfully compared both axiomatic and operational specifications of six different hardware memory models. We describe two case studies: (1) development of a non-store atomic variant of an existing memory model, which illustrates the use of the tool while developing a new memory model, and (2) identification of a subtle specification mistake in a recently published axiomatic specification of TSO

    A Framework for Consistency Algorithms

    Get PDF
    We present a framework that provides deterministic consistency algorithms for given memory models. Such an algorithm checks whether the executions of a shared-memory concurrent program are consistent under the axioms defined by a model. For memory models like SC and TSO, checking consistency is NP-complete. Our framework shows, that despite the hardness, fast deterministic consistency algorithms can be obtained by employing tools from fine-grained complexity. The framework is based on a universal consistency problem which can be instantiated by different memory models. We construct an algorithm for the problem running in time ?^*(2^k), where k is the number of write accesses in the execution that is checked for consistency. Each instance of the framework then admits an ?^*(2^k)-time consistency algorithm. By applying the framework, we obtain corresponding consistency algorithms for SC, TSO, PSO, and RMO. Moreover, we show that the obtained algorithms for SC, TSO, and PSO are optimal in the fine-grained sense: there is no consistency algorithm for these running in time 2^{o(k)} unless the exponential time hypothesis fails

    The Benefits of Duality in Verifying Concurrent Programs under TSO

    Get PDF
    We address the problem of verifying safety properties of concurrent programs running over the TSO memory model. Known decision procedures for this model are based on complex encodings of store buffers as lossy channels. These procedures assume that the number of processes is fixed. However, it is important in general to prove correctness of a system/algorithm in a parametric way with an arbitrarily large number of processes. In this paper, we introduce an alternative (yet equivalent) semantics to the classical one for the TSO model that is more amenable for efficient algorithmic verification and for extension to parametric verification. For that, we adopt a dual view where load buffers are used instead of store buffers. The flow of information is now from the memory to load buffers. We show that this new semantics allows (1) to simplify drastically the safety analysis under TSO, (2) to obtain a spectacular gain in efficiency and scalability compared to existing procedures, and (3) to extend easily the decision procedure to the parametric case, which allows to obtain a new decidability result, and more importantly, a verification algorithm that is more general and more efficient in practice than the one for bounded instances
    • …
    corecore