597 research outputs found
Property-Driven Fence Insertion using Reorder Bounded Model Checking
Modern architectures provide weaker memory consistency guarantees than
sequential consistency. These weaker guarantees allow programs to exhibit
behaviours where the program statements appear to have executed out of program
order. Fortunately, modern architectures provide memory barriers (fences) to
enforce the program order between a pair of statements if needed. Due to the
intricate semantics of weak memory models, the placement of fences is
challenging even for experienced programmers. Too few fences lead to bugs
whereas overuse of fences results in performance degradation. This motivates
automated placement of fences. Tools that restore sequential consistency in the
program may insert more fences than necessary for the program to be correct.
Therefore, we propose a property-driven technique that introduces
"reorder-bounded exploration" to identify the smallest number of program
locations for fence placement. We implemented our technique on top of CBMC;
however, in principle, our technique is generic enough to be used with any
model checker. Our experimental results show that our technique is faster and
solves more instances of relevant benchmarks as compared to earlier approaches.Comment: 18 pages, 3 figures, 4 algorithms. Version change reason : new set of
results and publication ready version of FM 201
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
Memory consistency models (MCMs) which govern inter-module interactions in a
shared memory system, are a significant, yet often under-appreciated, aspect of
system design. MCMs are defined at the various layers of the hardware-software
stack, requiring thoroughly verified specifications, compilers, and
implementations at the interfaces between layers. Current verification
techniques evaluate segments of the system stack in isolation, such as proving
compiler mappings from a high-level language (HLL) to an ISA or proving
validity of a microarchitectural implementation of an ISA.
This paper makes a case for full-stack MCM verification and provides a
toolflow, TriCheck, capable of verifying that the HLL, compiler, ISA, and
implementation collectively uphold MCM requirements. The work showcases
TriCheck's ability to evaluate a proposed ISA MCM in order to ensure that each
layer and each mapping is correct and complete. Specifically, we apply TriCheck
to the open source RISC-V ISA, seeking to verify accurate, efficient, and legal
compilations from C11. We uncover under-specifications and potential
inefficiencies in the current RISC-V ISA documentation and identify possible
solutions for each. As an example, we find that a RISC-V-compliant
microarchitecture allows 144 outcomes forbidden by C11 to be observed out of
1,701 litmus tests examined. Overall, this paper demonstrates the necessity of
full-stack verification for detecting MCM-related bugs in the hardware-software
stack.Comment: Proceedings of the Twenty-Second International Conference on
Architectural Support for Programming Languages and Operating System
Locality and Singularity for Store-Atomic Memory Models
Robustness is a correctness notion for concurrent programs running under
relaxed consistency models. The task is to check that the relaxed behavior
coincides (up to traces) with sequential consistency (SC). Although
computationally simple on paper (robustness has been shown to be
PSPACE-complete for TSO, PGAS, and Power), building a practical robustness
checker remains a challenge. The problem is that the various relaxations lead
to a dramatic number of computations, only few of which violate robustness.
In the present paper, we set out to reduce the search space for robustness
checkers. We focus on store-atomic consistency models and establish two
completeness results. The first result, called locality, states that a
non-robust program always contains a violating computation where only one
thread delays commands. The second result, called singularity, is even stronger
but restricted to programs without lightweight fences. It states that there is
a violating computation where a single store is delayed.
As an application of the results, we derive a linear-size source-to-source
translation of robustness to SC-reachability. It applies to general programs,
regardless of the data domain and potentially with an unbounded number of
threads and with unbounded buffers. We have implemented the translation and
verified, for the first time, PGAS algorithms in a fully automated fashion. For
TSO, our analysis outperforms existing tools
Partial Orders for Efficient BMC of Concurrent Software
This version previously deposited at arXiv:1301.1629v1 [cs.LO]The vast number of interleavings that a concurrent program can have is typically identified as the root cause of the difficulty of automatic analysis of concurrent software. Weak memory is generally believed to make this problem even harder. We address both issues by modelling programs' executions with partial orders rather than the interleaving semantics (SC). We implemented a software analysis tool based on these ideas. It scales to programs of sufficient size to achieve first-time formal verification of non-trivial concurrent systems code over a wide range of models, including SC, Intel x86 and IBM Power
The Decidability of Verification under Promising 2.0
In PLDI'20, Lee et al. introduced the \emph{promising } semantics PS 2.0 of
the C++ concurrency that captures most of the common program transformations
while satisfying the DRF guarantee. The reachability problem for finite-state
programs under PS 2.0 with only release-acquire accesses is already known to be
undecidable. Therefore, we address, in this paper, the reachability problem for
programs running under PS 2.0 with relaxed accesses together with promises. We
show that this problem is undecidable even in the case where the input program
has finite state. Given this undecidability result, we consider the fragment of
PS 2.0 with only relaxed accesses allowing bounded number of promises. We show
that under this restriction, the reachability is decidable, albeit very
expensive: it is non-primitive recursive. Given this high complexity with
bounded number of promises and the undecidability result for the RA fragment of
PS 2.0, we consider a bounded version of the reachability problem. To this end,
we bound both the number of promises and the "view-switches", i.e, the number
of times the processes may switch their local views of the global memory. We
provide a code-to-code translation from an input program under PS 2.0, with
relaxed and release-acquire memory accesses along with promises, to a program
under SC. This leads to a reduction of the bounded reachability problem under
PS 2.0 to the bounded context-switching problem under SC. We have implemented a
prototype tool and tested it on a set of benchmarks, demonstrating that many
bugs in programs can be found using a small bound
Software Verification for Weak Memory via Program Transformation
Despite multiprocessors implementing weak memory models, verification methods
often assume Sequential Consistency (SC), thus may miss bugs due to weak
memory. We propose a sound transformation of the program to verify, enabling SC
tools to perform verification w.r.t. weak memory. We present experiments for a
broad variety of models (from x86/TSO to Power/ARM) and a vast range of
verification tools, quantify the additional cost of the transformation and
highlight the cases when we can drastically reduce it. Our benchmarks include
work-queue management code from PostgreSQL
Lazy Sequentialization for TSO and PSO via Shared Memory Abstractions
Lazy sequentialization is one of the most effective approaches for the bounded verification of concurrent programs. Existing tools assume sequential consistency (SC), thus the feasibility of lazy sequentializations for weak memory models (WMMs) remains untested. Here, we describe the first lazy sequentialization approach for the total store order (TSO) and partial store order (PSO) memory models. We replace all shared memory accesses with operations on a shared memory abstraction (SMA), an abstract data type that encapsulates the semantics of the underlying WMM and implements it under the simpler SC model. We give efficient SMA implementations for TSO and PSO that are based on temporal circular doubly-linked lists, a new data structure that allows an efficient simulation of the store buffers. We show experimentally, both on the SV-COMP concurrency benchmarks and a real world instance, that this approach works well in combination with lazy sequentialization on top of bounded model checking
- …