1,160 research outputs found

    The 'Test model-checking' approach to the verification of formal memory models of multiprocessors

    Get PDF
    technical reportWe offer a solution to the problem of verifying formal memory models of processors by combining the strengths of model-checking and a formal testing procedure for parallel machines. We characterize the formal basis for abstracting the tests into test automata and associated memory rule safety properties whose violations pinpoint the ordering rule being violated. Our experimental results on Verilog models of a commercial split transaction bus demonstrates the ability of our method to effectively debug design models during early stages of their development

    Model checking concurrent assembly algorithms

    Get PDF
    Model checking has been used in various domains, to enable automatic verification of properties for a given model. Especially in cases when the correctness of the the model is not evident due to the complex nature of the description, model checking can be an indispensable tool. One such domain is the use of concurrent assembly algorithms for lowlevel synchronisation, which can be notoriously difficult to check their correctness or even test. In this paper we look at this domain, and explore the use of model-checking in verifying a number of such algorithms, such as barrier synchronisation and wait-free CSP channel communication. We tackle the state explosion problem inherent in model checking by making use of abstraction techniques to remove rendundant information in the the model, and partial-order techniques to remove redundant interleavings of actions. Finally, we also investigate the use of structural induction to reason about families of systems of arbitrary size. Making use of symmetry and induction, we verify algorithms with an unbounded number of identical participating tasks.peer-reviewe

    On Verifying Causal Consistency

    Full text link
    Causal consistency is one of the most adopted consistency criteria for distributed implementations of data structures. It ensures that operations are executed at all sites according to their causal precedence. We address the issue of verifying automatically whether the executions of an implementation of a data structure are causally consistent. We consider two problems: (1) checking whether one single execution is causally consistent, which is relevant for developing testing and bug finding algorithms, and (2) verifying whether all the executions of an implementation are causally consistent. We show that the first problem is NP-complete. This holds even for the read-write memory abstraction, which is a building block of many modern distributed systems. Indeed, such systems often store data in key-value stores, which are instances of the read-write memory abstraction. Moreover, we prove that, surprisingly, the second problem is undecidable, and again this holds even for the read-write memory abstraction. However, we show that for the read-write memory abstraction, these negative results can be circumvented if the implementations are data independent, i.e., their behaviors do not depend on the data values that are written or read at each moment, which is a realistic assumption.Comment: extended version of POPL 201

    GPUVerify: A Verifier for GPU Kernels

    Get PDF
    We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream ker-nel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational se-mantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in GPU kernels and allows kernel verification to be reduced to analysis of a sequential program, thereby completely avoiding the need to reason about thread interleavings, and allowing existing modular techniques for program verification to be leveraged. We describe an efficient encoding for data race detection and propose a method for automatically inferring loop invari-ants required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 163 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, auto-matic verification of a large number of real-world kernels

    Caching, crashing & concurrency - verification under adverse conditions

    Get PDF
    The formal development of large-scale software systems is a complex and time-consuming effort. Generally, its main goal is to prove the functional correctness of the resulting system. This goal becomes significantly harder to reach when the verification must be performed under adverse conditions. When aiming for a realistic system, the implementation must be compatible with the “real world”: it must work with existing system interfaces, cope with uncontrollable events such as power cuts, and offer competitive performance by using mechanisms like caching or concurrency. The Flashix project is an example of such a development, in which a fully verified file system for flash memory has been developed. The project is a long-term team effort and resulted in a sequential, functionally correct and crash-safe implementation after its first project phase. This thesis continues the work by performing modular extensions to the file system with performance-oriented mechanisms that mainly involve caching and concurrency, always considering crash-safety. As a first contribution, this thesis presents a modular verification methodology for destructive heap algorithms. The approach simplifies the verification by separating reasoning about specifics of heap implementations, like pointer aliasing, from the reasoning about conceptual correctness arguments. The second contribution of this thesis is a novel correctness criterion for crash-safe, cached, and concurrent file systems. A natural criterion for crash-safety is defined in terms of system histories, matching the behavior of fine-grained caches using complex synchronization mechanisms that reorder operations. The third contribution comprises methods for verifying functional correctness and crash-safety of caching mechanisms and concurrency in file systems. A reference implementation for crash-safe caches of high-level data structures is given, and a strategy for proving crash-safety is demonstrated and applied. A compatible concurrent implementation of the top layer of file systems is presented, using a mechanism for the efficient management of fine-grained file locking, and a concurrent version of garbage collection is realized. Both concurrency extensions are proven to be correct by applying atomicity refinement, a methodology for proving linearizability. Finally, this thesis contributes a new iteration of executable code for the Flashix file system. With the efficiency extensions introduced with this thesis, Flashix covers all performance-oriented concepts of realistic file system implementations and achieves competitiveness with state-of-the-art flash file systems

    Formal Power Analysis of Systems-on-Chip

    Get PDF
    The design methods and languages targeted to modern System-on-Chip designs are facing tremendous pressure of the ever-increasing complexity, power, and speed requirements. To estimate any of these three metrics, there is a trade-off between accuracy and abstraction level of detail in which a system under design is analyzed. The more detailed the description, the more accurate the simulation will be, but, on the other hand, the more time consuming it will be. Moreover, a designer wants to make decisions as early as possible in the design flow to avoid costly design backtracking. To answer the challenges posed upon System-on-chip designs, this thesis introduces a formal, power aware framework, its development methods, and methods to constraint and analyze power consumption of the system under design. This thesis discusses on power analysis of synchronous and asynchronous systems not forgetting the communication aspects of these systems. The presented framework is built upon the Timed Action System formalism, which offer an environment to analyze and constraint the functional and temporal behavior of the system at high abstraction level. Furthermore, due to the complexity of System-on-Chip designs, the possibility to abstract unnecessary implementation details at higher abstraction levels is an essential part of the introduced design framework. With the encapsulation and abstraction techniques incorporated with the procedure based communication allows a designer to use the presented power aware framework in modeling these large scale systems. The introduced techniques also enable one to subdivide the development of communication and computation into own tasks. This property is taken into account in the power analysis part as well. Furthermore, the presented framework is developed in a way that it can be used throughout the design project. In other words, a designer is able to model and analyze systems from an abstract specification down to an implementable specification.Siirretty Doriast

    ORCA: Ordering-free Regions for Consistency and Atomicity

    Get PDF
    Writing correct synchronization is one of the main difficulties of multithreaded programming. Incorrect synchronization causes many subtle concurrency errors such as data races and atomicity violations. Previous work has proposed stronger memory consistency models to rule out certain classes of concurrency bugs. However, these approaches are limited by a program’s original (and possibly incorrect) synchronization. In this work, we provide stronger guarantees than previous memory consistency models by punctuating atomicity only at ordering constructs like barriers, but not at lock operations. We describe the Ordering-free Regions for Consistency and Atomicity (ORCA) system which enforces atomicity at the granularity of ordering-free regions (OFRs). While many atomicity violations occur at finer granularity, in an empirical study of many large multithreaded workloads we find no examples of code that requires atomicity coarser than OFRs. Thus, we believe OFRs are a conservative approximation of the atomicity requirements of many programs. ORCA assists programmers by throwing an exception when OFR atomicity is threatened, and, in exception-free executions, guaranteeing that all OFRs execute atomically. In our evaluation, we show that ORCA automatically prevents real concurrency bugs. A user-study of ORCA demonstrates that synchronizing a program with ORCA is easier than using a data race detector. We evaluate modest hardware support that allows ORCA to run with just 18% slowdown on average over pthreads, with very similar scalability

    Parallel code-specific CPU simulation with dynamic phase convergence modeling for HW/SW co-design

    Get PDF
    While SystemC models provide a promising solution to the complex problem of HW/SW co-design within the system-on-chip paradigm, such requires a detailed annotation of transaction level energy and performance data within the model. While this data can be obtained through source code profiling of an application running on the target processor, accomplishing such when the target CPU hardware is not actively available typically requires time-consuming CPU simulation, which is often too slow to practically consider for large programs. Additionally, while the use of SystemC modeling with TLM 2.0 standard is widely adopted for the SoC modeling, the process of transforming C/C++ code to SystemC code with TLM 2.0 functionality remains non-trivial. Herein we propose an automated framework that: 1. Enables high speed code-specific CPU profiling support for both Sniper and gem5 using parallelized dynamic steady state phase convergence modeling, providing automatic annotation of energy and performance within source code. 2. Provides an automated C to SystemC TLM 2.0 code generation flow that utilizes the back-annotated source code to produce a SystemC module for seamless incorporation into the virtual prototype. Maximum speedups obtained using Sniper and gem5 are 48.76x and 562x respectively, while average results obtained speedups of 31.5x and 323.1x. Sniper results maintain an average accuracy of 0.89% for latency and 0.10% for energy, while gem5 achieves average accuracies of 4.16% and 2.87% for latency and energy respectively

    Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware

    Get PDF
    The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were for existing architectures when used on emerging heterogeneous systems
    • …
    corecore