13 research outputs found

    Rapid designs for cache coherence protocol engines in Bluespec

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 91-92).In this thesis, we present the framework for Rapid Protocol Engine Development (RaPED). We implemented the framework in Bluespec, which is a high level hardware language based on Term Rewriting Systems (TRSs). The framework is highly parameterized and general, thus allowing designers to design any protocol engine in a short period. Since protocol engines can be developed rapidly, designers can compare different designs instead of freezing the design prematurely in the development process. We used the RaPED to implement a cache coherence protocol for Shen and Arvind's Commit-Reconcile and Fences (CRF) memory model [1]. The CRF allows scalable implementations of shared memory systems by decomposing memory access operations into simpler instructions. However, the focus for Shen's Cachet protocol for the CRF was adaptivity and correctness, it ignored some important implementation issues such as cache-line replacement, efficient buffer management and compatibility with multiword cache lines. In this thesis, we present a protocol called the Multiword Base protocol, which avoids these limitations. We defined the Multi-word CRF (MCRF) memory model to help us to prove the correctness of Multiword Base. The MCRF is a specialization of the CRF with modifications that summarizes the properties of multiword cache lines. We show that Multiword Base is a correct implementation of the CRF by using the MCRF to simulate Multiword Base. Apart from using multiword cache lines, many cache coherence protocols allow a cache to get data directly from another cache. The caches having this property is calling the snoopy caches. In this thesis, we present a CRF variant called the Snoopy CRF (SCRF) memory model, which gives hints to incorporate snoopy caches to the implementations of the CRF.by Man Cheuk Ng.S.M

    Abstraction of parallel uniform processes with data

    Get PDF

    Parameterized Verification of Systems with Global Synchronization and Guards

    Get PDF
    Inspired by distributed applications that use consensus or other agreement protocols for global coordination, we define a new computational model for parameterized systems that is based on a general global synchronization primitive and allows for global transition guards. Our model generalizes many existing models in the literature, including broadcast protocols and guarded protocols. We show that reachability properties are decidable for systems without guards, and give sufficient conditions under which they remain decidable in the presence of guards. Furthermore, we investigate cutoffs for reachability properties and provide sufficient conditions for small cutoffs in a number of cases that are inspired by our target applications.Comment: Accepted at CAV 202

    Studying and Analysing Transactional Memory Using Interval Temporal Logic and AnaTempura

    Get PDF
    Transactional memory (TM) is a promising lock-free synchronisation technique which offers a high-level abstract parallel programming model for future chip multiprocessor (CMP) systems. Moreover, it adapts the well-established popular paradigm of transactions and thus provides a general and flexible way to allow programs to read and modify disparate memory locations atomically as a single operation. In this thesis, we propose a general framework for validating a TM design, starting from a formal specification into a hardware implementation, with its underpinning theory and refinement. A methodology in this work starts with a high-level and executable specification model for an abstract TM with verification for various correctness conditions of concurrent transactions. This model is constructed within a flexible transition framework that allows verifying correctness of a TM system with animation. Then, we present a formal executable specification for a chip-dual single-cycle MIPS processor with a cache coherence protocol and integrate the provable TM system. Finally, we transform the dual processors with the TM from a high-level description into a Hardware Description Language (VHDL), using some proposed refinement and restriction rules. Interval Temporal Logic (ITL) and its programming language subset AnaTempura are used to build, execute and test the model, since they together provide a powerful framework supporting logical reasoning about time intervals as well as programming and simulation

    CROSS-LAYER CUSTOMIZATION FOR LOW POWER AND HIGH PERFORMANCE EMBEDDED MULTI-CORE PROCESSORS

    Get PDF
    Due to physical limitations and design difficulties, computer processor architecture has shifted to multi-core and even many-core based approaches in recent years. Such architectures provide potentials for sustainable performance scaling into future peta-scale/exa-scale computing platforms, at affordable power budget, design complexity, and verification efforts. To date, multi-core processor products have been replacing uni-core processors in almost every market segment, including embedded systems, general-purpose desktops and laptops, and super computers. However, many issues still remain with multi-core processor architectures that need to be addressed before their potentials could be fully realized. People in both academia and industry research community are still seeking proper ways to make efficient and effective use of these processors. The issues involve hardware architecture trade-offs, the system software service, the run-time management, and user application design, which demand more research effort into this field. Due to the architectural specialties with multi-core based computers, a Cross-Layer Customization framework is proposed in this work, which combines application specific information and system platform features, along with necessary operating system service support, to achieve exceptional power and performance efficiency for targeted multi-core platforms. Several topics are covered with specific optimization goals, including snoop cache coherence protocol, inter-core communication for producer-consumer applications, synchronization mechanisms, and off-chip memory bandwidth limitations. Analysis of benchmark program execution with conventional mechanisms is made to reveal the overheads in terms of power and performance. Specific customizations are proposed to eliminate such overheads with support from hardware, system software, compiler, and user applications. Experiments show significant improvement on system performance and power efficiency

    Optimizations and Cost Models for multi-core architectures: an approach based on parallel paradigms

    Get PDF
    The trend in modern microprocessor architectures is clear: multi-core chips are here to stay, and researchers expect multiprocessors with 128 to 1024 cores on a chip in some years. Yet the software community is slowly taking the path towards parallel programming: while some works target multi-cores, these are usually inherited from the previous tools for SMP architectures, and rarely exploit specific characteristics of multi-cores. But most important, current tools have no facilities to guarantee performance or portability among architectures. Our research group was one of the first to propose the structured parallel programming approach to solve the problem of performance portability and predictability. This has been successfully demonstrated years ago for distributed and shared memory multiprocessors, and we strongly believe that the same should be applied to multi-core architectures. The main problem with performance portability is that optimizations are effective only under specific conditions, making them dependent on both the specific program and the target architecture. For this reason in current parallel programming (in general, but especially with multi-cores) optimizations usually follows a try-and-decide approach: each one must be implemented and tested on the specific parallel program to understand its benefits. If we want to make a step forward and really achieve some form of performance portability, we require some kind of prediction of the expected performance of a program. The concept of performance modeling is quite old in the world of parallel programming; yet, in the last years, this kind of research saw small improvements: cost models to describe multi-cores are missing, mainly because of the increasing complexity of microarchitectures and the poor knowledge of specific implementation details of current processors. In the first part of this thesis we prove that the way of performance modeling is still feasible, by studying the Tilera TilePro64. The high number of cores on-chip in this processor (64) required the use of several innovative solutions, such as a complex interconnection network and the use of multiple memory interfaces per chip. For these features the TilePro64 can be considered an insight of what to expect in future multi-core processors. The availability of a cycle-accurate simulator and an extensive documentation allowed us to model the architecture, and in particular its memory subsystem, at the accuracy level required to compare optimizations In the second part, focused on optimizations, we cover one of the most important issue of multi-core architectures: the memory subsystem. In this area multi-core strongly differs in their structure w.r.t off-chip parallel architectures, both SMP and NUMA, thus opening new opportunities. In detail, we investigate the problem of data distribution over the memory controllers in several commercial multi-cores, and the efficient use of the cache coherency mechanisms offered by the TilePro64 processor. Finally, by using the performance model, we study different implementations, derived from the previous optimizations, of a simple test-case application. We are able to predict the best version using only profiled data from a sequential execution. The accuracy of the model has been verified by experimentally comparing the implementations on the real architecture, giving results within 1 − 2% of accuracy

    Algorithmic Analysis of Infinite-State Systems

    Get PDF
    Many important software systems, including communication protocols and concurrent and distributed algorithms generate infinite state-spaces. Model-checking which is the most prominent algorithmic technique for the verification of concurrent systems is restricted to the analysis of finite-state models. Algorithmic analysis of infinite-state models is complicated--most interesting properties are undecidable for sufficiently expressive classes of infinite-state models. In this thesis, we focus on the development of algorithmic analysis techniques for two important classes of infinite-state models: FIFO Systems and Parameterized Systems. FIFO systems consisting of a set of finite-state machines that communicate via unbounded, perfect, FIFO channels arise naturally in the analysis of distributed protocols. We study the problem of computing the set of reachable states of a FIFO system composed of piecewise components. This problem is closely related to calculating the set of all possible channel contents, i.e. the limit language. We present new algorithms for calculating the limit language of a system with a single communication channel and important subclasses of multi-channel systems. We also discuss the complexity of these algorithms. Furthermore, we present a procedure that translates a piecewise FIFO system to an abridged structure, representing an expressive abstraction of the system. We show that we can analyze the infinite computations of the more concrete model by analyzing the computations of the finite, abridged model. Parameterized systems are a common model of computation for concurrent systems consisting of an arbitrary number of homogenous processes. We study the reachability problem in parameterized systems of infinite-state processes. We describe a framework that combines Abstract Interpretation with a backward-reachability algorithm. Our key idea is to create an abstract domain in which each element (a) represents the lower bound on the number of processes at a control location and (b) employs a numeric abstract domain to capture arithmetic relations among variables of the processes. We also provide an extrapolation operator for the domain to guarantee sound termination of the backward-reachability algorithm

    Modal Abstraction and Replication of Processes with Data

    Get PDF
    Fokkink, W.J. [Promotor]Pol, J.C. van de [Copromotor

    Fourth NASA Langley Formal Methods Workshop

    Get PDF
    This publication consists of papers presented at NASA Langley Research Center's fourth workshop on the application of formal methods to the design and verification of life-critical systems. Topic considered include: Proving properties of accident; modeling and validating SAFER in VDM-SL; requirement analysis of real-time control systems using PVS; a tabular language for system design; automated deductive verification of parallel systems. Also included is a fundamental hardware design in PVS
    corecore