101 research outputs found

    Efficient Inter-Task Communication for Nested Loop Programs on a Multiprocessor System

    Get PDF
    In modern multiprocessor systems, processors can be stalled by inter-task communication when reading from a remote buffer. This paper presents a solution for the inter-task communication, that has a minimal impact on the performance of the system, hides the inter-task communication latency without requiring additional hardware. The solution applies to jobs, represented as task graphs, where the tasks are nested loop programs. Buffers are allocated in scratch-pad memories of the consuming tasks to provide low latency read access. For the nested loop programs, minimal buffer sizes can be determined to cover all possible communication patterns. The added computational complexity is low, as the solution adds only a few operations to the nested loop programs

    Formal Modelling, Testing and Verification of HSA Memory Models using Event-B

    Full text link
    The HSA Foundation has produced the HSA Platform System Architecture Specification that goes a long way towards addressing the need for a clear and consistent method for specifying weakly consistent memory. HSA is specified in a natural language which makes it open to multiple ambiguous interpretations and could render bugs in implementations of it in hardware and software. In this paper we present a formal model of HSA which can be used in the development and verification of both concurrent software applications as well as in the development and verification of the HSA-compliant platform itself. We use the Event-B language to build a provably correct hierarchy of models from the most abstract to a detailed refinement of HSA close to implementation level. Our memory models are general in that they represent an arbitrary number of masters, programs and instruction interleavings. We reason about such general models using refinements. Using Rodin tool we are able to model and verify an entire hierarchy of models using proofs to establish that each refinement is correct. We define an automated validation method that allows us to test baseline compliance of the model against a suite of published HSA litmus tests. Once we complete model validation we develop a coverage driven method to extract a richer set of tests from the Event-B model and a user specified coverage model. These tests are used for extensive regression testing of hardware and software systems. Our method of refinement based formal modelling, baseline compliance testing of the model and coverage driven test extraction using the single language of Event-B is a new way to address a key challenge facing the design and verification of multi-core systems.Comment: 9 pages, 10 figure

    Towards compliant distributed shared memory

    Get PDF
    Copyright © 2002 IEEEThere exists a wide spectrum of coherency models for use in distributed shared memory (DSM) systems. The choice of model for an application should ideally be based on the application's data access patterns and phase changes. However, in current systems, most, if not all of the parameters of the coherency model are fixed in the underlying DSM system. This forces the application either to structure its computations to suit the underlying model or to endure an inefficient coherency model. This paper introduces a unique approach to the provision of DSM based on the idea of compliance. Compliance allows an application to specify how the system should most effectively operate through a separation between mechanism, provided by the underlying system, and policy, pro-vided by the application. This is in direct contrast with the traditional view that an application must mold itself to the hard-wired choices that its operating platform has made. The contribution of this work is the definition and implementation of an architecture for compliant distributed coherency management. The efficacy of this architecture is illustrated through a worked example.Falkner, K. E.; Detmold, H.; Munro, D. S.; Olds, T

    A simple modern correctness condition for a space-based high-performance multiprocessor

    Get PDF
    A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system

    Implicit transactional memory in chip multiprocessors

    Get PDF
    Chip Multiprocessors (CMPs) are an efficient way of designing and use the huge amount of transistors on a chip. Different cores on a chip can compose a shared memory system with a very low-latency interconnect at a very low cost. Unfortunately, consistency models and synchronization styles of popular programming models for multiprocessors impose severe performance losses. Known architectural approaches to combat these losses are too complex, too specialized, or not transparent to the software. In this article, we introduce “implicit transactional memory” as a generalized architectural concept to remove such performance losses. We show how the concept of implicit transactions can be implemented at a low complexity by leveraging the multi-checkpoint mechanism of the Kilo-Instruction Processor. By relying on a general speculation substrate, it supports even the strictest consistency model – sequential consistency – potentially as effectively as weaker models and it allows multiple threads to speculatively execute critical sections, beyond barriers and event synchronizations.Postprint (published version

    How to Make a Correct Multiprocess Program Execute Correctly on a Multiprocessor

    Get PDF
    Abstract A multiprocess program executing on a modern multiprocessor must issue explicit commands to synchronize memory accesses. A method is proposed for deriving the necessary commands from a correctness proof of the underlying algorithm in a formalism based on temporal relations among operation executions
    corecore