43 research outputs found

    Type Inference for Deadlock Detection in a Multithreaded Polymorphic Typed Assembly Language

    Full text link
    We previously developed a polymorphic type system and a type checker for a multithreaded lock-based polymorphic typed assembly language (MIL) that ensures that well-typed programs do not encounter race conditions. This paper extends that work to account for deadlocks. The extended type system verifies that locks are acquired in the proper order. To this end we require a language with annotations that specify the locking order. Rather than asking the programmer (or the compiler's backend) to explicitly annotate each newly introduced lock, we present an algorithm to infer the annotations. The result is a type checker whose input language is non-decorated as before, but that further checks that programs are free from deadlocks.

    High Performance Software Coherence for Current and Future Architectures

    No full text
    Shared memory provides an attractive and intuitive programming model for large-scale parallel computing, but requires a coherence mechanism to allow caching for performance while ensuring that processors do not use stale data in their computation. Implementation options range from distributed shared memory emulations on networks of workstations to tightly-coupled fully cache-coherent distributed shared memory multiprocessors. Previous work indicates that performance varies dramatically from one end of this spectrum to the other. Hardware cache coherence is fast, but also costly and time-consuming to design and implement, while DSM systems provide acceptable performance on only a limited class of applications. We claim that an intermediate hardware option (memory-mapped network interfaces that support a global physical address space, without cache coherence) can provide most of the performance benefits of fully cache-coherent hardware, at a fraction of the cost. To support this claim we..

    The Mercury User's Manual

    No full text
    It is common for parallel applications to require a large number of threads of control, often much larger than the number of processors provided by the underlying hardware. Using heavyweight (Unix-style) processes to implement those threads of control is prohibitively expensive. Mercury is an environment for writing object-oriented parallel programs in C++ that provides the user with simple primitives for inexpensive thread creation and for blocking and spinning synchronization. If required, Mercury primitives allow the user to control scheduling decisions in order to achieve good locality of reference on non-uniform memory access (NUMA) multiprocessors. This paper describes the basic Mercury primitives and provides examples of their use.

    Algorithms for Categorizing Multiprocessor Communication under Invalidate and Update-Based Coherence Protocols

    No full text
    In this paper we present simulation algorithms that characterize the main sources of communication generated by parallel applications under both invalidate-based and update-based cache coherence protocols. The algorithms provide insight into the reference and sharing patterns of parallel programs and into the amount of useless traffic entailed by each coherence protocol. Under an invalidate-based protocol, our algorithms classify the data traffic caused by the different types of cache misses. Under an update-based protocol, our algorithms not only categorize the data traffic, but also classify update transactions with respect to the sharing patterns that caused them. Although our algorithms deal with numerous hardware features, our categorization is widely applicable and can be easily simplified for use in less detailed simulators.

    Memory Models

    No full text
    this memory usually consists of highly-interleaved SRAM, and is a major, perhaps the dominant, component in the cost of these machines. Even so, supercomputer compilers must employ aggressive prefetching techniques, and supercomputer processors must be prepared to execute instructions out of order, to hide the latency of memory. Current hardware and software trends suggest that caches are likely to become more effective for future supercomputer workloads. Hardware trends include the development of very large caches with multiple banks, which address the bandwidth problem. Software trends include the development of compilers that apply techniques such as blocking [7] to increase locality of reference. Independent of the existence of caches, designers must address the question of where to locate main memory. They can choose to co-locate a memory module with each processor or group of processors, or to place all memory modules at one end of an interconnection network and all processors at the other. The first alternative is known as a distributed memory architecture; the second is known as a dance-hall architecture.
    [Figure 0.2: Simplified Distributed and Dance-Hall Memory Architecture Multiprocessors]

    Issues in Software Cache Coherence

    No full text
    Large-scale multiprocessors can provide the computational power needed to solve some of the larger problems of science and engineering today. Shared memory provides an attractive and intuitive programming model that makes good use of programmer time and effort. Shared memory, however, requires a coherence mechanism to allow caching for performance and to ensure that processors do not use stale data in their computation. Directory-based coherence, which is the hardware mechanism of choice for large-scale multiprocessors, can be expensive both in terms of hardware cost and in terms of the intellectual effort needed to design a correct, efficient protocol. For scalable multiprocessor designs with network-based interconnects, software-based coherence schemes provide an attractive alternative. In this paper we evaluate a new adaptive software coherence protocol, and demonstrate that smart software coherence protocols can be competitive with hardware-based coherence for a large variety of programs. We then discuss issues that affect the performance of software coherence protocols and suggest algorithmic and architectural enhancements that can help improve software coherence performance.