5 research outputs found

    Speculative Data Distribution in Shared Memory Multiprocessors

    This work explores the possibility of using speculation at the directories in a cache coherent non-uniform memory access multiprocessor architecture to improve performance by forwarding data to their destinations before requests are sent. It improves on previous consumer prediction techniques, showing how to construct a predictor that can handle a tradeoff between accuracy and coverage. This dissertation then explores the correct time to perform consumer prediction, and shows how a directory protocol can incorporate such a scheme. The consumer prediction enhanced protocol that is developed is able to reduce the runtime of a set of scientific benchmarks by 10%-20%, without substantially reducing the runtime of other benchmarks; specifically, those benchmarks feature simple phased behavior and regularly distribute data to more than two processors. This work then explores the interaction of consumer prediction with two other forms of prediction, migratory prediction and last touch prediction. It demonstrates a mechanism by which migratory prediction can be implemented using only the storage elements already present in a consumer predictor. By combining this migratory predictor with a consumer predictor, it is possible to produce greater speedups than either did individually. Finally, the signatures of the last touch predictor can be applied to improve the performance of consumer prediction.

    Integrated shared-memory and message-passing communication in the Alewife multiprocessor

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 237-246) and index. By John David Kubiatowicz. Ph.D.

    Kiloprocessor Extensions to SCI

    To expand the Scalable Coherent Interface's (SCI) capabilities so it can be used to efficiently handle sharing in systems of hundreds or even thousands of processors, the SCI working group is developing the Kiloprocessor Extensions to SCI. In this paper we describe the proposed GLOW and STEM kiloprocessor extensions to SCI. These two sets of extensions provide SCI with scalable reads and scalable writes to widely-shared data. This kind of data represents one of the main obstacles to scalability for many cache coherence protocols. The GLOW extensions are intended for systems with complex networks of interconnected SCI rings (e.g., large networks of workstations). GLOW extensions are based on building k-ary sharing trees that map well to the underlying topology. In contrast, STEM is intended for systems where GLOW is not applicable (e.g., topologies based on centralized switches). STEM defines algorithms to build and maintain binary sharing trees. We show that latencies of GLOW reads a..

    Hierarchical Extensions to SCI

    The cache coherence scheme of the Scalable Coherent Interface (SCI) offers performance that, for some operations, degrades linearly with the degree of sharing. For large scale sharing, we need schemes that offer logarithmic time reading and writing of shared data for acceptable performance. Therefore, we need to move from SCI's sharing lists to sharing trees. The currently proposed Kiloprocessor Extensions to SCI define tree protocols that create, maintain and invalidate binary trees without taking into account the underlying topology. However, these protocols are quite complex. We propose a different approach to Kiloprocessor Extensions to SCI. We define k-ary trees that map well onto the topology of a system. In this way the k-ary sharing trees offer great geographical locality (neighbors in the tree are also physical neighbors). The resulting protocols are simple and in some cases their performance for reading and writing shared data is superior to the previous protocols. We present our p..