4 research outputs found

    Compiler-directed Shared-Memory Communication for Iterative Parallel Applications

    No full text
    Many scientific applications are iterative and specify repetitive communication patterns. This paper shows how a parallel-language compiler and a predictive cache-coherence protocol in a distributed shared memory system together can implement shared-memory communication efficiently for applications with unpredictable but repetitive communication patterns. The compiler uses static analysis to identify program points where potentially repetitive communication occurs. At runtime, the protocol builds a communication schedule in one iteration and uses the schedule to pre-send data in subsequent iterations. This paper contains measurements of three iterative applications (including adaptive programs with unstructured data accesses) that show that a predictive protocol increases the number of shared-data requests satisfied locally, thus reducing the remote data access latency and total execution time

    Compiler-directed Shared-Memory Communication for Iterative Parallel Applications

    No full text
    Many scientific applications are iterative and specify repetitive communication patterns. This paper shows how a parallel-language compiler and a predictive cachecoherence protocol in a distributed shared memory system together can implement shared-memory communication efficiently for applications with unpredictable but repetitive communication patterns. The compiler uses static analysis to identify program points where potentially repetitive communication occurs. At runtime, the protocol builds a communication schedule in one iteration and uses the schedule to pre-send data in subsequent iterations. This paper contains measurements of three iterative applications (including adaptive programs with unstructured data accesses) that show that a predictive protocol increases the number of shared-data requests satisfied locally, thus reducing the remote data access latency and total execution time. This work is supported in part by Wright Laboratory Avionics Directorate, Air Force Materi..

    COMPILER TECHNIQUES FOR EFFICIENT COMMUNICATIONS IN MULTIPROCESSOR SYSTEMS

    Get PDF
    Technical advances have brought circuit switching back to the stage of interconnection network design for high performance computing. Although circuit switching has long connection establishment delays and the dedication of connections prevents other communicating nodes from sharing the network, it has simple control logic and significant cost advantage over packet or wormhole switching. With the proper assistance from compilers, circuit switching has the potential of providing significant performance benefits when connections can be established prior to the actual communication. This dissertation presents a novel compilation framework for achieving efficient communications in circuit switching interconnection networks. The goal of the framework is to identify communication patterns in Single-Program-Multiple-Data (SPMD) parallel applications and compile these patterns as network configuration directives. This can significantly reduce the communication overhead on circuit switching interconnection networks. A powerful representation scheme is developed in this research to capture the property of communication patterns and allow manipulation of these patterns. Based on the temporal and spatial localities of communications and the capability of the compiler to identify the communication patterns, we classify communication patterns into three categories - static, persistent, and dynamic. We target static and persistent communications, which are dominant in most parallel applications. To identify communication patterns, we develop a novel symbolic expression analysis. We develop certain compiler techniques for analyzing communication patterns. Since the underlying network capacity is limited, we develop an algorithm to partition the program into phases based on the communication requirements and network capacity. To demonstrate the effectiveness of our framework, we implement an experimental compiler. The compiler identifies the communication patterns from the source code, partitions the program into phases, and inserts the network configuration directives at phase boundaries to achieve efficient communications. The compiler also can generate communication traces, which provides useful information about the communication pattern correlated to the structure of the source code. We develop a multiprocessor system simulator to evaluate our techniques. Our simulation-based performance analysis demonstrates that using our compiler techniques can achieve the same level, or even better level of communication performance than fast packet switching networks while using much less expensive circuit switches
    corecore