3 research outputs found

    Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers

    Funders: Advanced Research Projects Agency (ARPA); National Aeronautics and Space Administration; …

    Interprocedural Array Data-Flow Analysis for Cache Coherence

    The presence of procedures and procedure calls introduces side effects, which complicate stale reference detection in compiler-directed cache coherence schemes [4, 3, 9]. Previous compiler algorithms avoid interprocedural reference marking either by invalidating the entire cache at procedure boundaries [5, 8] or by inlining [8]. However, frequent cache invalidations result in poor performance, since locality cannot be exploited across procedure boundaries, and inlining is often prohibitive due to its code expansion and the attendant increase in compilation time and memory requirements. In this paper, we introduce improved intraprocedural and interprocedural algorithms for detecting references to stale data. The intraprocedural algorithm can mark potentially stale references without relying on cache invalidation or inlining at procedure boundaries, thus avoiding unnecessary cache misses for subroutine-local data. The interprocedural algorithm performs bottom…
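    The abstract's core idea, marking only the references that may see stale data after a call instead of flushing the whole cache at every procedure boundary, can be illustrated with a toy model. The sketch below is not the paper's algorithm; the names (ProcSummary, propagate_summaries, mark_stale_refs) and the event-trace representation are hypothetical, and it assumes a non-recursive call graph with per-procedure may-write summaries propagated bottom-up.

```python
from dataclasses import dataclass, field

@dataclass
class ProcSummary:
    """Bottom-up summary of a procedure: arrays it may modify."""
    name: str
    may_write: set = field(default_factory=set)   # arrays possibly written
    calls: list = field(default_factory=list)     # callees, for propagation

def propagate_summaries(procs):
    """Propagate may-write sets bottom-up through the call graph
    to a fixed point (assumes no recursion, for simplicity)."""
    changed = True
    while changed:
        changed = False
        for p in procs.values():
            for callee in p.calls:
                extra = procs[callee].may_write - p.may_write
                if extra:
                    p.may_write |= extra
                    changed = True

def mark_stale_refs(trace, procs):
    """Given a straight-line trace of ('call', proc) and ('read', array)
    events, mark reads that may see stale cached data. Only arrays a
    callee may have written are marked, not the whole cache."""
    possibly_stale = set()
    marks = []
    for kind, arg in trace:
        if kind == "call":
            possibly_stale |= procs[arg].may_write
        elif kind == "read":
            marks.append((arg, arg in possibly_stale))
            possibly_stale.discard(arg)   # re-fetched, now fresh
    return marks

if __name__ == "__main__":
    procs = {
        "sub": ProcSummary("sub", may_write={"A"}),
        "main": ProcSummary("main", calls=["sub"]),
    }
    propagate_summaries(procs)
    trace = [("read", "A"), ("call", "sub"), ("read", "A"), ("read", "B")]
    print(mark_stale_refs(trace, procs))
    # -> [('A', False), ('A', True), ('B', False)]
```

    Note how only the second read of A is marked stale, while B stays cacheable across the call; a whole-cache invalidation at the call boundary would have forced misses on both.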

    Hybrid analysis of memory references and its application to automatic parallelization

    Executing sequential code in parallel on a multithreaded machine has been an elusive goal of the academic and industrial research communities for many years. It has recently become more important due to the widespread introduction of multicores in PCs. Automatic multithreading has not been achieved because classic static compiler analysis was not powerful enough, and program behavior was found, in many cases, to be input dependent. Speculative thread-level parallelization was a welcome avenue for improving parallelization coverage, but its performance was not always optimal due to the sometimes unnecessary overhead of checking every dynamic memory reference. In this dissertation we introduce a novel analysis technique, Hybrid Analysis, which unifies static and dynamic memory reference analysis into a seamless compiler framework that extracts nearly all the available parallelism from scientific codes while incurring close to the minimum necessary run-time overhead. We show how to extract maximum information from quantities that cannot be sufficiently analyzed by static compiler methods, and how to generate sufficient conditions which, when evaluated dynamically, can validate optimizations. Our techniques have been fully implemented in the Polaris compiler and resulted in whole-program speedups on a large number of industry-standard benchmark applications.
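    The central mechanism described here, a statically derived sufficient condition evaluated at run time to choose between a parallel and a sequential version of a loop, can be sketched as follows. This is a minimal toy model in Python, not the Polaris implementation (which targets Fortran); the loop, the predicate k == 0 or k >= n, and the function run_loop are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_loop(a, n, k):
    """Executes: for i in 0..n-1: a[i] = a[i+k] + 1.0
    (requires len(a) >= n + k), dispatching on a run-time test."""
    def body(i):
        a[i] = a[i + k] + 1.0

    # Sufficient condition, derivable statically from the subscript
    # expressions i (write) and i+k (read): the read region [k, n-1+k]
    # cannot overlap the write region [0, n-1] across iterations when
    # k == 0 or k >= n, so those cases carry no loop dependence.
    # Only the value of k is unknown until run time.
    if k == 0 or k >= n:
        with ThreadPoolExecutor() as pool:   # parallel version
            list(pool.map(body, range(n)))
    else:
        for i in range(n):                   # sequential fallback
            body(i)

a = [float(i) for i in range(16)]
run_loop(a, n=8, k=8)   # k >= n: iterations independent, runs in parallel
run_loop(a, n=8, k=3)   # 0 < k < n: possible dependence, runs sequentially
```

    The point of the hybrid approach is that the predicate is far cheaper than checking every dynamic memory reference, as speculative schemes must, yet it still recovers the parallelism whenever the run-time values happen to make the loop independent.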