3 research outputs found

    Compiler Techniques for Optimizing Communication and Data Distribution for Distributed-Memory Computers

    Funders: Advanced Research Projects Agency (ARPA); National Aeronautics and Space Administration; …

    Interprocedural Array Data-Flow Analysis for Cache Coherence

    The presence of procedures and procedure calls introduces side effects, which complicate stale reference detection in compiler-directed cache coherence schemes [4, 3, 9]. Previous compiler algorithms avoid interprocedural reference marking either by invalidating the entire cache at procedure boundaries [5, 8] or by inlining [8]. However, frequent cache invalidations result in poor performance, since locality cannot be exploited across procedure boundaries, and inlining is often prohibitive due to its code expansion and the attendant increase in compilation time and memory requirements. In this paper, we introduce improved intraprocedural and interprocedural algorithms for detecting references to stale data. The intraprocedural algorithm can mark potentially stale references without relying on cache invalidation or inlining at procedure boundaries, thus avoiding unnecessary cache misses for subroutine-local data. The interprocedural algorithm performs bottom…
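    The abstract's core idea, marking only the references that may see stale data after a call instead of flushing the whole cache at every procedure boundary, can be illustrated with a toy model. The sketch below is not the paper's algorithm; the names (ProcSummary, propagate_summaries, mark_stale_refs) and the event-trace representation are hypothetical, and it assumes a non-recursive call graph with per-procedure may-write summaries propagated bottom-up.

```python
from dataclasses import dataclass, field

@dataclass
class ProcSummary:
    """Bottom-up summary of a procedure: arrays it may modify."""
    name: str
    may_write: set = field(default_factory=set)   # arrays possibly written
    calls: list = field(default_factory=list)     # callees, for propagation

def propagate_summaries(procs):
    """Propagate may-write sets bottom-up through the call graph
    to a fixed point (assumes no recursion, for simplicity)."""
    changed = True
    while changed:
        changed = False
        for p in procs.values():
            for callee in p.calls:
                extra = procs[callee].may_write - p.may_write
                if extra:
                    p.may_write |= extra
                    changed = True

def mark_stale_refs(trace, procs):
    """Given a straight-line trace of ('call', proc) and ('read', array)
    events, mark reads that may see stale cached data. Only arrays a
    callee may have written are marked, not the whole cache."""
    possibly_stale = set()
    marks = []
    for kind, arg in trace:
        if kind == "call":
            possibly_stale |= procs[arg].may_write
        elif kind == "read":
            marks.append((arg, arg in possibly_stale))
            possibly_stale.discard(arg)   # re-fetched, now fresh
    return marks

if __name__ == "__main__":
    procs = {
        "sub": ProcSummary("sub", may_write={"A"}),
        "main": ProcSummary("main", calls=["sub"]),
    }
    propagate_summaries(procs)
    trace = [("read", "A"), ("call", "sub"), ("read", "A"), ("read", "B")]
    print(mark_stale_refs(trace, procs))
    # -> [('A', False), ('A', True), ('B', False)]
```

    Note how only the second read of A is marked stale, while B stays cacheable across the call; a whole-cache invalidation at the call boundary would have forced misses on both.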

    Hybrid analysis of memory references and its application to automatic parallelization

    Executing sequential code in parallel on a multithreaded machine has been an elusive goal of the academic and industrial research communities for many years. It has recently become more important due to the widespread introduction of multicores in PCs. Automatic multithreading has not been achieved because classic static compiler analysis was not powerful enough, and program behavior was found, in many cases, to be input dependent. Speculative thread-level parallelization was a welcome avenue for improving parallelization coverage, but its performance was not always optimal due to the sometimes unnecessary overhead of checking every dynamic memory reference. In this dissertation we introduce a novel analysis technique, Hybrid Analysis, which unifies static and dynamic memory reference analysis into a seamless compiler framework that extracts nearly all the available parallelism from scientific codes while incurring close to the minimum necessary run-time overhead. We show how to extract maximum information from quantities that cannot be sufficiently analyzed by static compiler methods, and how to generate sufficient conditions which, when evaluated dynamically, can validate optimizations. Our techniques have been fully implemented in the Polaris compiler and resulted in whole-program speedups on a large number of industry-standard benchmark applications.
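    The central mechanism described here, a statically derived sufficient condition evaluated at run time to choose between a parallel and a sequential version of a loop, can be sketched as follows. This is a minimal toy model in Python, not the Polaris implementation (which targets Fortran); the loop, the predicate k == 0 or k >= n, and the function run_loop are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_loop(a, n, k):
    """Executes: for i in 0..n-1: a[i] = a[i+k] + 1.0
    (requires len(a) >= n + k), dispatching on a run-time test."""
    def body(i):
        a[i] = a[i + k] + 1.0

    # Sufficient condition, derivable statically from the subscript
    # expressions i (write) and i+k (read): the read region [k, n-1+k]
    # cannot overlap the write region [0, n-1] across iterations when
    # k == 0 or k >= n, so those cases carry no loop dependence.
    # Only the value of k is unknown until run time.
    if k == 0 or k >= n:
        with ThreadPoolExecutor() as pool:   # parallel version
            list(pool.map(body, range(n)))
    else:
        for i in range(n):                   # sequential fallback
            body(i)

a = [float(i) for i in range(16)]
run_loop(a, n=8, k=8)   # k >= n: iterations independent, runs in parallel
run_loop(a, n=8, k=3)   # 0 < k < n: possible dependence, runs sequentially
```

    The point of the hybrid approach is that the predicate is far cheaper than checking every dynamic memory reference, as speculative schemes must, yet it still recovers the parallelism whenever the run-time values happen to make the loop independent.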