
    Performance Debugging Shared Memory Parallel Programs Using Run-Time Dependence Analysis

    We describe a new approach to performance debugging that focuses on automatically identifying computation transformations to reduce synchronization and communication. By grouping writes together into equivalence classes, we are able to tractably collect information from long-running programs. Our performance debugger analyzes this information and suggests computation transformations in terms of the source code. We present the transformations suggested by the debugger on a suite of four applications. For BarnesHut and Shallow, implementing the debugger's suggestions improved performance by factors of 1.32 and 34, respectively, on an 8-processor IBM SP2. For Ocean, our debugger identified excess synchronization that did not have a significant impact on performance. ILINK, a genetic linkage analysis program widely used by geneticists, is already well optimized; we use it only to demonstrate the feasibility of our approach for long-running applications. We also give details on how..