118 research outputs found

    Visualization of program performance on concurrent computers

    Get PDF
    A distributed memory concurrent computer (such as a hypercube computer) is inherently a complex system involving the collective and simultaneous interaction of many entities engaged in computation and communication activities. Program performance evaluation in concurrent computer systems requires methods and tools for observing, analyzing, and displaying system performance. This dissertation describes a methodology for collecting and displaying, via a unique graphical approach, performance measurement information from (possibly large) concurrent computer systems. Performance data are generated and collected via instrumentation. The data are then reduced via conventional cluster analysis techniques and converted into a pictorial form to highlight important aspects of program states during execution. Local and summary statistics are calculated. Included in the suite of defined metrics are measures for quantifying and comparing amounts of computation and communication. A novel kind of data plot is introduced to visually display both temporal and spatial information describing system activity. Phenomena such as hot spots of activity are easily observed, and in some cases, patterns inherent in the application algorithms being studied are highly visible. The approach also provides a framework for a visual solution to the problem of mapping a given parallel algorithm to an underlying parallel machine. A prototype implementation applied to several case studies is presented to demonstrate the feasibility and power of the approach

    Task mapping for non-contiguous allocations.

    Get PDF
    This paper examines task mapping algorithms for non-contiguously allocated parallel jobs. Several studies have shown that task placement affects job running time for both contiguously and non-contiguously allocated jobs. Traditionally, work on task mapping either uses a very general model where the job has an arbitrary communication pattern or assumes that jobs are allocated contiguously, making them completely isolated from each other. A middle ground between these two cases is the mapping problem for non-contiguous jobs having a specific communication pattern. We propose several task mapping algorithms for jobs with a stencil communication pattern and evaluate them using experiments and simulations. Our strategies improve the running time of a MiniApp by as much as 30% over a baseline strategy. Furthermore, this improvement increases markedly with the job size, demonstrating the importance of task mapping as systems grow toward exascale

    Achieving parallel performance in scientific computations

    Get PDF
    corecore