503 research outputs found

    Partitioning strategy for efficient nonlinear finite element dynamic analysis on multiprocessor computers

    Get PDF
    A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the unsymmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) two level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computers

    Performance of a parallel code for the Euler equations on hypercube computers

    Get PDF
    The performance of hypercubes were evaluated on a computational fluid dynamics problem and the parallel environment issues were considered that must be addressed, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two dimensional steady Euler equations describing flow around the airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made

    Parallel software tools at Langley Research Center

    Get PDF
    This document gives a brief overview of parallel software tools available on the Intel iPSC/860 parallel computer at Langley Research Center. It is intended to provide a source of information that is somewhat more concise than vendor-supplied material on the purpose and use of various tools. Each of the chapters on tools is organized in a similar manner covering an overview of the functionality, access information, how to effectively use the tool, observations about the tool and how it compares to similar software, known problems or shortfalls with the software, and reference documentation. It is primarily intended for users of the iPSC/860 at Langley Research Center and is appropriate for both the experienced and novice user

    Adaptive Parallelism for Coupled, Multithreaded Message-Passing Programs

    Get PDF
    Hybrid parallel programming models that combine message passing (MP) and shared- memory multithreading (MT) are becoming more popular, especially with applications requiring higher degrees of parallelism and scalability. Consequently, coupled parallel programs, those built via the integration of independently developed and optimized software libraries linked into a single application, increasingly comprise message-passing libraries with differing preferred degrees of threading, resulting in thread-level heterogeneity. Retroactively matching threading levels between independently developed and maintained libraries is difficult, and the challenge is exacerbated because contemporary middleware services provide only static scheduling policies over entire program executions, necessitating suboptimal, over-subscribed or under-subscribed, configurations. In coupled applications, a poorly configured component can lead to overall poor application performance, suboptimal resource utilization, and increased time-to-solution. So it is critical that each library executes in a manner consistent with its design and tuning for a particular system architecture and workload. Therefore, there is a need for techniques that address dynamic, conflicting configurations in coupled multithreaded message-passing (MT-MP) programs. Our thesis is that we can achieve significant performance improvements over static under-subscribed approaches through reconfigurable execution environments that consider compute phase parallelization strategies along with both hardware and software characteristics. In this work, we present new ways to structure, execute, and analyze coupled MT- MP programs. Our study begins with an examination of contemporary approaches used to accommodate thread-level heterogeneity in coupled MT-MP programs. Here we identify potential inefficiencies in how these programs are structured and executed in the high-performance computing domain. We then present and evaluate a novel approach for accommodating thread-level heterogeneity. Our approach enables full utilization of all available compute resources throughout an application’s execution by providing programmable facilities with modest overheads to dynamically reconfigure runtime environments for compute phases with differing threading factors and affinities. Our performance results show that for a majority of the tested scientific workloads our approach and corresponding open-source reference implementation render speedups greater than 50 % over the static under-subscribed baseline. Motivated by our examination of reconfigurable execution environments and their memory overhead, we also study the memory attribution problem: the inability to predict or evaluate during runtime where the available memory is used across the software stack comprising the application, reusable software libraries, and supporting runtime infrastructure. Specifically, dynamic adaptation requires runtime intervention, which by its nature introduces additional runtime and memory overhead. To better understand the latter, we propose and evaluate a new way to quantify component-level memory usage from unmodified binaries dynamically linked to a message-passing communication library. Our experimental results show that our approach and corresponding implementation accurately measure memory resource usage as a function of time, scale, communication workload, and software or hardware system architecture, clearly distinguishing between application and communication library usage at a per-process level

    AMRA: An Adaptive Mesh Refinement Hydrodynamic Code for Astrophysics

    Get PDF
    Implementation details and test cases of a newly developed hydrodynamic code, AMRA, are presented. The numerical scheme exploits the adaptive mesh refinement technique coupled to modern high-resolution schemes which are suitable for relativistic and non-relativistic flows. Various physical processes are incorporated using the operator splitting approach, and include self-gravity, nuclear burning, physical viscosity, implicit and explicit schemes for conductive transport, simplified photoionization, and radiative losses from an optically thin plasma. Several aspects related to the accuracy and stability of the scheme are discussed in the context of hydrodynamic and astrophysical flows.Comment: 41 pages, 21 figures (9 low-resolution), LaTeX, requires elsart.cls, submitted to Comp. Phys. Comm.; additional documentation and high-resolution figures available from http://www.camk.edu.pl/~tomek/AMRA/index.htm

    ScalAna: Automating Scaling Loss Detection with Graph Analysis

    Full text link
    Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks either base on profiling or tracing. Profiling incurs low overheads but does not capture detailed dependencies needed for root-cause analysis. Tracing collects all information at prohibitive overheads. In this work, we design ScalAna that uses static analysis techniques to achieve the best of both worlds - it enables the analyzability of traces at a cost similar to profiling. ScalAna first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate ScalAna with real applications. Results show that our approach can effectively locate the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by ScalAna on 2,048 processes.Comment: conferenc

    Tools for analyzing parallel I/O

    Get PDF
    Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These issues call for sophisticated tools to capture, analyze, understand, and tune application I/O. In this paper, we highlight advances in monitoring tools to help address these issues. We also describe best practices, identify issues in measure- ment and analysis, and provide practical approaches to translate parallel I/O analysis into actionable outcomes for users, facility operators, and researchers

    PAN AIR: A computer program for predicting subsonic or supersonic linear potential flows about arbitrary configurations using a higher order panel method. Volume 2: User's manual (version 3.0)

    Get PDF
    A comprehensive description of user problem definition for the PAN AIR (Panel Aerodynamics) system is given. PAN AIR solves the 3-D linear integral equations of subsonic and supersonic flow. Influence coefficient methods are used which employ source and doublet panels as boundary surfaces. Both analysis and design boundary conditions can be used. This User's Manual describes the information needed to use the PAN AIR system. The structure and organization of PAN AIR are described, including the job control and module execution control languages for execution of the program system. The engineering input data are described, including the mathematical and physical modeling requirements. Version 3.0 strictly applies only to PAN AIR version 3.0. The major revisions include: (1) inputs and guidelines for the new FDP module (which calculates streamlines and offbody points); (2) nine new class 1 and class 2 boundary conditions to cover commonly used modeling practices, in particular the vorticity matching Kutta condition; (3) use of the CRAY solid state Storage Device (SSD); and (4) incorporation of errata and typo's together with additional explanation and guidelines

    Exploiting implicit parallelism in SPARC instruction execution

    Get PDF
    One way to increase the performance of a processing unit is to exploit implicit parallelism. Exploiting this parallelism requires a processor to dynamically select instructions in a serial instruction stream which can be executed in parallel. As operations are computed concurrently, an execution speedup will occur. This thesis studies how effectively implicit parallelism could be exploited in the Scalable Pro cessor Architecture (SPARC)[9], a reduced instruction set architecture developed by Sun Microsystems. First an analysis of SPARC instruction traces will determine the optimal speedup that would be realized by a processor with infinite resources. Next, an analytical model of a parallelizing processor will be developed and used to predict the effects of limited resources on optimal speedup. Lastly, a SPARC simulator will be employed to determine the actual speedup of resource limited configurations, and the results will be correlated with the analytical model

    A users` guide to PSTSWM

    Full text link
    corecore