
    Diagonal-implicitly iterated Runge-Kutta methods on distributed memory multiprocessors

    We investigate the parallel implementation of the diagonal-implicitly iterated Runge-Kutta (DIIRK) method, an iteration method based on a predictor-corrector scheme. This method is appropriate for the solution of stiff systems of ordinary differential equations (ODEs) and provides embedded formulae to control the stepsize. We discuss different strategies for the implementation of the DIIRK method on distributed memory multiprocessors, which differ mainly in the order of independent computations and the data distribution. In particular, we consider a consecutive implementation that executes the steps of each corrector iteration in sequential order and distributes the resulting equation systems among all available processors, and a group implementation that executes the steps in parallel on independent groups of processors. The performance of these implementations depends on the right-hand side of the ODE system: for sparse functions, the group implementation is superior and achieves medium-range speedup values; for dense functions, the consecutive implementation is better and achieves good speedup values.
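    The corrector iteration can be sketched as follows. This is a minimal, illustrative Python/NumPy version, assuming a base implicit RK method (A, b, c), a diagonal matrix D, a trivial predictor, and a fixed number of corrector sweeps; the names (diirk_step, the stiff test problem) are hypothetical and not taken from the paper. The per-stage implicit systems of one sweep are exactly the independent computations that the consecutive and group implementations distribute differently.

        import numpy as np
        from scipy.optimize import fsolve

        def diirk_step(f, t, y, h, A, b, c, D, sweeps=3):
            # One DIIRK step: each sweep solves, for every stage j,
            #   Y_j - h*D_j*f(Y_j) = y + h * sum_l (A - D)[j,l] * f(Y_l_prev).
            # The implicit coupling is diagonal, so the s stage systems of a
            # sweep are independent: a group implementation can solve them on
            # separate processor groups, a consecutive implementation solves
            # them one after another with all processors on each system.
            s = len(b)
            Y = np.tile(y, (s, 1))                        # trivial predictor: Y_j = y
            F = np.array([f(t + c[j] * h, Y[j]) for j in range(s)])
            for _ in range(sweeps):                       # corrector sweeps
                for j in range(s):                        # independent stage systems
                    rhs = y + h * (A[j] @ F) - h * D[j] * F[j]
                    Y[j] = fsolve(lambda Yj, j=j, rhs=rhs:
                                  Yj - h * D[j] * f(t + c[j] * h, Yj) - rhs, Y[j])
                F = np.array([f(t + c[j] * h, Y[j]) for j in range(s)])
            return y + h * (b @ F)

        # Example: two-stage Gauss method as corrector, D = diag(A),
        # applied to the stiff scalar problem y' = -50*(y - cos(t)).
        r3 = np.sqrt(3.0)
        A = np.array([[0.25, 0.25 - r3 / 6], [0.25 + r3 / 6, 0.25]])
        b, c = np.array([0.5, 0.5]), np.array([0.5 - r3 / 6, 0.5 + r3 / 6])
        D = np.array([0.25, 0.25])
        f = lambda t, y: -50.0 * (y - np.cos(t))
        y_next = diirk_step(f, 0.0, np.array([1.0]), 0.05, A, b, c, D)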

    Parallel iterated Runge-Kutta methods and applications

    The iterated Runge-Kutta (IRK) method is an iteration scheme for the numerical solution of initial value problems (IVPs) of ordinary differential equations (ODEs) that is based on a predictor-corrector method with a Runge-Kutta (RK) method as corrector. Embedded approximation formulae are used to control the stepsize. We present different parallel algorithms of the IRK method on distributed memory multiprocessors for the solution of systems of ODEs. The parallel algorithms are given in an SPMD (single-program multiple-data) programming style where data exchanges are described with appropriate communication primitives. A theoretical performance analysis and a runtime simulation allow us to evaluate the presented algorithms. The implementation on the Intel iPSC/860 confirms the predicted runtimes. The speedup values strongly depend on the particular system of ODEs to be solved. The parallel IRK method is applied to a typical discretization problem, the discretized Brusselator equation. Application-specific modifications of the general parallel ODE solver are developed which result in a considerable reduction of the parallel execution time.
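    A minimal illustration of the corrector follows, in Python/NumPy. The fixed-point sweep Y <- y + h*A@f(Y) makes the s stage evaluations of each sweep mutually independent, which is the parallelism the SPMD variants exploit; here a thread pool merely stands in for processor groups, and irk_step and the fixed sweep count (instead of embedded error control) are illustrative assumptions.

        import numpy as np
        from concurrent.futures import ThreadPoolExecutor

        def irk_step(f, t, y, h, A, b, c, sweeps=4, pool=None):
            # One IRK step: fixed-point corrector Y <- y + h * A @ f(Y).
            # The s right-hand-side evaluations of one sweep are mutually
            # independent, so they can be distributed over processors.
            s = len(b)
            Y = np.tile(y, (s, 1))                        # predictor: Y_j = y
            for _ in range(sweeps):
                stage = lambda j: f(t + c[j] * h, Y[j])
                F = np.array(list(pool.map(stage, range(s))) if pool
                             else [stage(j) for j in range(s)])
                Y = y + h * (A @ F)
            F = np.array([f(t + c[j] * h, Y[j]) for j in range(s)])
            return y + h * (b @ F)

        # Usage: with ThreadPoolExecutor(max_workers=s) as pool:
        #            y_next = irk_step(f, t, y, h, A, b, c, pool=pool)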

    On-line reconstruction algorithms for the CBM and ALICE experiments

    This thesis presents various algorithms which have been developed for on-line event reconstruction in the CBM experiment at GSI, Darmstadt, and the ALICE experiment at CERN, Geneva. Although the experiments are different - CBM is a fixed-target experiment with forward geometry, while ALICE has a typical collider geometry - they share common aspects where reconstruction is concerned. The thesis describes:
    - general modifications to the Kalman filter method which accelerate and simplify existing fit algorithms and improve their numerical stability;
    - track-fit algorithms developed for the CBM and ALICE experiments, including a new method for track extrapolation in non-homogeneous magnetic fields;
    - algorithms developed for primary and secondary vertex fitting in both experiments, in particular a new method for the reconstruction of decayed particles;
    - parallel algorithms for on-line track finding in the CBM experiment;
    - parallel algorithms for on-line track finding in the High Level Trigger of the ALICE experiment;
    - the realisation of the track finders on modern hardware, such as SIMD CPU registers and GPU accelerators.
    All the presented methods have been developed by or with the direct participation of the author.
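    The track fits above build on the standard Kalman filter predict/update recursion, sketched generically below in Python/NumPy. This shows only the textbook step, not the accelerated and numerically stabilised variants developed in the thesis; the name kalman_step and the generic matrices F, Q, H, R are illustrative.

        import numpy as np

        def kalman_step(x, P, F, Q, H, R, m):
            # Prediction: extrapolate the track state x and its covariance P
            # to the next detector plane (F is the transport Jacobian).
            x = F @ x
            P = F @ P @ F.T + Q
            # Update: fold in the measured hit m (measurement model H, noise R).
            S = H @ P @ H.T + R                  # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
            x = x + K @ (m - H @ x)
            P = (np.eye(len(x)) - K @ H) @ P
            return x, P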

    Acceleration of a Full-scale Industrial CFD Application with OP2


    Methods for Multilevel Parallelism on GPU Clusters: Application to a Multigrid Accelerated Navier-Stokes Solver

    Computational Fluid Dynamics (CFD) is an important field in high performance computing with numerous applications. Solving problems in thermal and fluid sciences demands enormous computing resources and has been one of the primary applications used on supercomputers and large clusters. Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications substantially. While significant speedups have been obtained with single and multiple GPUs on a single workstation, large problems require more resources. Conventional clusters of central processing units (CPUs) are now being augmented with GPUs in each compute node to tackle large problems. The present research investigates methods of taking advantage of the multilevel parallelism in multi-node, multi-GPU systems to develop scalable simulation science software. The primary application the research develops is a cluster-ready GPU-accelerated Navier-Stokes incompressible flow solver that includes advanced numerical methods, including a geometric multigrid pressure Poisson solver. The research investigates multiple implementations to explore computation/communication overlap methods, and explores methods for coarse-grain parallelism, including POSIX threads, MPI, and a hybrid OpenMP-MPI model. The application includes a number of usability features, including periodic VTK (Visualization Toolkit) output, a run-time configuration file, and flexible setup of obstacles to represent urban areas and complex terrain. Numerical features include a variety of time-stepping methods, buoyancy-driven flow, adaptive time-stepping, various iterative pressure solvers, and a new parallel 3D geometric multigrid solver. At each step, the project examines performance and scalability measures using the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA) and the Longhorn cluster at the Texas Advanced Computing Center (TACC). The results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics simulations.
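    As a rough illustration of the geometric multigrid idea behind such a pressure Poisson solver, here is a minimal 1D V-cycle in Python/NumPy with weighted-Jacobi smoothing, full-weighting restriction, and linear prolongation. The solver described above is 3D, parallel, and GPU-accelerated; this sketch shows only the algorithmic skeleton, and all names are illustrative.

        import numpy as np

        def jacobi(u, f, h, omega=2/3):
            # One weighted-Jacobi sweep for -u'' = f, zero Dirichlet boundaries.
            up = np.pad(u, 1)                     # boundary values are 0
            return u + omega * ((up[:-2] + up[2:] + h * h * f) / 2 - u)

        def vcycle(u, f, h, nu=3):
            # One V-cycle on interior values; len(u) should be 2**k - 1.
            for _ in range(nu):                   # pre-smoothing
                u = jacobi(u, f, h)
            up = np.pad(u, 1)
            r = f - (2 * u - up[:-2] - up[2:]) / (h * h)   # fine-grid residual
            if len(u) >= 3:                       # coarse-grid correction
                rc = 0.25 * r[:-2:2] + 0.5 * r[1::2] + 0.25 * r[2::2]
                ec = vcycle(np.zeros(len(rc)), rc, 2 * h)
                e = np.zeros(len(u))              # prolongate linearly
                e[1::2] = ec
                e[2:-1:2] = 0.5 * (ec[:-1] + ec[1:])
                e[0], e[-1] = 0.5 * ec[0], 0.5 * ec[-1]
                u = u + e
            for _ in range(nu):                   # post-smoothing
                u = jacobi(u, f, h)
            return u

        # Usage: solve -u'' = pi^2 sin(pi x) on (0,1); exact solution sin(pi x).
        n, L = 2**7 - 1, 1.0
        h = L / (n + 1)
        x = np.linspace(h, L - h, n)
        f = np.pi**2 * np.sin(np.pi * x)
        u = np.zeros(n)
        for _ in range(8):
            u = vcycle(u, f, h)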