5,473 research outputs found

    FIESTA 2: parallelizeable multiloop numerical calculations

    The program FIESTA has been completely rewritten. It can now be used not only to evaluate Feynman integrals numerically, but also to expand Feynman integrals automatically in limits of momenta and masses using sector decompositions and Mellin-Barnes representations. Other important improvements to the code are complete parallelization (even across multiple computers), high-precision arithmetic (making it possible to calculate integrals that could not be done before), new integrators, Speer sectors as a strategy, and the possibility of evaluating more general parametric integrals.
    Comment: 31 pages, 5 figures

    An efficient parallel tree-code for the simulation of self-gravitating systems

    We describe a parallel version of our tree-code for the simulation of self-gravitating systems in astrophysics. It is based on a dynamic and adaptive method for the domain decomposition, which exploits the hierarchical data arrangement used by the tree-code. The parallelization overhead is low (less than 4% of the total CPU time in the tests done) because the domain decomposition is performed 'on the fly' during the tree setup, and the portion of the tree that is local to each processor is enriched with remote data only when they are actually needed. The performance of an implementation of the parallel code on a Cray T3E is presented and discussed. It shows very good speedup (=15 with 16 processors and 10^5 particles) and a rather low load imbalance (< 10% using up to 16 processors), achieving a high computation speed in the force evaluation (>10^4 particles/sec with 8 processors).
    Comment: 10 pages, 8 figures, LaTeX2e, A&A class file needed (included), submitted to A&A; corrected abstract word wrapping

    Parallelized Rigid Body Dynamics

    Physics engines are collections of API-like software designed for video games, movies and scientific simulations. While physics engines come in many shapes and designs, all engines can benefit from an increase in speed via parallelization. Despite this need for increased speed, it is uncommon to encounter a parallelized physics engine today. Many engines are long-standing projects, and changing them to support parallelization is too costly to be practical. Parallelization needs to be considered from the design stages through completion to ensure an adequate implementation. In this project we develop a realistic approach to simulating physics in a parallel environment. Using a range of techniques, we establish a practical approach that significantly reduces the run-time of a standard physics engine.

    GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems

    While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out and their necessity is justified by performance measurements or predictions based on performance models. The library code and several applications are available as open source. We also provide instructions on how to make use of GHOST in existing software packages, together with a case study which demonstrates the applicability and performance of GHOST as a component within a larger software stack.
    Comment: 32 pages, 11 figures

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighbor searching, and we discuss the present and future challenges we see for exascale simulation, in particular very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.
    Comment: EASC 2014 conference proceedings

    A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws

    We report on the development of a computational framework for the parallel, mesh-adaptive solution of systems of hyperbolic conservation laws, such as the time-dependent Euler equations in compressible gas dynamics, magnetohydrodynamics (MHD), and similar models in plasma physics. Local mesh refinement is realized by the recursive bisection of grid blocks along each spatial dimension. The implemented numerical schemes include standard finite differences as well as shock-capturing central schemes, both in connection with Runge-Kutta type integrators. Parallel execution is achieved through a configurable hybrid of POSIX multi-threading and MPI distribution with dynamic load balancing. One-, two-, and three-dimensional test computations for the Euler equations have been carried out and show good parallel scaling behavior. The Racoon framework is currently used to study the formation of singularities in plasmas and fluids.
    Comment: late submission

    Parallelization of a Code for the Simulation of Self-gravitating Systems in Astrophysics. Preliminary Speed-up Results

    We have preliminary results on the parallelization of a Tree-Code for evaluating gravitational forces in N-body astrophysical systems. For our Cray T3D/CRAFT implementation, we have obtained an encouraging speed-up behavior, which reaches a value of 37 with 64 processor elements (PEs). According to Amdahl's law, this means that about 99% of the code is actually parallelized. The speed-up tests concerned the evaluation of the forces among N = 130,369 particles, distributed by scaling the actual distribution of a sample of galaxies seen in the Northern sky hemisphere. Parallelization of the time integration of the trajectories, which has not yet been taken into account, is both easier to implement and not as fundamental.
    Comment: 14 pages LaTeX + 1 EPS figure + 2 EPS colour figures, epsf.sty and aasms4.sty included; to be published in Science & Supercomputing at CINECA, Report 1997 (Bologna, Italy)
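
    The "about 99%" figure in the abstract above can be checked by inverting Amdahl's law, S = 1 / ((1 - p) + p / N), for the parallel fraction p. The following is only an illustrative sketch: the function names are hypothetical and not taken from the paper, and the idealized form of the law (no communication overhead) is assumed.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal Amdahl speed-up S = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def parallel_fraction(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: p = (1 - 1/S) * N / (N - 1)."""
    if processors < 2:
        raise ValueError("need at least two processors to infer a parallel fraction")
    if not 1.0 <= speedup <= processors:
        raise ValueError("speed-up must lie between 1 and the processor count")
    return (1.0 - 1.0 / speedup) * processors / (processors - 1)


# Values quoted in the abstract: speed-up of 37 on 64 processor elements.
p = parallel_fraction(37.0, 64)
print(f"{p:.4f}")  # prints 0.9884
```

    Under these assumptions the inferred parallel fraction is roughly 98.8%, which agrees with the "about 99%" stated in the abstract.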