FIESTA 2: parallelizeable multiloop numerical calculations
The program FIESTA has been completely rewritten. Now it can be used not only
as a tool to evaluate Feynman integrals numerically, but also to expand Feynman
integrals automatically in limits of momenta and masses with the use of sector
decompositions and Mellin-Barnes representations. Other important improvements
to the code are complete parallelization (even to multiple computers),
high-precision arithmetic (making it possible to calculate integrals that were
previously out of reach), new integrators, Speer sectors as a strategy, and the
possibility to evaluate more general parametric integrals. Comment: 31 pages, 5 figures
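The abstract does not show the method itself, but the core idea behind sector decomposition, one of the techniques FIESTA builds on, can be illustrated on a toy parametric integral. The sketch below (plain Python, not FIESTA's algorithm) integrates 1/(x+y) over the unit square both directly and after splitting into the sectors x >= y and y >= x, where the substitution y = x*t cancels the singularity at the origin:

```python
import math

def midpoint_2d(f, n=200):
    """Midpoint rule on the unit square with n*n cells."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += f(x, y)
    return total * h * h

# Original integrand 1/(x+y): singular (though integrable) at the origin,
# so plain quadrature converges slowly.
naive = midpoint_2d(lambda x, y: 1.0 / (x + y))

# Sector decomposition: split the square into the sectors x >= y and
# y >= x.  In the sector x >= y, substitute y = x*t with 0 <= t <= 1:
#   dy = x dt   and   1/(x + y) = 1/(x * (1 + t)),
# so the Jacobian x cancels the 1/x and the integrand becomes the
# smooth function 1/(1 + t).  The second sector is symmetric.
sector = midpoint_2d(lambda x, t: 1.0 / (1.0 + t))
decomposed = 2.0 * sector

exact = 2.0 * math.log(2.0)
print(naive, decomposed, exact)
```

In a real multiloop calculation the decomposition is applied to dimensionally regularized Feynman parametric integrals and the poles in epsilon are extracted before numerical integration; this toy only shows how the sector substitution remaps an endpoint singularity into a smooth integrand.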
An efficient parallel tree-code for the simulation of self-gravitating systems
We describe a parallel version of our tree-code for the simulation of
self-gravitating systems in Astrophysics. It is based on a dynamic and adaptive
method for the domain decomposition, which exploits the hierarchical data
arrangement used by the tree-code. It shows low computational costs for the
parallelization overhead -- less than 4% of the total CPU-time in the tests
done -- because the domain decomposition is performed 'on the fly' during the
tree setting and the portion of the tree that is local to each processor
'enriches' itself of remote data only when they are actually needed.
The performance of an implementation of the parallel code on a Cray T3E is
presented and discussed. It exhibits very good speedup behaviour (a speedup of
15 with 16 processors and 10^5 particles) and rather low load imbalance (< 10%
using up to 16 processors), achieving a high computation speed in the force
evaluation (> 10^4 particles/sec with 8 processors). Comment: 10 pages, 8 figures, LaTeX2e, A&A class file needed (included),
submitted to A&A; corrected abstract word wrapping
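The abstract does not describe the tree algorithm itself. The sketch below is a minimal serial Barnes-Hut quadtree in plain Python (a standard monopole approximation, not the authors' parallel Cray implementation), illustrating the hierarchical data arrangement that such codes partition across processors; the opening angle and particle count are arbitrary choices for the demo:

```python
import math
import random

THETA = 0.3  # opening angle: smaller means more accurate but slower

class Cell:
    """Square quadtree cell storing total mass and centre of mass."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # mass-weighted coordinate sums
        self.body = None          # single body, while the cell is a leaf
        self.kids = None          # four children, once subdivided

    def _child(self, x, y):
        return self.kids[(1 if x >= self.cx else 0) * 2
                         + (1 if y >= self.cy else 0)]

    def insert(self, x, y, m):
        # totals are accumulated at every level of the tree
        self.mass += m
        self.mx += m * x
        self.my += m * y
        if self.kids is None:
            if self.body is None:
                self.body = (x, y, m)   # empty leaf: just store the body
                return
            # occupied leaf: subdivide, push the stored body down
            # (coincident bodies would recurse forever; random
            # positions make that vanishingly unlikely here)
            h = self.half / 2.0
            self.kids = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                         for dx in (-1, 1) for dy in (-1, 1)]
            bx, by, bm = self.body
            self.body = None
            self._child(bx, by).insert(bx, by, bm)
        self._child(x, y).insert(x, y, m)

def accel(cell, px, py):
    """Acceleration at (px, py) from the tree, monopole only, G = 1."""
    if cell.mass == 0.0:
        return 0.0, 0.0
    comx, comy = cell.mx / cell.mass, cell.my / cell.mass
    dx, dy = comx - px, comy - py
    r = math.hypot(dx, dy)
    # open the cell unless it is a leaf or subtends a small angle
    if cell.kids is not None and 2.0 * cell.half / r >= THETA:
        ax = ay = 0.0
        for kid in cell.kids:
            kax, kay = accel(kid, px, py)
            ax += kax
            ay += kay
        return ax, ay
    return cell.mass * dx / r**3, cell.mass * dy / r**3

random.seed(1)
bodies = [(random.random(), random.random(), 1.0) for _ in range(200)]
root = Cell(0.5, 0.5, 0.5)
for x, y, m in bodies:
    root.insert(x, y, m)

# compare the tree force with direct summation at an external test point
px, py = 2.0, 2.0
ax, ay = accel(root, px, py)
dax = sum(m * (x - px) / math.hypot(x - px, y - py)**3 for x, y, m in bodies)
day = sum(m * (y - py) / math.hypot(x - px, y - py)**3 for x, y, m in bodies)
print(ax, dax)
```

The domain decomposition described in the abstract amounts to assigning subtrees of such a structure to processors and fetching remote cells only when the opening criterion actually requires them.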
Parallelized Rigid Body Dynamics
Physics engines are collections of API-like software designed for video games, movies, and scientific simulations. While physics engines come in many shapes and designs, all engines can benefit from an increase in speed via parallelization. However, despite this need for increased speed, it is uncommon to encounter a parallelized physics engine today. Many engines are long-standing projects, and changing them to support parallelization is too costly to be practical. Parallelization needs to be considered from the design stages through completion to ensure an adequate implementation. In this project we develop a realistic approach to simulating physics in a parallel environment. Utilizing many techniques, we establish a practical approach that significantly reduces the run-time of a standard physics engine.
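The abstract does not say which stages were parallelized. One stage that parallelizes naturally in any engine is broad-phase collision detection, since each pair test is independent. The sketch below (hypothetical data, Python's standard thread pool, not the project's actual code) splits the pair list across workers:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def overlaps(a, b):
    """Sphere-sphere test: centres closer than the sum of the radii."""
    (ax, ay, az, ar), (bx, by, bz, br) = a, b
    d2 = (ax - bx)**2 + (ay - by)**2 + (az - bz)**2
    return d2 <= (ar + br)**2

def find_contacts(bodies, workers=4):
    """Broad-phase collision detection with the candidate pair list
    split into chunks scanned by worker threads in parallel."""
    pairs = list(combinations(range(len(bodies)), 2))
    chunk = max(1, len(pairs) // workers)
    chunks = [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]

    def scan(part):
        return [(i, j) for i, j in part if overlaps(bodies[i], bodies[j])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scan, chunks)   # map preserves chunk order
    return [hit for part in results for hit in part]

# four spheres as (x, y, z, radius): 0-1 overlap, 2-3 overlap
bodies = [(0.0, 0.0, 0.0, 1.0), (1.5, 0.0, 0.0, 1.0),
          (10.0, 0.0, 0.0, 1.0), (10.0, 1.0, 0.0, 1.0)]
print(find_contacts(bodies))
```

In CPython the GIL limits thread-level speedup for pure-Python arithmetic; a production engine would run the same decomposition with native threads or a task system, which is why the abstract stresses designing for parallelism from the start.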
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack. Comment: 32 pages, 11 figures
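GHOST's hybrid-parallel kernels are not shown in the abstract. The sketch below is only a conceptual illustration, in plain Python, of the kind of operation such a library optimizes: a sparse matrix-vector multiply on the CSR format, with the matrix partitioned into row blocks the way an MPI+X code assigns rows to ranks (the block boundaries here are hypothetical):

```python
def csr_spmv(vals, cols, rowptr, x, row_begin, row_end):
    """Compute y = A[row_begin:row_end, :] @ x for a CSR matrix.
    In an MPI+X setting each rank owns one such row block and an
    OpenMP/CUDA "X" layer parallelizes the loop over its rows."""
    y = []
    for r in range(row_begin, row_end):
        s = 0.0
        for k in range(rowptr[r], rowptr[r + 1]):
            s += vals[k] * x[cols[k]]
        y.append(s)
    return y

# CSR storage of   A = [[4, 0, 1],
#                       [0, 2, 0],
#                       [3, 0, 5]]
vals   = [4.0, 1.0, 2.0, 3.0, 5.0]
cols   = [0,   2,   1,   0,   2  ]
rowptr = [0, 2, 3, 5]
x = [1.0, 1.0, 1.0]

# two "ranks": rows [0, 2) and rows [2, 3)
y = csr_spmv(vals, cols, rowptr, x, 0, 2) \
  + csr_spmv(vals, cols, rowptr, x, 2, 3)
print(y)
```

A real heterogeneous kernel additionally has to choose SIMD-friendly sparse formats per device and overlap the communication of remote x entries with local computation, which is where libraries like GHOST earn their keep.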
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation - in particular very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedings
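The neighbor searching mentioned above builds on the classic cell-list idea: bin particles into cells at least as wide as the cutoff, then test only the surrounding cells instead of all pairs (GROMACS's actual scheme is a more elaborate cluster-pair variant of this). A minimal non-periodic 2-D sketch, verified against brute force:

```python
import random
from math import floor, hypot

def cell_pairs(points, cutoff):
    """All pairs (i, j) with i < j closer than `cutoff`, found by
    binning points into square cells of side `cutoff` and checking
    only each cell's 3x3 neighbourhood (non-periodic, 2-D)."""
    cells = {}
    for idx, (x, y) in enumerate(points):
        key = (floor(x / cutoff), floor(y / cutoff))
        cells.setdefault(key, []).append(idx)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in members:
                    for j in cells.get((cx + dx, cy + dy), []):
                        if i < j and hypot(points[i][0] - points[j][0],
                                           points[i][1] - points[j][1]) < cutoff:
                            pairs.add((i, j))
    return pairs

def brute_pairs(points, cutoff):
    """O(N^2) reference used to validate the cell-list result."""
    return {(i, j)
            for i in range(len(points)) for j in range(i + 1, len(points))
            if hypot(points[i][0] - points[j][0],
                     points[i][1] - points[j][1]) < cutoff}

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(300)]
print(len(cell_pairs(pts, 0.1)))
```

For roughly uniform densities this reduces the search from O(N^2) to O(N), and the regular cell structure is what makes the step amenable to the SIMD and GPU acceleration the abstract describes.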
A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws
We report on the development of a computational framework for the parallel,
mesh-adaptive solution of systems of hyperbolic conservation laws like the
time-dependent Euler equations in compressible gas dynamics or
Magneto-Hydrodynamics (MHD) and similar models in plasma physics. Local mesh
refinement is realized by the recursive bisection of grid blocks along each
spatial dimension; the implemented numerical schemes include standard
finite differences as well as shock-capturing central schemes, both in
connection with Runge-Kutta-type integrators. Parallel execution is achieved
through a configurable hybrid of POSIX-multi-threading and MPI-distribution
with dynamic load balancing. One-, two-, and three-dimensional test computations
for the Euler equations have been carried out and show good parallel scaling
behavior. The Racoon framework is currently used to study the formation of
singularities in plasmas and fluids. Comment: late submission
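The refinement strategy described above can be sketched in a few lines. The toy Python code below (not the Racoon implementation; the refinement indicator, a crude sampled variation of a test function, is an assumption for the demo) recursively bisects a 2-D block along each dimension wherever the solution varies too strongly:

```python
import math

def refine(block, f, tol, max_level):
    """Recursively bisect a 2-D block (x0, x1, y0, y1, level) along
    each dimension, producing four children, wherever the sampled
    variation of f across the block exceeds tol."""
    x0, x1, y0, y1, level = block
    # crude smoothness indicator: corner and centre samples
    # (a steep feature between sample points would be missed)
    samples = [f(x0, y0), f(x0, y1), f(x1, y0), f(x1, y1),
               f((x0 + x1) / 2, (y0 + y1) / 2)]
    if level >= max_level or max(samples) - min(samples) < tol:
        return [block]                    # smooth enough: keep as a leaf
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for bx in ((x0, xm), (xm, x1)):       # bisect in x ...
        for by in ((y0, ym), (ym, y1)):   # ... and in y
            leaves += refine((bx[0], bx[1], by[0], by[1], level + 1),
                             f, tol, max_level)
    return leaves

# steep front at x = 0.5: refined blocks should cluster around it
f = lambda x, y: math.tanh(40.0 * (x - 0.5))
leaves = refine((0.0, 1.0, 0.0, 1.0, 0), f, 0.1, 6)
finest = [b for b in leaves if b[4] == 6]
print(len(leaves), len(finest))
```

In the full framework each leaf is a grid block advanced by the chosen scheme, and the load balancer redistributes blocks across threads and MPI ranks as refinement shifts work around the domain.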
Parallelization of a Code for the Simulation of Self-gravitating Systems in Astrophysics. Preliminary Speed-up Results
We have preliminary results on the parallelization of a Tree-Code for
evaluating gravitational forces in N-body astrophysical systems. For our Cray
T3D/CRAFT implementation, we have obtained an encouraging speed-up behavior,
which reaches a value of 37 with 64 processor elements (PEs). According to the
Amdahl's law, this means that about 99% of the code is actually parallelized. The
speed-up tests concerned the evaluation of the forces among N = 130,369
particles, distributed so as to scale the actual distribution of a sample of
galaxies seen in the Northern sky hemisphere. Parallelization of the time integration of
the trajectories, which has not yet been taken into account, is both easier to
implement and not as fundamental. Comment: 14 pages LaTeX + 1 EPS figure + 2 EPS colour figures, epsf.sty and
aasms4.sty included; to be published in Science & Supercomputing at CINECA,
Report 1997 (Bologna, Italy)