FIESTA 2: parallelizeable multiloop numerical calculations
The program FIESTA has been completely rewritten. Now it can be used not only
as a tool to evaluate Feynman integrals numerically, but also to expand Feynman
integrals automatically in limits of momenta and masses with the use of sector
decompositions and Mellin-Barnes representations. Other important improvements
to the code are complete parallelization (even to multiple computers),
high-precision arithmetic (making it possible to calculate integrals that were
previously out of reach), new integrators, Speer sectors as a strategy, and the
possibility to evaluate more general parametric integrals. Comment: 31 pages, 5 figures
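The abstract does not show the method itself, but the core idea behind sector decomposition, one of the techniques FIESTA builds on, can be illustrated on a toy parametric integral. The sketch below (plain Python, not FIESTA's algorithm) integrates 1/(x+y) over the unit square both directly and after splitting into the sectors x >= y and y >= x, where the substitution y = x*t cancels the singularity at the origin:

```python
import math

def midpoint_2d(f, n=200):
    """Midpoint rule on the unit square with n*n cells."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += f(x, y)
    return total * h * h

# Original integrand 1/(x+y): singular (though integrable) at the origin,
# so plain quadrature converges slowly.
naive = midpoint_2d(lambda x, y: 1.0 / (x + y))

# Sector decomposition: split the square into the sectors x >= y and
# y >= x.  In the sector x >= y, substitute y = x*t with 0 <= t <= 1:
#   dy = x dt   and   1/(x + y) = 1/(x * (1 + t)),
# so the Jacobian x cancels the 1/x and the integrand becomes the
# smooth function 1/(1 + t).  The second sector is symmetric.
sector = midpoint_2d(lambda x, t: 1.0 / (1.0 + t))
decomposed = 2.0 * sector

exact = 2.0 * math.log(2.0)
print(naive, decomposed, exact)
```

In a real multiloop calculation the decomposition is applied to dimensionally regularized Feynman parametric integrals and the poles in epsilon are extracted before numerical integration; this toy only shows how the sector substitution remaps an endpoint singularity into a smooth integrand.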
An efficient parallel tree-code for the simulation of self-gravitating systems
We describe a parallel version of our tree-code for the simulation of
self-gravitating systems in Astrophysics. It is based on a dynamic and adaptive
method for the domain decomposition, which exploits the hierarchical data
arrangement used by the tree-code. It shows low computational costs for the
parallelization overhead -- less than 4% of the total CPU-time in the tests
done -- because the domain decomposition is performed 'on the fly' during the
tree setting and the portion of the tree that is local to each processor
'enriches' itself of remote data only when they are actually needed.
The performance of an implementation of the parallel code on a Cray T3E is
presented and discussed. It exhibits very good speedup behaviour (a speedup of
15 with 16 processors and 10^5 particles) and rather low load imbalance (< 10%
using up to 16 processors), achieving a high computation speed in the force
evaluation (> 10^4 particles/sec with 8 processors). Comment: 10 pages, 8 figures, LaTeX2e, A&A class file needed (included),
submitted to A&A; corrected abstract word wrapping
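The abstract does not describe the tree algorithm itself. The sketch below is a minimal serial Barnes-Hut quadtree in plain Python (a standard monopole approximation, not the authors' parallel Cray implementation), illustrating the hierarchical data arrangement that such codes partition across processors; the opening angle and particle count are arbitrary choices for the demo:

```python
import math
import random

THETA = 0.3  # opening angle: smaller means more accurate but slower

class Cell:
    """Square quadtree cell storing total mass and centre of mass."""
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # mass-weighted coordinate sums
        self.body = None          # single body, while the cell is a leaf
        self.kids = None          # four children, once subdivided

    def _child(self, x, y):
        return self.kids[(1 if x >= self.cx else 0) * 2
                         + (1 if y >= self.cy else 0)]

    def insert(self, x, y, m):
        # totals are accumulated at every level of the tree
        self.mass += m
        self.mx += m * x
        self.my += m * y
        if self.kids is None:
            if self.body is None:
                self.body = (x, y, m)   # empty leaf: just store the body
                return
            # occupied leaf: subdivide, push the stored body down
            # (coincident bodies would recurse forever; random
            # positions make that vanishingly unlikely here)
            h = self.half / 2.0
            self.kids = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                         for dx in (-1, 1) for dy in (-1, 1)]
            bx, by, bm = self.body
            self.body = None
            self._child(bx, by).insert(bx, by, bm)
        self._child(x, y).insert(x, y, m)

def accel(cell, px, py):
    """Acceleration at (px, py) from the tree, monopole only, G = 1."""
    if cell.mass == 0.0:
        return 0.0, 0.0
    comx, comy = cell.mx / cell.mass, cell.my / cell.mass
    dx, dy = comx - px, comy - py
    r = math.hypot(dx, dy)
    # open the cell unless it is a leaf or subtends a small angle
    if cell.kids is not None and 2.0 * cell.half / r >= THETA:
        ax = ay = 0.0
        for kid in cell.kids:
            kax, kay = accel(kid, px, py)
            ax += kax
            ay += kay
        return ax, ay
    return cell.mass * dx / r**3, cell.mass * dy / r**3

random.seed(1)
bodies = [(random.random(), random.random(), 1.0) for _ in range(200)]
root = Cell(0.5, 0.5, 0.5)
for x, y, m in bodies:
    root.insert(x, y, m)

# compare the tree force with direct summation at an external test point
px, py = 2.0, 2.0
ax, ay = accel(root, px, py)
dax = sum(m * (x - px) / math.hypot(x - px, y - py)**3 for x, y, m in bodies)
day = sum(m * (y - py) / math.hypot(x - px, y - py)**3 for x, y, m in bodies)
print(ax, dax)
```

The domain decomposition described in the abstract amounts to assigning subtrees of such a structure to processors and fetching remote cells only when the opening criterion actually requires them.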
Parallelized Rigid Body Dynamics
Physics engines are collections of API-like software designed for video games, movies, and scientific simulations. While physics engines come in many shapes and designs, all engines can benefit from an increase in speed via parallelization. However, despite this need for increased speed, it is uncommon to encounter a parallelized physics engine today. Many engines are long-standing projects, and changing them to support parallelization is too costly to be practical. Parallelization needs to be considered from the design stages through completion to ensure an adequate implementation. In this project we develop a realistic approach to simulating physics in a parallel environment. Utilizing many techniques, we establish a practical approach that significantly reduces the run-time of a standard physics engine.
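The abstract does not say which stages were parallelized. One stage that parallelizes naturally in any engine is broad-phase collision detection, since each pair test is independent. The sketch below (hypothetical data, Python's standard thread pool, not the project's actual code) splits the pair list across workers:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def overlaps(a, b):
    """Sphere-sphere test: centres closer than the sum of the radii."""
    (ax, ay, az, ar), (bx, by, bz, br) = a, b
    d2 = (ax - bx)**2 + (ay - by)**2 + (az - bz)**2
    return d2 <= (ar + br)**2

def find_contacts(bodies, workers=4):
    """Broad-phase collision detection with the candidate pair list
    split into chunks scanned by worker threads in parallel."""
    pairs = list(combinations(range(len(bodies)), 2))
    chunk = max(1, len(pairs) // workers)
    chunks = [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]

    def scan(part):
        return [(i, j) for i, j in part if overlaps(bodies[i], bodies[j])]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scan, chunks)   # map preserves chunk order
    return [hit for part in results for hit in part]

# four spheres as (x, y, z, radius): 0-1 overlap, 2-3 overlap
bodies = [(0.0, 0.0, 0.0, 1.0), (1.5, 0.0, 0.0, 1.0),
          (10.0, 0.0, 0.0, 1.0), (10.0, 1.0, 0.0, 1.0)]
print(find_contacts(bodies))
```

In CPython the GIL limits thread-level speedup for pure-Python arithmetic; a production engine would run the same decomposition with native threads or a task system, which is why the abstract stresses designing for parallelism from the start.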
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack. Comment: 32 pages, 11 figures
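GHOST's hybrid-parallel kernels are not shown in the abstract. The sketch below is only a conceptual illustration, in plain Python, of the kind of operation such a library optimizes: a sparse matrix-vector multiply on the CSR format, with the matrix partitioned into row blocks the way an MPI+X code assigns rows to ranks (the block boundaries here are hypothetical):

```python
def csr_spmv(vals, cols, rowptr, x, row_begin, row_end):
    """Compute y = A[row_begin:row_end, :] @ x for a CSR matrix.
    In an MPI+X setting each rank owns one such row block and an
    OpenMP/CUDA "X" layer parallelizes the loop over its rows."""
    y = []
    for r in range(row_begin, row_end):
        s = 0.0
        for k in range(rowptr[r], rowptr[r + 1]):
            s += vals[k] * x[cols[k]]
        y.append(s)
    return y

# CSR storage of   A = [[4, 0, 1],
#                       [0, 2, 0],
#                       [3, 0, 5]]
vals   = [4.0, 1.0, 2.0, 3.0, 5.0]
cols   = [0,   2,   1,   0,   2  ]
rowptr = [0, 2, 3, 5]
x = [1.0, 1.0, 1.0]

# two "ranks": rows [0, 2) and rows [2, 3)
y = csr_spmv(vals, cols, rowptr, x, 0, 2) \
  + csr_spmv(vals, cols, rowptr, x, 2, 3)
print(y)
```

A real heterogeneous kernel additionally has to choose SIMD-friendly sparse formats per device and overlap the communication of remote x entries with local computation, which is where libraries like GHOST earn their keep.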
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation - in particular very fine-grained task parallelism. We
also discuss the software management, code peer review and continuous
integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedings
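The neighbor searching mentioned above builds on the classic cell-list idea: bin particles into cells at least as wide as the cutoff, then test only the surrounding cells instead of all pairs (GROMACS's actual scheme is a more elaborate cluster-pair variant of this). A minimal non-periodic 2-D sketch, verified against brute force:

```python
import random
from math import floor, hypot

def cell_pairs(points, cutoff):
    """All pairs (i, j) with i < j closer than `cutoff`, found by
    binning points into square cells of side `cutoff` and checking
    only each cell's 3x3 neighbourhood (non-periodic, 2-D)."""
    cells = {}
    for idx, (x, y) in enumerate(points):
        key = (floor(x / cutoff), floor(y / cutoff))
        cells.setdefault(key, []).append(idx)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in members:
                    for j in cells.get((cx + dx, cy + dy), []):
                        if i < j and hypot(points[i][0] - points[j][0],
                                           points[i][1] - points[j][1]) < cutoff:
                            pairs.add((i, j))
    return pairs

def brute_pairs(points, cutoff):
    """O(N^2) reference used to validate the cell-list result."""
    return {(i, j)
            for i in range(len(points)) for j in range(i + 1, len(points))
            if hypot(points[i][0] - points[j][0],
                     points[i][1] - points[j][1]) < cutoff}

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(300)]
print(len(cell_pairs(pts, 0.1)))
```

For roughly uniform densities this reduces the search from O(N^2) to O(N), and the regular cell structure is what makes the step amenable to the SIMD and GPU acceleration the abstract describes.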
A Parallel Mesh-Adaptive Framework for Hyperbolic Conservation Laws
We report on the development of a computational framework for the parallel,
mesh-adaptive solution of systems of hyperbolic conservation laws like the
time-dependent Euler equations in compressible gas dynamics or
Magneto-Hydrodynamics (MHD) and similar models in plasma physics. Local mesh
refinement is realized by the recursive bisection of grid blocks along each
spatial dimension; the implemented numerical schemes include standard
finite differences as well as shock-capturing central schemes, both in
connection with Runge-Kutta-type integrators. Parallel execution is achieved
through a configurable hybrid of POSIX-multi-threading and MPI-distribution
with dynamic load balancing. One-, two-, and three-dimensional test computations
for the Euler equations have been carried out and show good parallel scaling
behavior. The Racoon framework is currently used to study the formation of
singularities in plasmas and fluids. Comment: late submission
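The refinement strategy described above can be sketched in a few lines. The toy Python code below (not the Racoon implementation; the refinement indicator, a crude sampled variation of a test function, is an assumption for the demo) recursively bisects a 2-D block along each dimension wherever the solution varies too strongly:

```python
import math

def refine(block, f, tol, max_level):
    """Recursively bisect a 2-D block (x0, x1, y0, y1, level) along
    each dimension, producing four children, wherever the sampled
    variation of f across the block exceeds tol."""
    x0, x1, y0, y1, level = block
    # crude smoothness indicator: corner and centre samples
    # (a steep feature between sample points would be missed)
    samples = [f(x0, y0), f(x0, y1), f(x1, y0), f(x1, y1),
               f((x0 + x1) / 2, (y0 + y1) / 2)]
    if level >= max_level or max(samples) - min(samples) < tol:
        return [block]                    # smooth enough: keep as a leaf
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for bx in ((x0, xm), (xm, x1)):       # bisect in x ...
        for by in ((y0, ym), (ym, y1)):   # ... and in y
            leaves += refine((bx[0], bx[1], by[0], by[1], level + 1),
                             f, tol, max_level)
    return leaves

# steep front at x = 0.5: refined blocks should cluster around it
f = lambda x, y: math.tanh(40.0 * (x - 0.5))
leaves = refine((0.0, 1.0, 0.0, 1.0, 0), f, 0.1, 6)
finest = [b for b in leaves if b[4] == 6]
print(len(leaves), len(finest))
```

In the full framework each leaf is a grid block advanced by the chosen scheme, and the load balancer redistributes blocks across threads and MPI ranks as refinement shifts work around the domain.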
Parallelization of a Code for the Simulation of Self-gravitating Systems in Astrophysics. Preliminary Speed-up Results
We have preliminary results on the parallelization of a Tree-Code for
evaluating gravitational forces in N-body astrophysical systems. For our Cray
T3D/CRAFT implementation, we have obtained an encouraging speed-up behavior,
which reaches a value of 37 with 64 processor elements (PEs). According to the
Amdahl's law, this means that about 99% of the code is actually parallelized. The
speed-up tests concerned the evaluation of the forces among N = 130,369
particles, distributed so as to scale the actual distribution of a sample of
galaxies seen in the Northern sky hemisphere. Parallelization of the time integration of
the trajectories, which has not yet been taken into account, is both easier to
implement and not as fundamental. Comment: 14 pages LaTeX + 1 EPS figure + 2 EPS colour figures, epsf.sty and
aasms4.sty included; to be published in Science & Supercomputing at CINECA,
Report 1997 (Bologna, Italy)