NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information
about the dynamical evolution of star clusters. They also enable comparisons
with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a
well-known direct N-body code for star clusters, and NBODY6++ is the extended
version designed for large particle number simulations by supercomputers. We
present NBODY6++GPU, an optimized version of NBODY6++ with hybrid
parallelization methods (MPI, GPU, OpenMP, and AVX/SSE) to accelerate large
direct N-body simulations, and in particular to solve the million-body
problem. We discuss the new features of the NBODY6++GPU code, benchmarks, as
well as the first results from a simulation of a realistic globular cluster
initially containing a million particles. For million-body simulations,
NBODY6++GPU is times faster than NBODY6 with 320 CPU cores and 32
NVIDIA K20X GPUs. With this computing cluster specification, the simulations of
million-body globular clusters including primordial binaries require
about an hour per half-mass crossing time.
Comment: 13 pages, 9 figures, 3 tables
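To make the computational scale concrete, below is a minimal sketch of the O(N^2) direct-summation force loop that codes such as NBODY6++GPU accelerate with GPUs, SIMD instructions and MPI. The flat arrays, softening parameter and plain OpenMP loop are assumptions made for the example; they do not reflect NBODY6++GPU's actual Hermite integrator or data layout.

```cpp
// Minimal direct-summation kernel: O(N^2) pairwise gravitational accelerations.
#include <cmath>
#include <cstddef>
#include <vector>

void direct_forces(const std::vector<double>& x, const std::vector<double>& y,
                   const std::vector<double>& z, const std::vector<double>& m,
                   std::vector<double>& ax, std::vector<double>& ay,
                   std::vector<double>& az, double eps2 /* softening^2 */) {
    const std::size_t n = m.size();
#pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i) {
        double fx = 0.0, fy = 0.0, fz = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            if (j == i) continue;
            const double dx = x[j] - x[i];
            const double dy = y[j] - y[i];
            const double dz = z[j] - z[i];
            const double r2 = dx * dx + dy * dy + dz * dz + eps2;
            const double inv_r = 1.0 / std::sqrt(r2);
            const double w = m[j] * inv_r * inv_r * inv_r;  // G = 1: m_j / r^3
            fx += w * dx;
            fy += w * dy;
            fz += w * dz;
        }
        ax[i] = fx;
        ay[i] = fy;
        az[i] = fz;
    }
}
```

The structure-of-arrays layout and the fully independent outer loop are what make kernels of this shape amenable to GPU and AVX/SSE acceleration.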
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find that the advanced semantics for
Exascale computing improve load balancing during runtime and enable automatic
parallelism discovery, which improves efficiency.
Comment: 11 figures
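As a rough illustration of runtime load balancing for irregular tree workloads, here is a sketch that spawns one task per body traversal of a Barnes-Hut tree using OpenMP tasks. It is not the ParalleX execution model studied in the paper, and the Node layout and opening-angle criterion are assumptions made for the example.

```cpp
// Task-based Barnes-Hut force evaluation: the runtime schedules per-body
// traversals, giving dynamic load balance over an irregular tree.
#include <array>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    double mass = 0.0;
    std::array<double, 3> com{};                   // centre of mass of the cell
    double size = 0.0;                             // cell edge length
    std::array<std::unique_ptr<Node>, 8> child{};  // all null for a leaf body
    bool leaf() const { return !child[0]; }
};

// Recursively accumulate the acceleration at position p with opening angle theta.
void accumulate(const Node& n, const std::array<double, 3>& p, double theta,
                std::array<double, 3>& acc) {
    const double dx = n.com[0] - p[0], dy = n.com[1] - p[1], dz = n.com[2] - p[2];
    const double r2 = dx * dx + dy * dy + dz * dz;
    if (n.leaf() && r2 == 0.0) return;             // skip the body's own leaf
    const double r = std::sqrt(r2);
    if (n.leaf() || n.size / r < theta) {          // far enough: treat cell as a point mass
        const double w = n.mass / (r2 * r);
        acc[0] += w * dx;
        acc[1] += w * dy;
        acc[2] += w * dz;
    } else {
        for (const auto& c : n.child)
            if (c) accumulate(*c, p, theta, acc);
    }
}

// Each body's traversal becomes an independent task; the OpenMP runtime balances
// them, mimicking (at a much coarser grain) the runtime load balancing described
// in the abstract.
void all_forces(const Node& root, const std::vector<std::array<double, 3>>& pos,
                double theta, std::vector<std::array<double, 3>>& acc) {
#pragma omp parallel
#pragma omp single
    {
        for (std::size_t i = 0; i < pos.size(); ++i) {
#pragma omp task firstprivate(i)
            accumulate(root, pos[i], theta, acc[i]);
        }
    }
}
```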
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
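As an illustration of the hashing motif, the sketch below counts k-mers (length-k substrings of sequencing reads) in a hash table, a pattern that underlies profiling and assembly. The serial std::unordered_map stands in for the distributed hash tables with asynchronous updates that a large-scale pipeline would use; the function is an assumption made for the example, not code from the paper.

```cpp
// Count k-mers across a set of reads using an in-memory hash table.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<std::string, std::uint64_t>
count_kmers(const std::vector<std::string>& reads, std::size_t k) {
    std::unordered_map<std::string, std::uint64_t> counts;
    for (const std::string& read : reads) {
        if (read.size() < k) continue;
        for (std::size_t i = 0; i + k <= read.size(); ++i)
            ++counts[read.substr(i, k)];  // at scale these become asynchronous
                                          // updates to a shared, distributed table
    }
    return counts;
}
```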
Afivo: a framework for quadtree/octree AMR with shared-memory parallelization and geometric multigrid methods
Afivo is a framework for simulations with adaptive mesh refinement (AMR) on
quadtree (2D) and octree (3D) grids. The framework comes with a geometric
multigrid solver, shared-memory (OpenMP) parallelism and it supports output in
Silo and VTK file formats. Afivo can be used to efficiently simulate AMR
problems with up to about unknowns on desktops, workstations or single
compute nodes. For larger problems, existing distributed-memory frameworks are
better suited. The framework has no built-in functionality for specific physics
applications, so users have to implement their own numerical methods. The
included multigrid solver can be used to efficiently solve elliptic partial
differential equations such as Poisson's equation. Afivo's design was kept
simple, which in combination with the shared-memory parallelism facilitates
modification and experimentation with AMR algorithms. The framework was already
used to perform 3D simulations of streamer discharges, which required tens of
millions of cells.
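For orientation, here is a minimal sketch of 2D quadtree refinement of the kind an AMR framework performs. The Box structure, refinement callback and recursion are assumptions made for the example; Afivo itself is written in Fortran and refines blocks of cells per tree node rather than single cells.

```cpp
// Recursive quadtree refinement driven by a user-supplied criterion.
#include <array>
#include <functional>
#include <memory>

struct Box {
    double xmin = 0.0, ymin = 0.0;                // lower-left corner
    double size = 1.0;                            // edge length (square boxes)
    int level = 0;
    std::array<std::unique_ptr<Box>, 4> child{};  // quadtree children (null = leaf)
};

// Split boxes while the criterion asks for refinement and max_level is not reached.
void refine(Box& b, const std::function<bool(const Box&)>& needs_refinement,
            int max_level) {
    if (b.level >= max_level || !needs_refinement(b)) return;
    const double h = 0.5 * b.size;
    for (int i = 0; i < 4; ++i) {
        auto c = std::make_unique<Box>();
        c->xmin = b.xmin + (i % 2) * h;
        c->ymin = b.ymin + (i / 2) * h;
        c->size = h;
        c->level = b.level + 1;
        refine(*c, needs_refinement, max_level);
        b.child[i] = std::move(c);
    }
}
```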
A Parallel Tree code for large N-body simulation: dynamic load balance and data distribution on CRAY T3D system
N-body algorithms for long-range unscreened interactions like gravity belong
to a class of highly irregular problems whose optimal solution is a challenging
task for present-day massively parallel computers. In this paper we describe a
strategy for optimal memory and work distribution which we have applied to our
parallel implementation of the Barnes & Hut (1986) recursive tree scheme on a
Cray T3D using the CRAFT programming environment. We have performed a series of
tests to find an "optimal data distribution" in the T3D memory, and to
identify a strategy for the "Dynamic Load Balance" in order to obtain good
performance when running large simulations (more than 10 million particles).
The results of tests show that the step duration depends on two main factors:
the data locality and the T3D network contention. By increasing data locality we
are able to minimize the step duration when the closest bodies (those involved in
direct interactions) reside in the same PE's local memory (contiguous block
subdivision, high granularity), whereas the tree properties are given a
fine-grained distribution. In a very large simulation, due to network contention, an
unbalanced load arises. To remedy this we have devised an automatic work
redistribution mechanism which provided a good Dynamic Load Balance at the
price of an insignificant overhead.
Comment: 16 pages with 11 figures included (LaTeX, elsart style). Accepted by Computer Physics Communications
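The paper's locality strategy is built on CRAFT's block data distribution. A related technique often used for the same purpose in tree codes, though not the one employed here, is to order bodies along a Morton (Z-order) space-filling curve so that spatially close bodies become contiguous in memory and map naturally to contiguous index blocks per PE. The key computation below is a generic sketch of that idea.

```cpp
// Morton (Z-order) keys: interleave the low 21 bits of three integer coordinates.
#include <algorithm>
#include <cstdint>
#include <vector>

std::uint64_t morton_key(std::uint32_t ix, std::uint32_t iy, std::uint32_t iz) {
    auto spread = [](std::uint64_t v) {
        v &= 0x1fffff;                              // keep 21 bits per axis
        v = (v | v << 32) & 0x1f00000000ffffULL;
        v = (v | v << 16) & 0x1f0000ff0000ffULL;
        v = (v | v << 8)  & 0x100f00f00f00f00fULL;
        v = (v | v << 4)  & 0x10c30c30c30c30c3ULL;
        v = (v | v << 2)  & 0x1249249249249249ULL;
        return v;
    };
    return spread(ix) | (spread(iy) << 1) | (spread(iz) << 2);
}

// Sort body indices by key so that spatially close bodies become adjacent,
// which lets contiguous index blocks be assigned to processors.
void sort_by_key(const std::vector<std::uint64_t>& key, std::vector<int>& idx) {
    std::sort(idx.begin(), idx.end(),
              [&](int a, int b) { return key[a] < key[b]; });
}
```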
A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations
We discuss the scalable parallel solution of the Poisson equation within a
Particle-In-Cell (PIC) code for the simulation of electron beams in particle
accelerators of irregular shape. The problem is discretized by Finite
Differences. Depending on the treatment of the Dirichlet boundary the resulting
system of equations is symmetric or 'mildly' nonsymmetric positive definite. In
all cases, the system is solved by the preconditioned conjugate gradient
algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG)
preconditioning. We investigate variants of the implementation of SA-AMG that
lead to considerable improvements in the execution times. We demonstrate good
scalability of the solver on a distributed-memory parallel computer with up to
2048 processors. We also compare our SAAMG-PCG solver with an FFT-based solver
that is more commonly used for applications in beam dynamics.
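For reference, here is a minimal sketch of the preconditioned conjugate gradient iteration the abstract refers to, with the matrix-vector product and the preconditioner passed in as callables. The SA-AMG preconditioner itself is far more involved and is not reproduced; the Op interface and tolerances are assumptions made for the example.

```cpp
// Preconditioned conjugate gradient (PCG) for a symmetric positive definite system.
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op = std::function<void(const Vec&, Vec&)>;  // y = A x  or  y = M^{-1} x

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solve A x = b, starting from the initial guess stored in x.
void pcg(const Op& A, const Op& precond, const Vec& b, Vec& x,
         double tol = 1e-8, int max_iter = 500) {
    const std::size_t n = b.size();
    Vec r(n), z(n), p(n), Ap(n);
    A(x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];  // r = b - A x
    precond(r, z);                                            // z = M^{-1} r
    p = z;
    double rz = dot(r, z);
    for (int it = 0; it < max_iter && std::sqrt(dot(r, r)) > tol; ++it) {
        A(p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        precond(r, z);
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
}
```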