High Performance P3M N-body code: CUBEP3M
This paper presents CUBEP3M, a publicly-available high performance
cosmological N-body code and describes many utilities and extensions that have
been added to the standard package. These include a memory-light runtime SO
halo finder, a non-Gaussian initial conditions generator, and a system of
unique particle identification. CUBEP3M is fast, its accuracy is tuneable to
optimize speed or memory, and it has been run on more than 27,000 cores, achieving
within a factor of two of ideal weak scaling even at this problem size. The
code can be run in an extra-lean mode where the peak memory imprint for large
runs is as low as 37 bytes per particle, which is almost two times leaner than
other widely used N-body codes. However, load imbalances can increase this
requirement by a factor of two, such that fast configurations with all the
utilities enabled and load imbalances factored in require between 70 and 120
bytes per particle. CUBEP3M is well suited to studying large-scale
cosmological systems, where imbalances are not too large and adaptive
time-stepping is not essential. It has already been used for a broad range of
science applications that require either large samples of non-linear
realizations or very large dark matter N-body simulations, including
cosmological reionization, halo formation, baryonic acoustic oscillations, weak
lensing, and non-Gaussian statistics. We discuss the structure, the accuracy,
known systematic effects, and the scaling performance of the code and its
utilities, when applicable.
Comment: 20 pages, 17 figures, added halo profiles, updated to match MNRAS
accepted version
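The per-particle memory figures quoted in this abstract translate directly into an aggregate footprint. The following is a minimal back-of-envelope sketch in Python, not part of CUBEP3M: the run size of 8192^3 particles and the helper name `total_memory_tib` are assumptions chosen for illustration, and only the bytes-per-particle values come from the abstract.

```python
# Back-of-envelope memory estimate from the figures quoted in the abstract.
# The run size and the helper name are hypothetical, for illustration only.

LEAN_BYTES_PER_PARTICLE = 37         # extra-lean mode (from the abstract)
FULL_BYTES_PER_PARTICLE = (70, 120)  # utilities on, load imbalance included (from the abstract)


def total_memory_tib(n_particles: int, bytes_per_particle: float) -> float:
    """Return the aggregate memory footprint in TiB (2**40 bytes)."""
    return n_particles * bytes_per_particle / 2**40


n = 8192**3  # hypothetical run size, not taken from the paper
lean = total_memory_tib(n, LEAN_BYTES_PER_PARTICLE)
low, high = (total_memory_tib(n, b) for b in FULL_BYTES_PER_PARTICLE)
print(f"lean: {lean} TiB, full configuration: {low} to {high} TiB")
```

Dividing the result by the memory available per node gives a rough lower bound on the node count needed for such a run, under the same assumptions.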
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
We report on improvements made over the past two decades to our adaptive
treecode N-body method (HOT). A mathematical and computational approach to the
cosmological N-body problem is described, with performance and scalability
measured up to 256k ($2^{18}$) processors. We present error analysis and
scientific application results from a series of more than ten 69 billion
($4096^3$) particle cosmological simulations, accounting for $4 \times 10^{20}$
floating point operations. These results include the first simulations using
the new constraints on the standard model of cosmology from the Planck
satellite. Our simulations set a new standard for accuracy and scientific
throughput, while meeting or exceeding the computational efficiency of the
latest generation of hybrid TreePM N-body methods.
Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC
'1
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones.
Comment: 44 pages. 1 of USQCD whitepapers
Computational Physics on Graphics Processing Units
The use of graphics processing units for scientific computations is an
emerging strategy that can significantly speed up various different algorithms.
In this review, we discuss advances made in the field of computational physics,
focusing on classical molecular dynamics, and on quantum simulations for
electronic structure calculations using the density functional theory, wave
function techniques, and quantum field theory.
Comment: Proceedings of the 11th International Conference, PARA 2012,
Helsinki, Finland, June 10-13, 2012
HPC compact quasi-Newton algorithm for interface problems
In this work we present a robust interface coupling algorithm called Compact
Interface quasi-Newton (CIQN). It is designed for computationally intensive
applications using an MPI multi-code partitioned scheme. The algorithm allows
information from previous time steps to be reused, a feature that has been
previously proposed to accelerate convergence. Through algebraic manipulation,
efficient usage of the computational resources is achieved by avoiding the
construction of dense matrices, reducing every multiplication to a
matrix-vector product, and reusing the computationally expensive loops. This
leads to a compact version of the original quasi-Newton algorithm. Combining
this with efficient communication, we show efficient scalability
up to 4800 cores. Three examples with qualitatively different dynamics are
shown to prove that the algorithm can efficiently deal with added mass
instability and two-field coupled problems. We also show how reusing histories
and filtering do not necessarily make for a more robust scheme and, finally, we
prove the necessity of this HPC version of the algorithm. The novelty of this
article lies in the HPC focused implementation of the algorithm, detailing how
to fuse and combine the composing blocks to obtain a scalable MPI
implementation. Such an implementation is mandatory in large scale cases, for
which the contact surface cannot be stored in a single computational node, or
the number of contact nodes is not negligible compared with the size of the
domain. © Elsevier. This manuscript version is made available
under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/
Comment: 33 pages: 23 manuscript, 10 appendix. 16 figures: 4 manuscript, 12
appendix. 10 Tables: 3 manuscript, 7 appendix
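The idea of avoiding dense matrices by reducing every multiplication to a matrix-vector product can be illustrated generically. The sketch below is not the CIQN implementation: it assumes a hypothetical low-rank operator V W^T built from a few stored history columns and shows that applying it as V (W^T x) gives the same result as forming the dense n-by-n matrix first, at much lower cost. All names (`v_cols`, `w_cols`, `apply_low_rank`, `apply_dense`) are illustrative.

```python
# Generic illustration of "matrix-vector products only"; not the CIQN code.
# v_cols and w_cols are hypothetical history matrices stored as lists of columns.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def apply_low_rank(v_cols, w_cols, x):
    """Compute (V W^T) x as V (W^T x) without forming the n-by-n matrix."""
    coeffs = [dot(w, x) for w in w_cols]   # k inner products, O(n k) work
    y = [0.0] * len(x)
    for c, v in zip(coeffs, v_cols):       # accumulate c_j * v_j, O(n k) work
        for i in range(len(x)):
            y[i] += c * v[i]
    return y


def apply_dense(v_cols, w_cols, x):
    """Reference version: build the dense matrix first, O(n^2) memory."""
    n = len(x)
    dense = [[sum(v[i] * w[j] for v, w in zip(v_cols, w_cols)) for j in range(n)]
             for i in range(n)]
    return [dot(row, x) for row in dense]


v_cols = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
w_cols = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
x = [1.0, 1.0, 1.0]
y_fast = apply_low_rank(v_cols, w_cols, x)
y_ref = apply_dense(v_cols, w_cols, x)
print(y_fast, y_ref)
```

With k history columns and n interface unknowns, the matrix-free form needs O(nk) storage and work per application, which matters when the interface is too large to hold on a single node.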