21,039 research outputs found
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
We report on improvements made over the past two decades to our adaptive
treecode N-body method (HOT). A mathematical and computational approach to the
cosmological N-body problem is described, with performance and scalability
measured up to 256k () processors. We present error analysis and
scientific application results from a series of more than ten 69 billion
() particle cosmological simulations, accounting for
floating point operations. These results include the first simulations using
the new constraints on the standard model of cosmology from the Planck
satellite. Our simulations set a new standard for accuracy and scientific
throughput, while meeting or exceeding the computational efficiency of the
latest generation of hybrid TreePM N-body methods.Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC
'1
4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem
As an entry for the 2012 Gordon-Bell performance prize, we report performance
results of astrophysical N-body simulations of one trillion particles performed
on the full system of K computer. This is the first gravitational trillion-body
simulation in the world. We describe the scientific motivation, the numerical
algorithm, the parallelization strategy, and the performance analysis. Unlike
many previous Gordon-Bell prize winners that used the tree algorithm for
astrophysical N-body simulations, we used the hybrid TreePM method, for similar
level of accuracy in which the short-range force is calculated by the tree
algorithm, and the long-range force is solved by the particle-mesh algorithm.
We developed a highly-tuned gravity kernel for short-range forces, and a novel
communication algorithm for long-range forces. The average performance on 24576
and 82944 nodes of K computer are 1.53 and 4.45 Pflops, which correspond to 49%
and 42% of the peak speed.Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012
(http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional
information is http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
Distributed-memory parallelization of an explicit time-domain volume integral equation solver on Blue Gene/P
Two distributed-memory schemes for efficiently parallelizing the explicit marching-on in-time based solution of the time domain volume integral equation on the IBM Blue Gene/P platform are presented. In the first scheme, each processor stores the time history of all source fields and only the computationally dominant step of the tested field computations is distributed among processors. This scheme requires all-to-all global communications to update the time history of the source fields from the tested fields. In the second scheme, the source fields as well as all steps of the tested field computations are distributed among processors. This scheme requires sequential global communications to update the time history of the distributed source fields from the tested fields. Numerical results demonstrate that both schemes scale well on the IBM Blue Gene/P platform and the memory efficient second scheme allows for the characterization of transient wave interactions on composite structures discretized using three million spatial elements without an acceleration algorithm
Enhancing speed and scalability of the ParFlow simulation code
Regional hydrology studies are often supported by high resolution simulations
of subsurface flow that require expensive and extensive computations. Efficient
usage of the latest high performance parallel computing systems becomes a
necessity. The simulation software ParFlow has been demonstrated to meet this
requirement and shown to have excellent solver scalability for up to 16,384
processes. In the present work we show that the code requires further
enhancements in order to fully take advantage of current petascale machines. We
identify ParFlow's way of parallelization of the computational mesh as a
central bottleneck. We propose to reorganize this subsystem using fast mesh
partition algorithms provided by the parallel adaptive mesh refinement library
p4est. We realize this in a minimally invasive manner by modifying selected
parts of the code to reinterpret the existing mesh data structures. We evaluate
the scaling performance of the modified version of ParFlow, demonstrating good
weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test
an example application at large scale.Comment: The final publication is available at link.springer.co
Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm
We introduce a community detection algorithm (Fluid Communities) based on the
idea of fluids interacting in an environment, expanding and contracting as a
result of that interaction. Fluid Communities is based on the propagation
methodology, which represents the state-of-the-art in terms of computational
cost and scalability. While being highly efficient, Fluid Communities is able
to find communities in synthetic graphs with an accuracy close to the current
best alternatives. Additionally, Fluid Communities is the first
propagation-based algorithm capable of identifying a variable number of
communities in network. To illustrate the relevance of the algorithm, we
evaluate the diversity of the communities found by Fluid Communities, and find
them to be significantly different from the ones found by alternative methods.Comment: Accepted at the 6th International Conference on Complex Networks and
Their Application
- …