6th and 8th Order Hermite Integrator for N-body Simulations
We present sixth- and eighth-order Hermite integrators for astrophysical
N-body simulations, which use the derivatives of accelerations up to second
order (snap) and third order (crackle). These schemes do not
require previous values for the corrector, and require only one previous value
to construct the predictor. Thus, they are fairly easy to implement. The
additional cost of the calculation of the higher order derivatives is not very
high. Even for the eighth-order scheme, the number of floating-point operations
for the force calculation is only about two times larger than that for the
traditional fourth-order Hermite scheme. The sixth-order scheme is better than
the traditional fourth-order scheme in most cases. When the required accuracy is
very high, the eighth-order one is the best. These high-order schemes have
several practical advantages. For example, they allow a larger number of
particles to be integrated in parallel than the fourth-order scheme does,
resulting in higher execution efficiency in both general-purpose parallel
computers and GRAPE systems.
Comment: 21 pages, 6 figures, New Astronomy accepted
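The baseline that these integrators extend is the classic fourth-order Hermite predictor-corrector of Makino and Aarseth (1992). The sketch below is of that standard fourth-order scheme only, not the paper's sixth- or eighth-order variants, applied to an illustrative equal-mass binary on a circular orbit (G = 1):

```python
import numpy as np

def accel_jerk(pos, vel, mass):
    """Direct-summation acceleration and jerk (time derivative of acceleration)."""
    acc = np.zeros_like(pos)
    jrk = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i == j:
                continue
            dr = pos[j] - pos[i]
            dv = vel[j] - vel[i]
            r2 = dr @ dr
            r3 = r2 ** 1.5
            acc[i] += mass[j] * dr / r3
            jrk[i] += mass[j] * (dv / r3 - 3.0 * (dr @ dv) * dr / (r2 * r3))
    return acc, jrk

def hermite_step(pos, vel, mass, dt):
    """One fourth-order Hermite predictor-corrector step."""
    a0, j0 = accel_jerk(pos, vel, mass)
    # Predictor: Taylor expansion using acceleration and jerk only,
    # so only the current force evaluation is needed.
    xp = pos + vel * dt + a0 * dt**2 / 2 + j0 * dt**3 / 6
    vp = vel + a0 * dt + j0 * dt**2 / 2
    # Evaluate the force at the predicted state.
    a1, j1 = accel_jerk(xp, vp, mass)
    # Corrector: Hermite interpolation of the force polynomial.
    v1 = vel + (a0 + a1) * dt / 2 + (j0 - j1) * dt**2 / 12
    x1 = pos + (vel + v1) * dt / 2 + (a0 - a1) * dt**2 / 12
    return x1, v1

def energy(pos, vel, mass):
    ke = 0.5 * sum(m * v @ v for m, v in zip(mass, vel))
    pe = 0.0
    for i in range(len(mass)):
        for j in range(i + 1, len(mass)):
            pe -= mass[i] * mass[j] / np.linalg.norm(pos[j] - pos[i])
    return ke + pe

# Equal-mass binary: separation 1, total mass 2, circular orbit.
mass = np.array([1.0, 1.0])
pos = np.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])
vc = 0.5 * np.sqrt(2.0)  # circular speed of each body
vel = np.array([[0.0, -vc, 0.0], [0.0, vc, 0.0]])

e0 = energy(pos, vel, mass)
for _ in range(1000):
    pos, vel = hermite_step(pos, vel, mass, dt=0.005)
print(abs((energy(pos, vel, mass) - e0) / e0))  # relative energy error
```

The key property the abstract highlights carries over from this baseline: the corrector uses no values from previous steps, only the two force evaluations within the step, which is what makes the higher-order extensions self-starting and simple.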
4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem
As an entry for the 2012 Gordon-Bell performance prize, we report performance
results of astrophysical N-body simulations of one trillion particles performed
on the full system of K computer. This is the first gravitational trillion-body
simulation in the world. We describe the scientific motivation, the numerical
algorithm, the parallelization strategy, and the performance analysis. Unlike
many previous Gordon-Bell prize winners that used the tree algorithm for
astrophysical N-body simulations, we used the hybrid TreePM method at a
similar level of accuracy, in which the short-range force is calculated by the
tree algorithm and the long-range force is solved by the particle-mesh
algorithm.
We developed a highly-tuned gravity kernel for short-range forces, and a novel
communication algorithm for long-range forces. The average performance on
24576 and 82944 nodes of K computer is 1.53 and 4.45 Pflops, respectively,
which corresponds to 49% and 42% of the peak speed.
Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012
(http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional
information is http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
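The short-range/long-range split underlying TreePM can be sketched with the standard Ewald-type decomposition of the inverse-square force at a smoothing scale r_s; the specific kernel and cutoff used in the paper's code may differ, so treat this only as an illustration of the principle:

```python
import math

def force_short(r, rs):
    """Short-range part of the 1/r^2 force: -d/dr [erfc(r / (2 rs)) / r].

    This is the piece a TreePM code evaluates with the tree; it can be
    truncated beyond a few rs because erfc decays faster than exponentially.
    """
    return (math.erfc(r / (2 * rs)) / r**2
            + math.exp(-r * r / (4 * rs * rs)) / (rs * math.sqrt(math.pi) * r))

def force_long(r, rs):
    """Long-range remainder, handled on the particle mesh."""
    return 1.0 / r**2 - force_short(r, rs)

rs = 0.5
for r in (0.3, 1.0, 3.0):
    print(r, force_short(r, rs), force_long(r, rs))
```

By construction the two parts sum exactly to the full inverse-square force, and at r = 3 (six smoothing lengths) the short-range term has fallen below 0.1% of the total, which is what keeps the tree walk local while the mesh handles the smooth remainder.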
N-body simulation for self-gravitating collisional systems with a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions
We present a high-performance N-body code for self-gravitating collisional
systems accelerated with the aid of a new SIMD instruction set extension of the
x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the
Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600
processor (8 MB cache and 3.40 GHz) based on Sandy Bridge micro-architecture,
we implemented a fourth-order Hermite scheme with the individual timestep
algorithm (Makino and Aarseth, 1992), and achieved a performance of 20 giga
floating-point operations per second (GFLOPS) for double-precision accuracy,
which is two times and five times higher than that of the previously developed
code implemented with the SSE instructions (Nitadori et al., 2006b), and that
of a code implemented without any explicit use of SIMD instructions with the
same processor core, respectively. We have parallelized the code using the
so-called NINJA scheme (Nitadori et al., 2006a), and achieved 90 GFLOPS for a
system containing more than N = 8192 particles with 8 MPI processes on four
cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating
collisional system with N ~ 10^5 on massively parallel systems with at most 800
cores with the Sandy Bridge micro-architecture. This performance will be comparable
to that of Graphic Processing Unit (GPU) cluster systems, such as the one with
about 200 Tesla C1070 GPUs (Spurzem et al., 2010). This paper offers an
alternative to collisional N-body simulations with GRAPEs and GPUs.
Comment: 14 pages, 9 figures, 3 tables, accepted for publication in New
Astronomy. The code is publicly available at
http://code.google.com/p/phantom-grape
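The data parallelism that AVX exploits, evaluating several pairwise interactions per instruction, can be mimicked at a high level with array-at-a-time code. This NumPy sketch illustrates only the vectorization pattern, not the authors' hand-written intrinsics:

```python
import numpy as np

def accel_scalar(pos, mass):
    """Reference: one interaction at a time, as a non-SIMD scalar loop would run."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i == j:
                continue
            dr = pos[j] - pos[i]
            acc[i] += mass[j] * dr / (dr @ dr) ** 1.5
    return acc

def accel_vector(pos, mass):
    """The same sum evaluated on whole arrays at once: the SIMD-friendly layout."""
    dr = pos[None, :, :] - pos[:, None, :]   # dr[i, j] = pos[j] - pos[i]
    r2 = np.einsum('ijk,ijk->ij', dr, dr)
    np.fill_diagonal(r2, 1.0)                # avoid divide-by-zero at i == j
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)            # zero out the self-interaction
    return np.einsum('ij,j,ijk->ik', inv_r3, mass, dr)

rng = np.random.default_rng(1)
pos = rng.standard_normal((64, 3))
mass = rng.uniform(0.5, 1.5, 64)
print(np.max(np.abs(accel_scalar(pos, mass) - accel_vector(pos, mass))))
```

The two routines produce the same accelerations; the second arranges the arithmetic so that identical operations run over contiguous lanes of data, which is exactly the structure AVX's 256-bit registers (four doubles per instruction) reward.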
Accelerating NBODY6 with Graphics Processing Units
We describe the use of Graphics Processing Units (GPUs) for speeding up the
code NBODY6 which is widely used for direct N-body simulations. Over the
years, the N^2 nature of the direct force calculation has proved a barrier
for extending the particle number. Following an early introduction of force
polynomials and individual time-steps, the calculation cost was first reduced
by the introduction of a neighbour scheme. After a decade of GRAPE computers
which speeded up the force calculation further, we are now in the era of GPUs
where relatively small hardware systems are highly cost-effective. A
significant gain in efficiency is achieved by employing the GPU to obtain the
so-called regular force which typically involves some 99 percent of the
particles, while the remaining local forces are evaluated on the host. However,
the latter operation is performed up to 20 times more frequently and may still
account for a significant cost. This effort is reduced by parallel SSE/AVX
procedures where each interaction term is calculated using mainly single
precision. We also discuss further strategies connected with coordinate and
velocity prediction required by the integration scheme. This leaves hard
binaries and multiple close encounters which are treated by several
regularization methods. The present nbody6-GPU code is well balanced for
simulations in the particle range for a dual GPU system
attached to a standard PC.
Comment: 8 pages, 3 figures, 2 tables, MNRAS accepted
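The regular/irregular decomposition described above, the Ahmad-Cohen neighbour scheme, can be sketched with a fixed neighbour radius. In NBODY6 the neighbour sphere is adaptive per particle, so this is only an illustration of the bookkeeping:

```python
import numpy as np

def split_force(pos, mass, i, r_nb):
    """Split the force on particle i into an 'irregular' part from neighbours
    inside radius r_nb (updated frequently, on the host in the GPU code) and a
    'regular' part from the remaining distant particles (updated rarely, on
    the GPU)."""
    a_irr = np.zeros(3)
    a_reg = np.zeros(3)
    for j in range(len(mass)):
        if j == i:
            continue
        dr = pos[j] - pos[i]
        r = np.linalg.norm(dr)
        term = mass[j] * dr / r**3
        if r < r_nb:
            a_irr += term
        else:
            a_reg += term
    return a_irr, a_reg

rng = np.random.default_rng(2)
pos = rng.standard_normal((200, 3))
mass = np.full(200, 1.0 / 200)
a_irr, a_reg = split_force(pos, mass, 0, r_nb=0.5)
print(a_irr + a_reg)  # recombines to the full direct-summation force
```

The payoff is in the update frequencies: the regular force from the distant ~99 percent of particles changes slowly and can be recomputed rarely in bulk, while the small irregular sum over neighbours absorbs the rapid fluctuations and is the part evaluated up to 20 times more often.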
Performance Tuning of N-Body Codes on Modern Microprocessors: I. Direct Integration with a Hermite Scheme on x86_64 Architecture
The main performance bottleneck of gravitational N-body codes is the force
calculation between two particles. We have succeeded in speeding up this
pair-wise force calculation by factors between two and ten, depending on the
code and the processor on which the code is run. These speedups were obtained
by writing highly fine-tuned code for x86_64 microprocessors. Any existing
N-body code, running on these chips, can easily incorporate our assembly code
programs.
In the current paper, we present an outline of our overall approach, which we
illustrate with one specific example: the use of a Hermite scheme for a direct
N^2 type integration on a single 2.0 GHz Athlon 64 processor, for which we
obtain an effective performance of 4.05 Gflops, for double precision accuracy.
In subsequent papers, we will discuss other variations, including the
combinations of N log N codes, single precision implementations, and
performance on other microprocessors.
Comment: 32 pages, 2 figures
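Effective-performance figures such as the 4.05 Gflops above are conventionally obtained by assigning an agreed flop count to each pairwise interaction and dividing by wall-clock time. A small sketch; the 38 flops-per-interaction figure is a common N-body benchmarking convention, not necessarily the count used in this paper, and the run parameters below are hypothetical:

```python
def effective_gflops(n, steps, seconds, flops_per_interaction=38):
    """Effective performance of a direct N^2 integration: total number of
    pairwise interactions times a conventional per-interaction flop count,
    divided by elapsed wall-clock time."""
    interactions = n * (n - 1) * steps  # each step evaluates all ordered pairs
    return interactions * flops_per_interaction / seconds / 1e9

# Hypothetical run: 16384 particles, 100 shared steps, 630 s of wall-clock time.
print(effective_gflops(16384, 100, 630.0))
```

Because the flop count per interaction is fixed by convention rather than measured, such numbers compare fairly across codes and machines even when compilers emit different instruction mixes.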
Simulating the universe on an intercontinental grid of supercomputers
Understanding the universe is hampered by the elusiveness of its most common
constituent, cold dark matter. Almost impossible to observe, dark matter can be
studied effectively by means of simulation and there is probably no other
research field where simulation has led to so much progress in the last decade.
Cosmological N-body simulations are an essential tool for evolving density
perturbations in the nonlinear regime. Simulating the formation of large-scale
structures in the universe, however, is still a challenge due to the enormous
dynamic range in spatial and temporal coordinates, and due to the enormous
computer resources required. The dynamic range is generally dealt with by the
hybridization of numerical techniques. We deal with the computational
requirements by connecting two supercomputers via an optical network and make
them operate as a single machine. This is challenging, if only for the fact
that the supercomputers of our choice are separated by half the planet, as one
is located in Amsterdam and the other is in Tokyo. The co-scheduling of the two
computers and the 'gridification' of the code enables us to achieve a 90%
efficiency for this distributed intercontinental supercomputer.
Comment: Accepted for publication in IEEE Computer
Utility of Comprehensive Genomic Profiling for Precise Diagnosis of Pediatric-Type Diffuse High-Grade Glioma
In the current World Health Organization classification of central nervous system tumors, comprehensive genetic and epigenetic analyses are considered essential for precise diagnosis. A 14-year-old male patient who presented with a cerebellar tumor was initially diagnosed with glioblastoma and treated with radiation and concomitant temozolomide chemotherapy after resection. During maintenance temozolomide therapy, a new contrast-enhanced lesion developed at the bottom of the cavity formed by the resection. A second surgery was performed, but the histological findings in specimens from the second surgery differed from those of the first surgery. Although genome-wide DNA methylation profiling was conducted using frozen tissue for a precise diagnosis, the proportion of tumor cells was insufficient and only normal cerebellum was observed. We then performed comprehensive genetic analysis using formalin-fixed paraffin-embedded sections, which revealed MYCN amplification without alteration of IDH1, IDH2, or histone H3. Finally, the patient was diagnosed with pediatric-type diffuse high-grade glioma, H3-wildtype and IDH-wildtype. In conclusion, comprehensive genetic and epigenetic analysis should be considered in pediatric brain tumor cases.