A Parallel Monte Carlo Code for Simulating Collisional N-body Systems
We present a new parallel code for computing the dynamical evolution of
collisional N-body systems with up to N~10^7 particles. Our code is based on
the Henon Monte Carlo method for solving the Fokker-Planck equation, and
makes assumptions of spherical symmetry and dynamical equilibrium. The
principal algorithmic developments involve optimizing data structures, and the
introduction of a parallel random number generation scheme, as well as a
parallel sorting algorithm, required to find nearest neighbors for interactions
and to compute the gravitational potential. The new algorithms we introduce
along with our choice of decomposition scheme minimize communication costs and
ensure optimal distribution of data and workload among the processing units.
The implementation uses the Message Passing Interface (MPI) library for
communication, which makes it portable to many different supercomputing
architectures. We validate the code by calculating the evolution of clusters
with initial Plummer distribution functions up to core collapse with the number
of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find
that our results are in good agreement with self-similar core-collapse
solutions, and the core collapse times generally agree with expectations from
the literature. Also, we observe good total energy conservation, within less
than 0.04% throughout all simulations. We analyze the performance of the code,
and demonstrate near-linear scaling of the runtime with the number of
processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7.
The runtime saturates when more processors are added beyond
these limits, which is characteristic of the parallel sorting algorithm. The
resulting maximum speedups we achieve are approximately 60x, 100x, and 220x,
respectively.
Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement
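The abstract above notes that sorting is needed to compute the gravitational potential. A minimal serial sketch of why, assuming the standard spherical-shell formula Phi(r_k) = -G [ M(<= r_k)/r_k + sum over j > k of m_j/r_j ] for stars sorted by radius: once the radii are sorted, two cumulative sweeps give every potential. The function name `shell_potentials` and the convention of counting each star's own mass as enclosed are illustrative assumptions, not details taken from the paper, and nothing here is parallel.

```python
def shell_potentials(radii, masses, G=1.0):
    """Potential at each star's radius for a spherically symmetric system.

    Hypothetical helper: each star is treated as a thin shell, and a star's
    own mass is counted as enclosed at its radius (an assumed convention).
    """
    n = len(radii)
    # Sort star indices by radius; this is the step a parallel code must distribute.
    order = sorted(range(n), key=lambda i: radii[i])

    # outer[k] = sum of m_j / r_j over sorted positions j >= k (outer[n] = 0).
    outer = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        i = order[k]
        outer[k] = outer[k + 1] + masses[i] / radii[i]

    phi = [0.0] * n
    enclosed = 0.0
    for k, i in enumerate(order):
        enclosed += masses[i]  # mass at or inside r_k
        # Interior mass acts like a point mass; exterior shells add m_j / r_j.
        phi[i] = -G * (enclosed / radii[i] + outer[k + 1])
    return phi
```

For two unit-mass stars at r = 2 and r = 1 with G = 1, `shell_potentials([2.0, 1.0], [1.0, 1.0])` gives -1.0 for the outer star and -1.5 for the inner one.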
Reducing phase error in long numerical binary black hole evolutions with sixth order finite differencing
We describe a modification of a fourth-order accurate ``moving puncture''
evolution code, where by replacing spatial fourth-order accurate differencing
operators in the bulk of the grid by a specific choice of sixth-order accurate
stencils we gain significant improvements in accuracy. We illustrate the
performance of the modified algorithm with an equal-mass simulation covering
nine orbits.
Comment: 13 pages, 6 figures
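To illustrate the accuracy gain from moving to sixth-order differencing, the sketch below compares the standard centered fourth-order and sixth-order stencils for a first derivative. These are textbook centered stencils used as an assumption for illustration; the abstract refers to "a specific choice" of sixth-order stencils that may differ. The function names and the sin(x) test function are hypothetical.

```python
import math


def d1_fourth(f, x, h):
    """Centered fourth-order first derivative (5-point stencil)."""
    return (f(x - 2 * h) - 8 * f(x - h) + 8 * f(x + h) - f(x + 2 * h)) / (12 * h)


def d1_sixth(f, x, h):
    """Centered sixth-order first derivative (7-point stencil)."""
    return (-f(x - 3 * h) + 9 * f(x - 2 * h) - 45 * f(x - h)
            + 45 * f(x + h) - 9 * f(x + 2 * h) + f(x + 3 * h)) / (60 * h)


def errors(h, x=1.0):
    """Absolute errors of both stencils for d/dx sin(x) at the point x."""
    exact = math.cos(x)
    return (abs(d1_fourth(math.sin, x, h) - exact),
            abs(d1_sixth(math.sin, x, h) - exact))
```

Halving h should reduce the fourth-order error by roughly 2^4 = 16 and the sixth-order error by roughly 2^6 = 64, which can be checked by comparing `errors(0.1)` with `errors(0.05)`.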
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.
Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming
Searching for periodic sources with LIGO. II: Hierarchical searches
The detection of quasi-periodic sources of gravitational waves requires the
accumulation of signal-to-noise over long observation times. If not removed,
Earth-motion induced Doppler modulations, and intrinsic variations of the
gravitational-wave frequency make the signals impossible to detect. These
effects can be corrected (removed) using a parameterized model for the
frequency evolution. We compute the number of independent corrections
required for incoherent search strategies which use stacked
power spectra---a demodulated time series is divided into segments of
length , each segment is FFTed, the power is computed, and the
spectra are summed up. We estimate that the sensitivity of an all-sky search
that uses incoherent stacks is a factor of 2--4 better than would be achieved
using coherent Fourier transforms; incoherent methods are computationally
efficient at exploring large parameter spaces. A two-stage hierarchical search
yields another 20--60% improvement in sensitivity in all-sky searches for
old (>= 1000 yr) slow (= 40 yr) fast (<=
1000 Hz) pulsars. Assuming 10^{12} flops of effective computing power for data
analysis, enhanced LIGO interferometers should be sensitive to: (i) Galactic
core pulsars with gravitational ellipticities of \epsilon \gtrsim 5 \times 10^{-6}
at 200 Hz, (ii) Gravitational waves emitted by the unstable r-modes of newborn
neutron stars out to distances of ~8 Mpc, and (iii) neutron stars in LMXB's
with x-ray fluxes which exceed . Moreover,
gravitational waves from the neutron star in Sco X-1 should be detectable if
the interferometer is operated in a signal-recycled, narrow-band configuration.
Comment: 22 pages, 13 figures
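The stacked power spectrum described in the abstract above (divide a demodulated time series into segments, transform each, compute the power, sum the spectra) can be sketched as follows. This is a minimal illustration only: it assumes the input is already demodulated, uses a naive O(n^2) discrete Fourier transform as a stand-in for an FFT, and the names `power_spectrum` and `stacked_power` are hypothetical.

```python
import cmath
import math


def power_spectrum(segment):
    """One-sided power |X_k|^2 of a real segment via a naive DFT (FFT stand-in)."""
    n = len(segment)
    spectrum = []
    for k in range(n // 2 + 1):
        amp = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(segment))
        spectrum.append(abs(amp) ** 2)
    return spectrum


def stacked_power(series, seg_len):
    """Sum the power spectra of consecutive, non-overlapping segments."""
    n_seg = len(series) // seg_len  # any trailing partial segment is dropped
    stacked = [0.0] * (seg_len // 2 + 1)
    for s in range(n_seg):
        segment = series[s * seg_len:(s + 1) * seg_len]
        for k, p in enumerate(power_spectrum(segment)):
            stacked[k] += p  # phases are discarded, so the sum is incoherent
    return stacked
```

Because only the power of each segment is kept, the phase relation between segments is not used. This is what makes the stack incoherent and cheaper than one long coherent transform.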
Coherent Bayesian inference on compact binary inspirals using a network of interferometric gravitational wave detectors
Presented in this paper is a Markov chain Monte Carlo (MCMC) routine for
conducting coherent parameter estimation for interferometric gravitational wave
observations of an inspiral of binary compact objects using data from multiple
detectors. The MCMC technique uses data from several interferometers and infers
all nine of the parameters (ignoring spin) associated with the binary system,
including the distance to the source, the masses, and the location on the sky.
The Metropolis algorithm utilises advanced MCMC techniques, such as importance
resampling and parallel tempering. The data is compared with time-domain
inspiral templates that are 2.5 post-Newtonian (PN) in phase and 2.0 PN in
amplitude. Our routine could be implemented as part of an inspiral detection
pipeline for a worldwide network of detectors. Examples are given for
simulated signals and data as seen by the LIGO and Virgo detectors operating at
their design sensitivity.
Comment: 10 pages, 4 figures
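As a rough illustration of the parallel tempering technique mentioned in the abstract above, the sketch below runs Metropolis chains at several temperatures on a toy one-dimensional bimodal target and proposes swaps between adjacent temperatures. Everything here is an assumption for illustration: the target density, the temperature ladder, the step size, and the function names. It is not the nine-parameter inspiral posterior or the routine from the paper.

```python
import math
import random


def log_target(x):
    """Toy log-density: equal-weight Gaussians at -2 and +2 with sigma 0.7."""
    a = -0.5 * ((x - 2.0) / 0.7) ** 2
    b = -0.5 * ((x + 2.0) / 0.7) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp


def parallel_tempering(n_steps, temps=(1.0, 2.0, 4.0, 8.0), step=1.0, seed=0):
    """Return the samples of the coldest chain (temps[0], assumed to be 1)."""
    rng = random.Random(seed)
    xs = [0.0] * len(temps)
    logps = [log_target(x) for x in xs]
    cold = []
    for _ in range(n_steps):
        # Metropolis update of each chain on the tempered target p(x)^(1/T).
        for i, temp in enumerate(temps):
            proposal = xs[i] + rng.gauss(0.0, step)
            lp = log_target(proposal)
            if math.log(1.0 - rng.random()) < (lp - logps[i]) / temp:
                xs[i], logps[i] = proposal, lp
        # Propose swapping the states of one randomly chosen adjacent pair.
        i = rng.randrange(len(temps) - 1)
        delta = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (logps[i + 1] - logps[i])
        if math.log(1.0 - rng.random()) < delta:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
            logps[i], logps[i + 1] = logps[i + 1], logps[i]
        cold.append(xs[0])
    return cold
```

The hotter chains see a flattened target and cross between modes more easily. Accepted swaps pass those states down the ladder, so the coldest chain can explore both modes while still sampling the untempered target.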