GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Full detector simulation was among the largest CPU consumers in all CERN
experiment software stacks for the first two runs of the Large Hadron Collider
(LHC). In the early 2010s, the projections were that simulation demands would
scale linearly with luminosity increase, compensated only partially by an
increase of computing resources. The extension of fast simulation approaches to
more use cases, covering a larger fraction of the simulation budget, is only
part of the solution due to intrinsic precision limitations. The remainder
corresponds to speeding up the simulation software by a factor of several, which is
out of reach using simple optimizations on the current code base. In this
context, the GeantV R&D project was launched, aiming to redesign the legacy
particle transport codes in order to make them benefit from fine-grained
parallelism features such as vectorization, but also from increased code and
data locality. This paper presents in detail the results and achievements of
this R&D, as well as the conclusions and lessons learnt from the beta
prototype. Comment: 34 pages, 26 figures, 24 tables
The use of primitives in the calculation of radiative view factors
Compilations of radiative view factors (often in closed analytical form) are readily available in the open literature for commonly encountered geometries. For more complex three-dimensional (3D) scenarios, however, the effort required to solve the multi-dimensional integrals that define a view factor can be daunting. In such cases, a combination of finite element methods (where the geometry in question is sub-divided into a large number of uniform, often triangular, elements) and Monte Carlo Ray Tracing (MC-RT) has been developed, although the software implementation is frequently suitable only for a limited set of geometrical scenarios. Driven initially by a need to calculate the radiative heat transfer occurring within an operational fibre-drawing furnace, this research set out to examine options whereby MC-RT could be used to cost-effectively calculate any generic 3D radiative view factor using current vectorisation technologies.
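As a rough illustration of the MC-RT idea in the abstract above, the following sketch estimates the view factor between two aligned, parallel squares by tracing diffusely emitted rays and counting hits. The function name, the geometry and the default parameters are assumptions made for this example; the loop is scalar Python and is not the vectorised or primitive-based implementation the paper describes.

```python
import math
import random


def view_factor_parallel_squares(side=1.0, gap=1.0, n_rays=200_000, seed=0):
    """Monte Carlo estimate of the view factor between two aligned,
    parallel squares of edge `side` separated by a distance `gap`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rays):
        # Uniform emission point on the lower square (plane z = 0).
        x = rng.random() * side
        y = rng.random() * side
        # Cosine-weighted (diffuse, Lambertian) direction about the +z normal.
        phi = 2.0 * math.pi * rng.random()
        u = rng.random()
        sin_t = math.sqrt(u)
        cos_t = math.sqrt(1.0 - u)  # strictly positive because u < 1
        # Intersect the ray with the plane z = gap.
        r = gap * sin_t / cos_t
        x_hit = x + r * math.cos(phi)
        y_hit = y + r * math.sin(phi)
        # Count the ray if it lands on the upper square.
        if 0.0 <= x_hit <= side and 0.0 <= y_hit <= side:
            hits += 1
    # The fraction of emitted rays that arrive is the view factor estimate.
    return hits / n_rays
```

For unit squares one unit apart, the estimate can be compared against the tabulated closed-form value of roughly 0.1998; the statistical error shrinks as the inverse square root of the number of rays.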
Pseudo-random number generators for Monte Carlo simulations on Graphics Processing Units
Basic uniform pseudo-random number generators are implemented on ATI Graphics
Processing Units (GPUs). The performance of the implemented generators
(multiplicative linear congruential (GGL), XOR-shift (XOR128), RANECU, RANMAR,
RANLUX and Mersenne Twister (MT19937)) on CPU and GPU is discussed. The
GPU versions run hundreds of times faster than the CPU versions. The RANLUX
generator is found to be the most appropriate for use on GPUs in Monte Carlo
simulations. A brief review of the pseudo-random number generators used in
modern software packages for Monte Carlo simulations in high-energy physics is
presented. Comment: 31 pages, 9 figures, 3 tables
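To make one of the listed generators concrete, here is a minimal CPU-side sketch of a 128-bit XOR-shift generator. It assumes that XOR128 in the abstract above refers to Marsaglia's xor128 recurrence with his published default seeds; the function names are hypothetical, and nothing here reflects the GPU implementation the paper benchmarks.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def xor128_stream(x=123456789, y=362436069, z=521288629, w=88675123):
    """Yield 32-bit integers from Marsaglia's xor128 recurrence.

    The default seeds are the ones from Marsaglia's xorshift paper;
    the state must not be all zeros.
    """
    while True:
        t = (x ^ (x << 11)) & MASK32
        x, y, z = y, z, w
        w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & MASK32
        yield w


def uniform01(stream):
    """Map the next 32-bit output to a float in [0, 1)."""
    return next(stream) / 2**32
```

The whole state is four 32-bit words and the update uses only shifts and XORs, which is why generators of this family are cheap to run with one independent state per thread.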
Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision
Modern graphics processing units (GPUs) provide impressive computing
resources, which can be accessed conveniently through the CUDA programming
interface. We describe how GPUs can be used to considerably speed up molecular
dynamics (MD) simulations for system sizes ranging up to about 1 million
particles. Particular emphasis is put on the numerical long-time stability in
terms of energy and momentum conservation, and caveats on limited
floating-point precision are issued. Strict energy conservation over 10^8 MD
steps is obtained by double-single emulation of the floating-point arithmetic
in accuracy-critical parts of the algorithm. For the slow dynamics of a
supercooled binary Lennard-Jones mixture, we demonstrate that the use of
single-precision floating-point arithmetic may result in quantitatively and even
physically wrong results. For simulations of a Lennard-Jones fluid, the
described implementation shows speedup factors of up to 80 compared to a serial
implementation for the CPU, and a single GPU was found to compare with a
parallelised MD simulation using 64 distributed cores. Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package
licensed under the GPL, see http://research.colberg.org/projects/halm
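The double-single emulation mentioned in the abstract above can be sketched as follows: a value is stored as an unevaluated sum of two single-precision numbers, and error-free additions carry the rounding error of each step into the low word. This is a generic illustration using Knuth's two-sum, with single precision emulated through the `struct` module; the helper names are assumptions for this example and it is not the HALMD code.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a, b):
    """Knuth's error-free addition: returns (s, err) with s + err == a + b."""
    s = f32(a + b)
    bb = f32(s - a)
    err = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, err


def ds_add(hi, lo, x):
    """Add the single-precision value x to the double-single number (hi, lo)."""
    s, e = two_sum(hi, x)
    e = f32(e + lo)
    # Renormalise so that hi carries the leading bits (fast two-sum, |s| >= |e|).
    hi_new = f32(s + e)
    lo_new = f32(e - f32(hi_new - s))
    return hi_new, lo_new


def accumulate(n_steps, dt):
    """Sum dt n_steps times in plain single and in double-single precision."""
    dt = f32(dt)
    naive = 0.0
    hi, lo = 0.0, 0.0
    for _ in range(n_steps):
        naive = f32(naive + dt)
        hi, lo = ds_add(hi, lo, dt)
    # n_steps * dt evaluated in double precision serves as the reference.
    return naive, hi + lo, n_steps * dt
```

Calling `accumulate(100_000, 0.001)` shows the plain single-precision sum drifting visibly from the reference while the double-single sum stays close to it, which is the kind of accumulated rounding error that matters over very long integration runs.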
Parallel resampling in the particle filter
Modern parallel computing devices, such as the graphics processing unit
(GPU), have gained significant traction in scientific and statistical
computing. They are particularly well-suited to data-parallel algorithms such
as the particle filter, or more generally Sequential Monte Carlo (SMC), which
are increasingly used in statistical inference. SMC methods carry a set of
weighted particles through repeated propagation, weighting and resampling
steps. The propagation and weighting steps are straightforward to parallelise,
as they require only independent operations on each particle. The resampling
step is more difficult, as standard schemes require a collective operation,
such as a sum, across particle weights. Focusing on this resampling step, we
analyse two alternative schemes that do not involve a collective operation
(Metropolis and rejection resamplers), and compare them to standard schemes
(multinomial, stratified and systematic resamplers). We find that, in certain
circumstances, the alternative resamplers can perform significantly faster on a
GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover,
in single precision, the standard approaches are numerically biased for upwards
of hundreds of thousands of particles, while the alternatives are not. This is
particularly important given greater single- than double-precision throughput
on modern devices, and the consequent temptation to use single precision with a
greater number of particles. Finally, we provide auxiliary functions useful for
implementation, such as for the permutation of ancestry vectors to enable
in-place propagation. Comment: 21 pages, 6 figures
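The Metropolis resampler named in the abstract above can be sketched as below. Each particle runs a short independent Metropolis chain over particle indices, so only ratios of weights are needed and no sum across all weights is ever formed. The function name, the default chain length and the serial loop are assumptions for this example; on a GPU each outer iteration would map to one thread.

```python
import random


def metropolis_resample(weights, n_steps=30, seed=0):
    """Return an ancestor index for every particle.

    `weights` may be unnormalised. A finite `n_steps` leaves some
    residual bias, which shrinks as the chains get longer.
    """
    rng = random.Random(seed)
    n = len(weights)
    ancestors = []
    for i in range(n):
        k = i  # each chain starts at the particle's own index
        for _ in range(n_steps):
            j = rng.randrange(n)  # propose a particle uniformly at random
            # Accept with probability min(1, w[j] / w[k]); written as a
            # product so that zero weights need no division.
            if rng.random() * weights[k] < weights[j]:
                k = j
        ancestors.append(k)
    return ancestors
```

With half of the particles carrying weight 1 and the other half weight 3, about three quarters of the returned ancestors should come from the heavier half.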