A pilgrimage to gravity on GPUs
In this short review we present the developments over the last 5 decades that
have led to the use of Graphics Processing Units (GPUs) for astrophysical
simulations. Since the introduction of NVIDIA's Compute Unified Device
Architecture (CUDA) in 2007 the GPU has become a valuable tool for N-body
simulations and is so popular these days that almost all papers about high
precision N-body simulations use methods that are accelerated by GPUs. With the
GPU hardware becoming more advanced and being used for more advanced algorithms
like gravitational tree-codes we see a bright future for GPU-like hardware in
computational astrophysics.
Comment: To appear in European Physical Journal "Special Topics": "Computer
Simulations on Graphics Processing Units". 18 pages, 8 figures
Performance analysis of parallel gravitational N-body codes on large GPU clusters
We compare the performance of two very different parallel gravitational
N-body codes for astrophysical simulations on large GPU clusters, both
pioneers in their own fields as well as on certain shared scales, namely NBODY6++ and
Bonsai. We carry out the benchmark of the two codes by analyzing their
performance, accuracy and efficiency through the modeling of structure
decomposition and timing measurements. We find that both codes are heavily
optimized to leverage the computational potential of GPUs as their performance
has approached half of the maximum single precision performance of the
underlying GPU cards. With such performance we predict that a speed-up of
can be achieved when up to 1k processors and GPUs are employed
simultaneously. We discuss the quantitative information about comparisons of
two codes, finding that in the same cases Bonsai adopts larger time steps as
well as relative energy errors than NBODY6++, typically ranging from
times larger, depending on the chosen parameters of the codes. While the two
codes are built for different astrophysical applications, under specific
conditions they may overlap in performance at certain physical scales, thus
allowing the user to choose either one with fine-tuned parameters
accordingly.
Comment: 15 pages, 7 figures, 3 tables, accepted for publication in Research
in Astronomy and Astrophysics (RAA)
Janus II: a new generation application-driven computer for spin-system simulations
This paper describes the architecture, the development and the implementation
of Janus II, a new generation application-driven number cruncher optimized for
Monte Carlo simulations of spin systems (mainly spin glasses). This domain of
computational physics is a recognized grand challenge of high-performance
computing: the resources necessary to study in detail theoretical models that
can make contact with experimental data are by far beyond those available using
commodity computer systems. On the other hand, several specific features of the
associated algorithms suggest that unconventional computer architectures, which
can be implemented with available electronics technologies, may lead to order
of magnitude increases in performance, reducing to acceptable values on human
scales the time needed to carry out simulation campaigns that would take
centuries on commercially available machines. Janus II is one such machine,
recently developed and commissioned, that builds upon and improves on the
successful JANUS machine, which has been used for physics since 2008 and is
still in operation today. This paper describes in detail the motivations behind
the project, the computational requirements, the architecture and the
implementation of this new machine and compares its expected performances with
those of currently available commercial systems.
Comment: 28 pages, 6 figures
Explorations of the viability of ARM and Xeon Phi for physics processing
We report on our investigations into the viability of the ARM processor and
the Intel Xeon Phi co-processor for scientific computing. We describe our
experience porting software to these processors and running benchmarks using
real physics applications to explore the potential of these processors for
production physics processing.
Comment: Submitted to proceedings of the 20th International Conference on
Computing in High Energy and Nuclear Physics (CHEP13), Amsterdam
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
Multiscale and inhomogeneous molecular systems are challenging topics in the
field of molecular simulation. In particular, modeling biological systems in
the context of multiscale simulations and exploring material properties are
driving a permanent development of new simulation methods and optimization
algorithms. In computational terms, those methods require parallelization
schemes that make productive use of computational resources for each
simulation from its very start. Here, we introduce the heterogeneous domain
decomposition approach, which combines a heterogeneity-sensitive
spatial domain decomposition with an a priori rearrangement of
subdomain walls. Within this approach, the theoretical modeling and
scaling-laws for the force computation time are proposed and studied as a
function of the number of particles and the spatial resolution ratio. We also
show the capabilities of the new approach by comparing it to both static domain
decomposition algorithms and dynamic load balancing schemes. Specifically, two
representative molecular systems have been simulated and compared to the
heterogeneous domain decomposition proposed in this work. These two systems
comprise an adaptive resolution simulation of a biomolecule solvated in water
and a phase-separated binary Lennard-Jones fluid.
Comment: 14 pages, 12 figures
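As a rough illustration of the heterogeneity-sensitive decomposition described
in the abstract above, the following Python sketch places subdomain walls along
one axis so that each subdomain carries a similar estimated force-computation
cost. It is not the authors' algorithm: the function name place_walls, the
per-particle cost weights and the one-dimensional setting are all assumptions
made for illustration only.

```python
from bisect import bisect_left
from itertools import accumulate


def place_walls(positions, weights, n_domains):
    """Return n_domains - 1 wall coordinates along one axis.

    positions : particle coordinates along the decomposition axis
    weights   : assumed per-particle cost (e.g. larger in the
                high-resolution region, smaller in the coarse region)
    A particle with coordinate x <= walls[k] belongs to subdomain k or lower.
    """
    pairs = sorted(zip(positions, weights))
    xs = [x for x, _ in pairs]
    # Running total of the estimated cost, ordered by position.
    cumulative = list(accumulate(w for _, w in pairs))
    total = cumulative[-1]

    walls = []
    for k in range(1, n_domains):
        target = total * k / n_domains
        # First particle at which the running cost reaches the target share.
        i = bisect_left(cumulative, target)
        walls.append(xs[i])
    return walls


# Eight particles: the four on the left are assumed three times as costly.
walls = place_walls(list(range(8)), [3, 3, 3, 3, 1, 1, 1, 1], 2)
print(walls)
```

In this toy case a uniform split at the midpoint would give the two subdomains
costs of 12 and 4, whereas the weighted wall gives 9 and 7. The walls are fixed
before the run starts, which is the sense of "a priori" assumed here.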
Ianus: an Adaptive FPGA Computer
Dedicated machines designed for specific computational algorithms can
outperform conventional computers by several orders of magnitude. In this note
we describe Ianus, a new-generation FPGA-based machine, and its basic
features: hardware integration and wide reprogrammability. Our goal is to build
a machine that can fully exploit the performance potential of new generation
FPGA devices. We also plan a software platform which simplifies its
programming, in order to extend its intended range of application to a wide
class of interesting and computationally demanding problems. The decision to
develop a dedicated processor is a complex one, involving careful assessment of
its performance lead, during its expected lifetime, over traditional computers,
taking into account their performance increase, as predicted by Moore's law. We
discuss this point in detail.
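The trade-off mentioned in the abstract above can be made concrete with a small
calculation. The Python sketch below assumes that conventional computers double
in performance every fixed period (a common reading of Moore's law) while the
dedicated machine stays fixed. The function names, the doubling period and the
example numbers are illustrative assumptions, not figures from the paper.

```python
import math


def lead_after(initial_lead, years, doubling_years):
    """Performance lead remaining after `years`, given an initial lead factor
    and an assumed doubling period for conventional computers."""
    return initial_lead / 2 ** (years / doubling_years)


def years_until_parity(initial_lead, doubling_years):
    """Time until conventional computers catch up (lead factor drops to 1)."""
    return doubling_years * math.log2(initial_lead)


# Hypothetical example: a 1024x lead and an assumed 1.5-year doubling period.
print(lead_after(1024, 3, 1.5))       # lead left after three years
print(years_until_parity(1024, 1.5))  # years until the lead is gone
```

Under these assumptions the lead shrinks by a factor of two every doubling
period, so a dedicated machine pays off only if its useful lifetime is well
below the parity time.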
Probabilistic structural mechanics research for parallel processing computers
Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intensive nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical.