MPWide: a light-weight library for efficient message passing over wide area networks
We present MPWide, a light-weight communication library which allows
efficient message passing over a distributed network. MPWide has been designed
to connect applications running on distributed (super)computing resources, and
to maximize the communication performance on wide area networks for those
without administrative privileges. It can be used to provide message passing
between applications, move files, and make very fast connections in
client-server environments. MPWide has already been applied to enable
distributed cosmological simulations across up to four supercomputers on two
continents, and to couple two different blood flow simulations to form a
multiscale simulation.
Comment: accepted by the Journal of Open Research Software, 13 pages, 4 figures, 1 table
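A minimal sketch of the user-space, socket-based client-server pattern the library targets may help make this concrete. Note that this is not the MPWide API; the host, port, and payload below are placeholders, and the example only illustrates message passing over plain TCP sockets without administrative privileges.

```python
# Sketch of user-space message passing between two endpoints over TCP.
# NOT the MPWide API; it only illustrates the client-server pattern the
# library targets (plain sockets, no administrative privileges required).
import socket
import threading

HOST, PORT = "127.0.0.1", 50007   # hypothetical local endpoint

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((HOST, PORT))
srv.listen(1)

def serve_one():
    conn, _ = srv.accept()
    with conn:
        data = conn.recv(1024)          # receive one message
        conn.sendall(b"ack:" + data)    # acknowledge it

t = threading.Thread(target=serve_one)
t.start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    cli.sendall(b"boundary data")       # stand-in for simulation payload
    print(cli.recv(1024))               # b'ack:boundary data'

t.join()
srv.close()
```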
Analyzing and Modeling the Performance of the HemeLB Lattice-Boltzmann Simulation Environment
We investigate the performance of the HemeLB lattice-Boltzmann simulator for
cerebrovascular blood flow, aimed at providing timely and clinically relevant
assistance to neurosurgeons. HemeLB is optimised for sparse geometries,
supports interactive use, and scales well to 32,768 cores for problems with ~81
million lattice sites. We obtain a maximum performance of 29.5 billion site
updates per second, with only an 11% slowdown for highly sparse problems (5%
fluid fraction). We present steering and visualisation performance measurements
and provide a model which allows users to predict the performance, thereby
determining how to run simulations with maximum accuracy within time
constraints.
Comment: Accepted by the Journal of Computational Science. 33 pages, 16 figures, 7 tables
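To illustrate the kind of prediction such a model enables, the sketch below estimates wall-clock time from the problem size and sustained update rate. The peak rate, site count, and sparse-geometry slowdown are taken from the abstract; the step count is hypothetical, and the formula is a deliberate simplification rather than the performance model derived in the paper.

```python
# Back-of-the-envelope runtime estimate for a lattice-Boltzmann run, assuming
# a sustained site-update rate. Simplified illustration, not the paper's model.
fluid_sites = 81e6        # lattice sites (from the abstract)
time_steps = 100_000      # hypothetical number of LB time steps
peak_sups = 29.5e9        # peak site updates per second (from the abstract)
sparse_penalty = 0.11     # ~11% slowdown quoted for highly sparse geometries

sustained_sups = peak_sups * (1.0 - sparse_penalty)
runtime_s = fluid_sites * time_steps / sustained_sups
print(f"estimated wall-clock time: {runtime_s:.1f} s")
```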
Sensitivity Analysis of High-Dimensional Models with Correlated Inputs
Sensitivity analysis is an important tool used in many domains of
computational science to either gain insight into the mathematical model and
interaction of its parameters or study the uncertainty propagation through the
input-output interactions. In many applications, the inputs are stochastically
dependent, which violates one of the essential assumptions in the
state-of-the-art sensitivity analysis methods. Consequently, results obtained
while ignoring the correlations do not reflect the true contributions of the
input parameters. This study proposes an approach that addresses parameter
correlations using a polynomial chaos expansion method together with
Rosenblatt and Cholesky transformations to capture the parameter dependencies.
The treatment of correlated variables is discussed in the context of
variance-based and derivative-based sensitivity analysis. We demonstrate that
the sensitivities of correlated parameters can differ not only in magnitude:
even the sign of the derivative-based index can be inverted, significantly
changing the conclusions drawn compared with an analysis that disregards the
correlations. Numerous experiments are conducted using workflow automation
tools within the VECMA toolkit.
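As a generic illustration of the Cholesky step mentioned above, the sketch below turns independent standard-normal samples into correlated inputs before propagating them through a toy model. The correlation matrix, the model, and the correlation-based sensitivity measure are all hypothetical; this is not the VECMA toolkit implementation.

```python
# Cholesky transformation: generate correlated inputs from independent samples,
# then propagate them through a toy model. Illustration only.
import numpy as np

rng = np.random.default_rng(42)
corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])          # assumed input correlation
L = np.linalg.cholesky(corr)           # corr = L @ L.T

z = rng.standard_normal((10_000, 2))   # independent standard-normal samples
x = z @ L.T                            # correlated samples

def model(x):
    # toy model: y = x1 + x2 (illustration only)
    return x[:, 0] + x[:, 1]

y = model(x)

# Simple correlation-based sensitivity measure: with strongly correlated
# inputs, each parameter's apparent influence includes its partner's effect.
for i in range(2):
    print(f"corr(x{i+1}, y) = {np.corrcoef(x[:, i], y)[0, 1]:.3f}")
```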
The Living Application: a Self-Organising System for Complex Grid Tasks
We present the living application, a method to autonomously manage
applications on the grid. During its execution on the grid, the living
application makes choices about which resources to use in order to complete
its tasks. These choices can be based on its internal state, or on autonomously
acquired knowledge from external sensors. By granting it limited user
capabilities, the living application is able to port itself from one resource
topology to another. The application performs these actions at run-time without
depending on users or external workflow tools. We demonstrate this new concept
in a special case of a living application: the living simulation. Today, many
simulations require a wide range of numerical solvers and run most efficiently
if specialized nodes are matched to the solvers. The idea of the living
simulation is that it decides itself which grid machines to use based on the
numerical solver currently in use. In this paper we apply the living simulation
to modelling the collision between two galaxies in a test setup with two
specialized computers. The simulation switches at run-time between a
GPU-enabled computer in the Netherlands and a GRAPE-enabled machine in the
United States, using an oct-tree N-body code whenever it runs in the
Netherlands and a direct N-body solver in the United States.
Comment: 26 pages, 3 figures, accepted by IJHPC
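The sketch below shows the kind of run-time decision described here: pick the machine whose hardware matches the solver currently in use. The resource names and the selection rule are hypothetical stand-ins, not the paper's implementation.

```python
# Toy solver-to-resource matching, in the spirit of the living simulation.
# Resource names and the affinity mapping are hypothetical.
RESOURCES = {
    "gpu_nl":   {"hardware": "GPU",   "site": "Netherlands"},
    "grape_us": {"hardware": "GRAPE", "site": "United States"},
}

SOLVER_AFFINITY = {
    "octtree": "GPU",     # tree code prefers the GPU machine
    "direct":  "GRAPE",   # direct-summation code prefers the GRAPE machine
}

def choose_resource(active_solver: str) -> str:
    """Return the resource whose hardware matches the active solver."""
    wanted = SOLVER_AFFINITY[active_solver]
    for name, props in RESOURCES.items():
        if props["hardware"] == wanted:
            return name
    raise LookupError(f"no resource provides {wanted}")

for solver in ("octtree", "direct"):
    print(solver, "->", choose_resource(solver))
```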
Simulating the universe on an intercontinental grid of supercomputers
Understanding the universe is hampered by the elusiveness of its most common
constituent, cold dark matter. Almost impossible to observe, dark matter can be
studied effectively by means of simulation and there is probably no other
research field where simulation has led to so much progress in the last decade.
Cosmological N-body simulations are an essential tool for evolving density
perturbations in the nonlinear regime. Simulating the formation of large-scale
structures in the universe, however, is still a challenge due to the enormous
dynamic range in spatial and temporal coordinates, and due to the enormous
computer resources required. The dynamic range is generally dealt with by the
hybridization of numerical techniques. We deal with the computational
requirements by connecting two supercomputers via an optical network and make
them operate as a single machine. This is challenging, if only for the fact
that the supercomputers of our choice are separated by half the planet, as one
is located in Amsterdam and the other is in Tokyo. The co-scheduling of the two
computers and the 'gridification' of the code enable us to achieve a 90%
efficiency for this distributed intercontinental supercomputer.
Comment: Accepted for publication in IEEE Computer
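A toy calculation shows what a 90% efficiency figure implies when wide-area communication is added to each step. The per-step timings below are assumed for illustration only; the 90% target is the only number taken from the abstract.

```python
# Toy distributed-run efficiency: fraction of wall-clock time spent on useful
# computation once intercontinental communication is added. Timings assumed.
t_compute = 9.0   # seconds of computation per simulation step (assumed)
t_wan     = 1.0   # seconds of wide-area communication per step (assumed)

efficiency = t_compute / (t_compute + t_wan)
print(f"efficiency = {efficiency:.0%}")   # 90%: communication overhead stays small
```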
A parallel gravitational N-body kernel
We describe source code level parallelization for the {\tt kira} direct
gravitational N-body integrator, the workhorse of the {\tt starlab}
production environment for simulating dense stellar systems. The
parallelization strategy, called ``j-parallelization'', involves partitioning
the computational domain by distributing all particles in the system among
the available processors. Partial forces on the particles to be advanced are
calculated in parallel by their parent processors, and are then summed in a
final global operation. Once total forces are obtained, the computing elements
proceed to the computation of their particle trajectories. We report the
results of timing measurements on four different parallel computers, and
compare them with theoretical predictions. The computers either employ a
high-speed interconnect or a NUMA architecture to minimize the communication
overhead, or are distributed in a grid. The code scales well in the domain
tested, which ranges from 1024 to 65536 stars on 1 to 128 processors, providing
satisfactory speedup. Running the production environment on a grid becomes
inefficient for more than 60 processors distributed across three sites.
Comment: 21 pages, New Astronomy (in press)
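The sketch below emulates the j-parallelization scheme serially: each (emulated) processor owns a slice of all particles, computes the partial forces its slice exerts on the particles being advanced, and the partial forces are then summed in a final global reduction. Problem sizes and the softening are hypothetical, and this is a conceptual sketch, not the kira/starlab source.

```python
# Serial emulation of "j-parallelization": per-processor partial forces on the
# active particles, followed by a global sum. Conceptual sketch only.
import numpy as np

rng = np.random.default_rng(0)
n_total, n_active, n_proc = 1024, 16, 4        # hypothetical sizes
pos = rng.uniform(-1.0, 1.0, size=(n_total, 3))
mass = np.full(n_total, 1.0 / n_total)
active = pos[:n_active]                        # particles to be advanced
eps2 = 1e-4                                    # softening to avoid singularities

def partial_force(active_pos, src_pos, src_mass):
    """Force on each active particle from one processor's slice of sources."""
    dx = src_pos[None, :, :] - active_pos[:, None, :]      # (n_active, n_src, 3)
    r2 = np.sum(dx * dx, axis=-1) + eps2
    return np.sum(src_mass[None, :, None] * dx / r2[..., None] ** 1.5, axis=1)

# Each "processor" owns one contiguous slice of the source particles.
slices = np.array_split(np.arange(n_total), n_proc)
partials = [partial_force(active, pos[s], mass[s]) for s in slices]

# Final global operation (the reduction performed across processors).
total = np.sum(partials, axis=0)

# Check against a single-processor computation.
reference = partial_force(active, pos, mass)
print("max deviation:", np.max(np.abs(total - reference)))
```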