A parallel nearly implicit time-stepping scheme
Across-the-space parallelism remains the most mature, convenient and natural way to parallelize large-scale problems. One of the major problems here is that implicit time stepping is often difficult to parallelize due to the structure of the system. Approximate implicit schemes have been suggested to circumvent the problem. These schemes have attractive stability properties and they are also highly parallelizable.
The purpose of this article is to give an overall assessment of the parallelism of the method.
Scalability Analysis of Parallel GMRES Implementations
Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods, such as GMRES(k). Two parallel versions of GMRES(k) based on different data distributions and using Householder reflections in the orthogonalization phase, and variations of these which adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency with an increase in problem size and number of processors). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.
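To make the restart mechanism of GMRES(k) concrete, the following is a minimal serial sketch of the restarted algorithm. It is not the paper's parallel Householder-based implementation: for brevity the orthogonalization here uses modified Gram-Schmidt, the restart value k is fixed rather than adaptive, and the function name `gmres_restarted` is hypothetical.

```python
import numpy as np

def gmres_restarted(A, b, k=20, tol=1e-8, max_restarts=50):
    """Restarted GMRES(k): run k Arnoldi steps, update the iterate from
    the small least-squares problem, then restart from the new residual.
    Sketch only: modified Gram-Schmidt instead of Householder, fixed k."""
    n = b.size
    x = np.zeros(n)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            return x
        # Arnoldi process: build an orthonormal Krylov basis V and the
        # (k+1) x k upper Hessenberg matrix H with A V_k = V_{k+1} H.
        V = np.zeros((n, k + 1))
        H = np.zeros((k + 1, k))
        V[:, 0] = r / beta
        m = k
        for j in range(k):
            w = A @ V[:, j]
            for i in range(j + 1):          # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:         # happy breakdown: exact solution
                m = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Minimize ||beta*e1 - H y|| over the Krylov subspace, then update x.
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
        x += V[:, :m] @ y
    return x
```

In the parallel versions analyzed in the paper, the dominant scalability concerns are the inner products and the matrix-vector product inside the Arnoldi loop, both of which require communication; restarting bounds the basis size and hence memory and orthogonalization cost.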
A modified parallel tree code for N-body simulation of the Large Scale Structure of the Universe
N-body codes to perform simulations of the origin and evolution of the Large
Scale Structure of the Universe have improved significantly over the past
decade both in terms of the resolution achieved and of reduction of the CPU
time. However, state-of-the-art N-body codes hardly allow one to deal with
particle numbers larger than a few 10^7, even on the largest parallel systems.
In order to allow simulations with larger resolution, we have first
re-considered the grouping strategy as described in Barnes (1990) (hereafter
B90) and applied it with some modifications to our WDSH-PT (Work and Data
SHaring - Parallel Tree) code. In the first part of this paper we will give a
short description of the code adopting the Barnes and Hut algorithm
\cite{barh86} (hereafter BH), and in particular of the memory and work
distribution strategy applied to describe the {\it data distribution} on a
CC-NUMA machine like the CRAY-T3E system. In the second part of the paper we
describe the modification to the Barnes grouping strategy we have devised to
improve the performance of the WDSH-PT code. We will use the property that
nearby particles have similar interaction lists. This idea was checked in
B90, where an interaction list is built which applies everywhere within a
cell C_{group} containing a small number of particles N_{crit}. B90 reuses
this interaction list for each particle in the cell in turn.
We instead assume each particle p to have the same interaction list.
This makes it possible to reduce the CPU time and increase performance,
allowing us to run simulations with a large number of particles (N ~
10^7-10^9) in non-prohibitive times.

Comment: 13 pages and 7 figures
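The grouping idea above can be illustrated with a small sketch: once a single interaction list has been built for a cell C_group, the accelerations of all N_crit particles in that cell are computed against the same list, which vectorizes naturally. This is a hypothetical illustration of the concept, not the WDSH-PT code; the function name `group_accelerations` and the Plummer softening are assumptions.

```python
import numpy as np

def group_accelerations(particles, interaction_list, eps=1e-2):
    """B90-style grouping: every particle in a small cell C_group reuses
    the SAME interaction list, instead of walking the tree per particle.

    particles: (N_crit, 3) positions of the particles in the cell.
    interaction_list: pair (pos, m) with pos (M, 3) and m (M,), the
    positions and masses of the tree nodes/pseudo-particles in the list.
    Returns (N_crit, 3) accelerations (G = 1, Plummer-softened)."""
    pos, m = interaction_list
    # Pairwise displacements, shape (N_crit, M, 3).
    d = pos[None, :, :] - particles[:, None, :]
    r2 = (d ** 2).sum(axis=-1) + eps ** 2       # softened squared distances
    return (m[None, :, None] * d / r2[..., None] ** 1.5).sum(axis=1)
```

The saving comes from amortizing one tree walk over N_crit particles; the trade-off, as in B90, is that the shared list must be valid for the whole cell, so larger N_crit means a slightly more conservative (longer) interaction list.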
Experiments with MRAI time stepping schemes on a distributed memory parallel environment
Implicit time stepping is often difficult to parallelize. The recently proposed Minimal Residual Approximate Implicit (MRAI) schemes are specially designed as a cheaper and parallelizable alternative to implicit time stepping. Several GMRES iterations are performed to solve the implicit scheme of interest approximately, and the step size is adjusted to guarantee stability.
A natural way to apply the approach is to modify a given implicit scheme in which one is interested. Here, we present numerical results for two parallel implementations of MRAI schemes. One is based on the simple Euler Backward scheme, and the other is the MRAI-modified multistep ODE solver LSODE.
On the Cray T3E and IBM SP2 platforms, the MRAI codes exhibit the parallelism of explicit schemes. The model problem under consideration is the 3D spatially discretized heat equation. Speed-up results for the Cray T3E and IBM SP2 are reported.
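The core MRAI idea, approximating each implicit solve by a handful of GMRES iterations, can be sketched on a 1D analogue of the heat-equation model problem. This is a minimal illustration, not the authors' code: the step-size control that guarantees stability is omitted, the problem is 1D rather than 3D, and the iteration budget (5 inner GMRES steps per time step) is an arbitrary choice.

```python
import numpy as np
from scipy.sparse import identity, diags
from scipy.sparse.linalg import gmres

# 1D heat equation u_t = u_xx on [0, 1], homogeneous Dirichlet BCs,
# discretized with second-order central differences.
n, dt, steps = 50, 1e-3, 100
h = 1.0 / (n + 1)
L = diags([1, -2, 1], [-1, 0, 1], shape=(n, n)) / h**2
A = identity(n) - dt * L          # Euler Backward system matrix

x = np.linspace(h, 1 - h, n)
u = np.sin(np.pi * x)             # initial condition

for _ in range(steps):
    # MRAI idea: instead of solving A u_new = u_old exactly, spend only
    # a few GMRES iterations per step (one cycle of 5 inner iterations),
    # warm-started from the previous solution.
    u, _ = gmres(A, u, x0=u, restart=5, maxiter=1)
```

Because the approximate solve is cheap and GMRES consists of matrix-vector products and inner products, each time step parallelizes like an explicit scheme, which is the behavior reported on the T3E and SP2.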
Ludwig: A parallel Lattice-Boltzmann code for complex fluids
This paper describes `Ludwig', a versatile code for the simulation of
Lattice-Boltzmann (LB) models in 3-D on cubic lattices. In fact `Ludwig' is not
a single code, but a set of codes that share certain common routines, such as
I/O and communications. If `Ludwig' is used as intended, a variety of complex
fluid models with different equilibrium free energies are simple to code, so
that the user may concentrate on the physics of the problem, rather than on
parallel computing issues. Thus far, the main application of `Ludwig' has been
symmetric binary fluid mixtures. We first explain the philosophy and structure
of `Ludwig' which is argued to be a very effective way of developing large
codes for academic consortia. Next we elaborate on some parallel implementation
issues such as parallel I/O, and the use of MPI to achieve full portability and
good efficiency on both MPP and SMP systems. Finally, we describe how to
implement generic solid boundaries, and look in detail at the particular case
of a symmetric binary fluid mixture near a solid wall. We present a novel
scheme for the thermodynamically consistent simulation of wetting phenomena, in
the presence of static and moving solid boundaries, and check its performance.

Comment: Submitted to Computer Physics Communications
A general analytical model of adaptive wormhole routing in k-ary n-cubes
Several analytical models of fully adaptive routing have recently been proposed for k-ary n-cube and hypercube networks under the uniform traffic pattern. Although the hypercube is a special case of the k-ary n-cube topology, the modeling approach for the hypercube is more accurate than that for k-ary n-cubes due to its simpler structure. This paper proposes a general analytical model to predict message latency in wormhole-routed k-ary n-cubes with fully adaptive routing, using a modeling approach similar to that for the hypercube. The analysis focuses on Duato's fully adaptive routing algorithm [12], which is widely accepted as the most general algorithm for achieving adaptivity in wormhole-routed networks while allowing for an efficient router implementation. The proposed model is general enough that it can also be used for the hypercube and for other fully adaptive routing algorithms.
Numerical Relativity As A Tool For Computational Astrophysics
The astrophysics of compact objects, which requires Einstein's theory of
general relativity for understanding phenomena such as black holes and neutron
stars, is attracting increasing attention. In general relativity, gravity is
governed by an extremely complex set of coupled, nonlinear, hyperbolic-elliptic
partial differential equations. The largest parallel supercomputers are finally
approaching the speed and memory required to solve the complete set of
Einstein's equations for the first time since they were written over 80 years
ago, allowing one to attempt full 3D simulations of such exciting events as
colliding black holes and neutron stars. In this paper we review the
computational effort in this direction, and discuss a new 3D multi-purpose
parallel code called ``Cactus'' for general relativistic astrophysics.
Directions for further work are indicated where appropriate.

Comment: Review for JCA