
    A parallel nearly implicit time-stepping scheme

    Across-the-space parallelism remains the most mature, convenient and natural way to parallelize large-scale problems. A major obstacle is that implicit time stepping is often difficult to parallelize because of the structure of the resulting systems. Approximate implicit schemes have been suggested to circumvent this problem: they have attractive stability properties and parallelize very well. The purpose of this article is to give an overall assessment of the parallelism of the method

    Scalability Analysis of Parallel GMRES Implementations

    Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods such as GMRES(k). Two parallel versions of GMRES(k), based on different data distributions and using Householder reflections in the orthogonalization phase, together with variations that adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency as problem size and number of processors increase). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics
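The flavor of such an algorithm-machine model can be sketched in a few lines. The cost terms and constants below are hypothetical placeholders, not the paper's model: one restart cycle is charged a compute term that scales as n/p and a communication term for the global reductions in the orthogonalization, and efficiency is E = T(1)/(p*T(p)).

```python
import math

# Toy scalability model for parallel GMRES(k); all constants are
# illustrative assumptions, not taken from the paper under discussion.
def gmres_cycle_time(n, p, k, t_flop=1e-9, t_lat=1e-5, t_word=1e-8):
    """Estimated time of one GMRES(k) restart cycle on p processors.

    n: global problem size, k: restart length.
    Compute: ~2*n*k^2/p flops of orthogonalization work per cycle.
    Communication: one log2(p)-latency global reduction per inner step.
    """
    compute = 2.0 * n * k * k / p * t_flop
    comm = 0.0 if p == 1 else k * (math.log2(p) * t_lat + k * t_word)
    return compute + comm

def efficiency(n, p, k):
    """Parallel efficiency E = T(1) / (p * T(p))."""
    return gmres_cycle_time(n, 1, k) / (p * gmres_cycle_time(n, p, k))
```

Even this toy model reproduces the scalability question studied in the paper: efficiency drops as p grows at fixed n, but can be held fixed by growing n with p.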

    A modified parallel tree code for N-body simulation of the Large Scale Structure of the Universe

    N-body codes for simulations of the origin and evolution of the Large Scale Structure of the Universe have improved significantly over the past decade, both in the resolution achieved and in the reduction of CPU time. However, state-of-the-art N-body codes hardly allow one to deal with particle numbers larger than a few 10^7, even on the largest parallel systems. In order to allow simulations at larger resolution, we have first reconsidered the grouping strategy described in Barnes (1990) (hereafter B90) and applied it, with some modifications, to our WDSH-PT (Work and Data SHaring - Parallel Tree) code. In the first part of this paper we give a short description of the code, which adopts the Barnes and Hut (1986) algorithm (hereafter BH), and in particular of the memory and work distribution strategy used to realize the data distribution on a CC-NUMA machine like the CRAY T3E system. In the second part of the paper we describe the modification to the Barnes grouping strategy we have devised to improve the performance of the WDSH-PT code. We exploit the property that nearby particles have similar interaction lists. This idea was explored in B90, where a single interaction list is built that applies everywhere within a cell C_group containing a small number of particles N_crit, and is reused for each particle p in C_group in turn. We instead assume that every particle p has the same interaction list. This has made it possible to reduce the CPU time and increase performance, allowing us to run simulations with a large number of particles (N ~ 10^7 to 10^9) in non-prohibitive times. (Comment: 13 pages and 7 figures)
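The gain from the shared-list idea can be illustrated with a deliberately simplified 1-D sketch. Everything here (the cell decomposition, the distance-based opening criterion, the function names) is our own illustration, not the WDSH-PT or B90 implementation: particles in one cell all reuse a single interaction list in which far cells are replaced by one aggregate term.

```python
# 1-D sketch of a shared interaction list per cell (illustrative only;
# the cell size and opening criterion below are assumptions, not B90's).
def shared_list_interactions(positions, ncells=8, theta=0.5):
    """Count force evaluations when every particle in a cell reuses one
    interaction list, with far cells replaced by their aggregate term."""
    cells = [[] for _ in range(ncells)]
    for x in positions:                      # bin particles into cells
        cells[min(int(x * ncells), ncells - 1)].append(x)
    total = 0
    for i, group in enumerate(cells):
        if not group:
            continue
        ci = (i + 0.5) / ncells              # center of this cell
        list_len = 0
        for j, other in enumerate(cells):
            if j == i or not other:
                continue
            cj = (j + 0.5) / ncells
            if abs(ci - cj) > theta:         # "far": one aggregate term
                list_len += 1
            else:                            # "near": direct particle terms
                list_len += len(other)
        list_len += len(group) - 1           # in-cell neighbours
        total += list_len * len(group)       # list reused by every member
    return total

def direct_interactions(n):
    """Brute-force pairwise count, for comparison."""
    return n * (n - 1)
```

For a few hundred evenly spread particles the shared-list count already falls well below the brute-force N(N-1), which is the effect the paper scales up to N ~ 10^7 and beyond.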

    Experiments with MRAI time stepping schemes on a distributed memory parallel environment

    Implicit time stepping is often difficult to parallelize. The recently proposed Minimal Residual Approximate Implicit (MRAI) schemes are specially designed as a cheaper, parallelizable alternative to implicit time stepping. Several GMRES iterations are performed to solve the implicit scheme of interest approximately, and the step size is adjusted to guarantee stability. A natural way to apply the approach is to modify a given implicit scheme in which one is interested. Here we present numerical results for two parallel implementations of MRAI schemes: one based on the simple Euler Backward scheme, and the other on the MRAI-modified multistep ODE solver LSODE. On the Cray T3E and IBM SP2 platforms, the MRAI codes exhibit the parallelism of explicit schemes. The model problem under consideration is the 3D spatially discretized heat equation. Speed-up results for the Cray T3E and IBM SP2 are reported
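The "approximately solved implicit step" can be sketched for the 1-D heat equation. This is a minimal illustration, not the MRAI method itself: MRAI uses a few GMRES iterations (with step-size control), whereas the sketch below substitutes plain Jacobi sweeps for the inner solve to stay dependency-free.

```python
# One "nearly implicit" Euler step for u_t = u_xx with zero Dirichlet ends:
# the backward-Euler system (I - dt*A) u_new = u_old is solved only
# approximately with a few inner iterations (Jacobi here; MRAI uses GMRES).
def approx_implicit_step(u, dt, h, inner_iters=3):
    n = len(u)
    r = dt / (h * h)
    diag = 1.0 + 2.0 * r              # diagonal of (I - dt*A)
    x = list(u)                       # initial guess: the old state
    for _ in range(inner_iters):
        x_new = list(x)
        for i in range(1, n - 1):     # Jacobi sweep on interior points:
            # solve row i of (1+2r)x_i - r x_{i-1} - r x_{i+1} = u_i
            x_new[i] = (u[i] + r * (x[i - 1] + x[i + 1])) / diag
        x = x_new
    return x
```

Each inner sweep touches only nearest neighbours, which is why such approximate implicit steps parallelize essentially like an explicit scheme.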

    Ludwig: A parallel Lattice-Boltzmann code for complex fluids

    This paper describes Ludwig, a versatile code for the simulation of Lattice-Boltzmann (LB) models in 3D on cubic lattices. In fact, Ludwig is not a single code but a set of codes that share certain common routines, such as I/O and communications. If Ludwig is used as intended, a variety of complex-fluid models with different equilibrium free energies are simple to code, so that the user may concentrate on the physics of the problem rather than on parallel-computing issues. Thus far, Ludwig's main application has been to symmetric binary fluid mixtures. We first explain the philosophy and structure of Ludwig, which we argue is a very effective way of developing large codes for academic consortia. Next we elaborate on some parallel implementation issues, such as parallel I/O and the use of MPI to achieve full portability and good efficiency on both MPP and SMP systems. Finally, we describe how to implement generic solid boundaries, and look in detail at the particular case of a symmetric binary fluid mixture near a solid wall. We present a novel scheme for the thermodynamically consistent simulation of wetting phenomena in the presence of static and moving solid boundaries, and check its performance. (Comment: Submitted to Computer Physics Communications)
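The stream-and-collide structure that all such LB codes share can be shown with a toy 1-D (D1Q3) diffusive model. This is not Ludwig's free-energy scheme; the weights and the BGK relaxation below are standard textbook choices, used only to illustrate the two-phase update that the paper's parallel halo exchange wraps around.

```python
# Minimal D1Q3 lattice-Boltzmann sketch: periodic streaming followed by
# BGK relaxation toward a local equilibrium (illustrative toy model).
W = [4.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0]    # weights for velocities 0, +1, -1

def lb_step(f, tau=1.0):
    """One streaming + collision step; f[c][x] is the population moving
    with velocity c in {0, +1, -1} at lattice site x."""
    n = len(f[0])
    # streaming: shift the moving populations (periodic boundaries)
    f = [f[0],
         f[1][-1:] + f[1][:-1],          # velocity +1 moves right
         f[2][1:] + f[2][:1]]            # velocity -1 moves left
    # collision: relax toward the equilibrium w_c * rho at each site
    out = [[0.0] * n for _ in range(3)]
    for x in range(n):
        rho = f[0][x] + f[1][x] + f[2][x]
        for c in range(3):
            out[c][x] = f[c][x] - (f[c][x] - W[c] * rho) / tau
    return out

def density(f):
    return [f[0][x] + f[1][x] + f[2][x] for x in range(len(f[0]))]
```

Because streaming only moves data one site, a parallel version needs just a one-site halo exchange per step, which is what makes LB codes like Ludwig map so naturally onto MPI domain decomposition.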

    A general analytical model of adaptive wormhole routing in k-ary n-cubes

    Several analytical models of fully adaptive routing have recently been proposed for k-ary n-cube and hypercube networks under the uniform traffic pattern. Although the hypercube is a special case of the k-ary n-cube topology, the modeling approach for the hypercube is more accurate than that for general k-ary n-cubes owing to its simpler structure. This paper proposes a general analytical model to predict message latency in wormhole-routed k-ary n-cubes with fully adaptive routing, using a modeling approach similar to that for the hypercube. The analysis focuses on Duato's fully adaptive routing algorithm [12], which is widely accepted as the most general algorithm for achieving adaptivity in wormhole-routed networks while allowing an efficient router implementation. The proposed model is general enough to be used for the hypercube and for other fully adaptive routing algorithms
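The contention-free core of any such latency model is simple to state. The sketch below is a back-of-the-envelope illustration only, not the paper's model, which additionally accounts for blocking under Duato's adaptive routing: in wormhole routing the header pipelines through the routers and the body then streams in flit by flit.

```python
# Contention-free wormhole latency in a k-ary n-cube torus (illustrative).
def avg_hops(k, n):
    """Average message distance under uniform traffic: with wraparound
    links each dimension contributes k/4 hops on average (k even)."""
    return n * k / 4.0

def base_latency(k, n, msg_flits, cycles_per_hop=1):
    """No-contention latency in cycles: header pipelining over the
    average path, plus one cycle per flit of message body."""
    return avg_hops(k, n) * cycles_per_hop + msg_flits
```

The full analytical model adds a queueing-style waiting time per channel on top of this base term; the adaptive routing of Duato's algorithm enters through the probability that all permitted channels are busy.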

    Numerical Relativity As A Tool For Computational Astrophysics

    The astrophysics of compact objects, which requires Einstein's theory of general relativity for understanding phenomena such as black holes and neutron stars, is attracting increasing attention. In general relativity, gravity is governed by an extremely complex set of coupled, nonlinear, hyperbolic-elliptic partial differential equations. The largest parallel supercomputers are finally approaching the speed and memory required to solve the complete set of Einstein's equations for the first time since they were written down over 80 years ago, allowing one to attempt full 3D simulations of such exciting events as colliding black holes and neutron stars. In this paper we review the computational effort in this direction, and discuss a new 3D multi-purpose parallel code called "Cactus" for general relativistic astrophysics. Directions for further work are indicated where appropriate. (Comment: Review for JCA)