Improving Performance of Iterative Methods by Lossy Checkpointing
Iterative methods are commonly used approaches to solve large, sparse linear
systems, which are fundamental operations for many modern scientific
simulations. When the large-scale iterative methods are running with a large
number of ranks in parallel, they have to checkpoint their dynamic variables
periodically to tolerate unavoidable fail-stop errors, which requires fast I/O
systems and large storage space. Thus, significantly reducing the
checkpointing overhead is critical to improving the overall performance of
iterative methods. Our contribution is fourfold. (1) We propose a novel lossy
checkpointing scheme that can significantly improve the checkpointing
performance of iterative methods by leveraging lossy compressors. (2) We
formulate a lossy checkpointing performance model and theoretically derive an
upper bound on the extra number of iterations caused by the distortion of data
in lossy checkpoints, in order to guarantee a performance improvement under
the lossy checkpointing scheme. (3) We analyze the impact of lossy
checkpointing (i.e., the extra iterations required when restarting from lossy
checkpoint files) for multiple types of iterative methods. (4) We evaluate the lossy
checkpointing scheme with optimal checkpointing intervals in a high-performance
computing environment with 2,048 cores, using the well-known scientific
computation package PETSc and a state-of-the-art checkpoint/restart toolkit.
Experiments show that our optimized lossy checkpointing scheme can
significantly reduce the fault tolerance overhead for iterative methods by
23%-70% compared with traditional checkpointing and 20%-58% compared with
lossless-compressed checkpointing, in the presence of system failures.
Comment: 14 pages, 10 figures, HPDC'1
Radio interferometric imaging of spatial structure that varies with time and frequency
The spatial-frequency coverage of a radio interferometer is increased by
combining samples acquired at different times and observing frequencies.
However, astrophysical sources often contain complicated spatial structure that
varies within the time-range of an observation, or the bandwidth of the
receiver being used, or both. Image reconstruction algorithms can be designed
to model time and frequency variability in addition to the average intensity
distribution, and provide an improvement over traditional methods that ignore
all variability. This paper describes an algorithm designed for such
structures, and evaluates it in the context of reconstructing three-dimensional
time-varying structures in the solar corona from radio interferometric
measurements between 5 GHz and 15 GHz using existing telescopes such as the
EVLA and at angular resolutions better than that allowed by traditional
multi-frequency analysis algorithms.
Comment: 12 pages, 4 figures. SPIE Proceedings, Optical Engineering+Applications; Image Reconstruction from Incomplete Dat
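The modeling idea (solve for variability terms rather than only an average intensity) can be illustrated with a toy per-pixel model that is linear in frequency and time offsets. The paper's algorithm solves for such terms jointly with interferometric deconvolution; this schematic least-squares fit omits that step entirely, and the model form and names are assumptions for illustration.

```python
import numpy as np

def fit_varying_intensity(samples):
    """Least-squares fit of a per-pixel intensity model
        I(nu, t) = I0 + a*(nu - nu0) + b*(t - t0)
    to samples taken at different observing frequencies and times.
    The model is linear in its parameters, so a single lstsq call
    recovers the average intensity and its frequency/time slopes."""
    nu, t, I = (np.asarray(c, dtype=float) for c in zip(*samples))
    nu0, t0 = nu.mean(), t.mean()
    M = np.column_stack([np.ones_like(nu), nu - nu0, t - t0])
    coef, *_ = np.linalg.lstsq(M, I, rcond=None)
    return coef  # [I0, dI/dnu, dI/dt] about the reference (nu0, t0)
```

A method that ignores all variability amounts to fitting only the constant column of `M`, which biases `I0` whenever the true slopes are nonzero.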
A Moving Frame Algorithm for High Mach Number Hydrodynamics
We present a new approach to Eulerian computational fluid dynamics that is
designed to work at high Mach numbers encountered in astrophysical hydrodynamic
simulations. The Eulerian fluid conservation equations are solved in an
adaptive frame moving with the fluid where Mach numbers are minimized. The
moving frame approach uses a velocity decomposition technique to define local
kinetic variables while storing the bulk kinetic components in a smoothed
background velocity field that is associated with the grid velocity.
Gravitationally induced accelerations are added to the grid, thereby minimizing
the spurious heating problem encountered in cold gas flows. Separately tracking
local and bulk flow components allows thermodynamic variables to be accurately
calculated in both subsonic and supersonic regions. A main feature of the
algorithm, not possible in previous Eulerian implementations, is the
ability to resolve shocks and prevent spurious heating where both the preshock
and postshock Mach numbers are high. The hybrid algorithm combines the high
resolution shock capturing ability of the second-order accurate Eulerian TVD
scheme with a low-diffusion Lagrangian advection scheme. We have implemented a
cosmological code where the hydrodynamic evolution of the baryons is captured
using the moving frame algorithm while the gravitational evolution of the
collisionless dark matter is tracked using a particle-mesh N-body algorithm.
The MACH code is highly suited for simulating the evolution of the IGM where
accurate thermodynamic evolution is needed for studies of the Lyman alpha
forest, the Sunyaev-Zeldovich effect, and the X-ray background. Hydrodynamic
and cosmological tests are described and results presented. The current code is
fast, memory-friendly, and parallelized for shared-memory machines.
Comment: 19 pages, 5 figure
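The velocity decomposition can be illustrated in one dimension: smooth the velocity field to obtain the bulk (grid) component and keep the residual as the local component, so that local Mach numbers stay small even when the bulk flow is highly supersonic. The moving-average kernel and the function below are illustrative assumptions, not the paper's actual smoothing scheme.

```python
import numpy as np

def decompose_velocity(v, width=5):
    """Split a 1-D velocity field into a smoothed background component
    (which would be assigned to the moving grid) and a local residual.
    A simple moving average stands in for whatever smoothing kernel a
    real implementation would use; 'edge' padding avoids boundary
    artifacts in this sketch."""
    kernel = np.ones(width) / width
    pad = width // 2
    v_pad = np.pad(v, pad, mode="edge")
    v_bulk = np.convolve(v_pad, kernel, mode="valid")  # grid velocity
    v_local = v - v_bulk                               # local kinetic part
    return v_bulk, v_local
```

Thermodynamic quantities would then be computed from `v_local`, whose magnitude (and hence Mach number) is small, while `v_bulk` carries the large kinetic energy that would otherwise swamp the internal energy in a purely Eulerian update.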
Implementation of the LANS-alpha turbulence model in a primitive equation ocean model
This paper presents the first numerical implementation and tests of the
Lagrangian-averaged Navier-Stokes-alpha (LANS-alpha) turbulence model in a
primitive equation ocean model. The ocean model in which we work is the Los
Alamos Parallel Ocean Program (POP); we refer to POP and our implementation of
LANS-alpha as POP-alpha. Two versions of POP-alpha are presented: the full
POP-alpha algorithm is derived from the LANS-alpha primitive equations, but
requires a nested iteration that makes it too slow for practical simulations; a
reduced POP-alpha algorithm is proposed, which lacks the nested iteration and
is two to three times faster than the full algorithm. The reduced algorithm
does not follow from a formal derivation of the LANS-alpha model equations.
Despite this, simulations of the reduced algorithm are nearly identical to
those of the full algorithm, as judged by globally averaged temperature and
kinetic energy, and by snapshots of temperature and velocity fields. Both POP-alpha algorithms can
run stably with longer timesteps than standard POP.
Comparisons of the full and reduced POP-alpha algorithms are
made within an idealized test problem that captures some aspects of the
Antarctic Circumpolar Current, a problem in which baroclinic instability is
prominent. Both POP-alpha algorithms produce statistics that resemble
higher-resolution simulations of standard POP.
A linear stability analysis shows that both the full and reduced POP-alpha
algorithms benefit from the way the LANS-alpha equations take into account the
effects of the small scales on the large. Both algorithms (1) are stable; (2)
make the Rossby Radius effectively larger; and (3) slow down Rossby and gravity
waves.
Comment: Submitted to J. Computational Physics March 21, 200
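The smoothing at the heart of the alpha model is a Helmholtz inversion, u_s = (1 - alpha^2 * Laplacian)^{-1} u, relating the smoothed velocity to the rough one. For a periodic 1-D field this can be sketched with an FFT; the actual POP-alpha implementation applies the operation to primitive-equation velocity fields, and this standalone function is only an illustrative assumption.

```python
import numpy as np

def helmholtz_filter(u, alpha, L=2.0 * np.pi):
    """Apply (1 - alpha^2 * d^2/dx^2)^{-1} to a periodic 1-D field of
    length L via FFT.  In Fourier space the operator is diagonal, so
    each mode k is simply damped by 1 / (1 + (alpha*k)^2)."""
    n = u.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)  # angular wavenumbers
    return np.real(np.fft.ifft(np.fft.fft(u) / (1.0 + (alpha * k) ** 2)))
```

The damping factor 1/(1 + (alpha*k)^2) leaves scales much larger than alpha nearly untouched while suppressing scales below alpha, which is how the model accounts for the effect of the small scales on the large ones.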
The effect of FPU architecture on a dynamic precision algorithm for the solution of differential equations
Solution of Initial Value Problems (IVPs) is an important application in scientific computing. Methods for solving these problems use techniques for reducing the error and increasing the speed of the computation. This paper introduces a class of algorithms that dynamically reconfigure their operating parameters to reduce the computation time. By dynamically varying the precision of the arithmetic being performed, it is possible to obtain dramatic speedups on certain architectures when solving IVPs. This paper illustrates how various architectures affect a dynamic precision version of the Runge-Kutta-Fehlberg algorithm. It is shown that a speedup of over 30 percent is possible for both massively parallel processors and vector supercomputers.
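Dynamic precision selection can be sketched with an embedded pair: attempt each step in single precision, and fall back to double precision only when the local truncation error estimate approaches single-precision rounding level, so rounding never dominates. A simple Euler/Heun pair stands in for Runge-Kutta-Fehlberg here, and the switching rule is a hypothetical simplification rather than the paper's criterion.

```python
import numpy as np

def dynamic_precision_heun(f, y0, t0, t1, n_steps):
    """Fixed-step integrator for y' = f(t, y) using an embedded
    Euler/Heun pair, choosing the precision of each step dynamically:
    the step is attempted in float32 and redone in float64 only when
    the estimated truncation error is near float32 rounding level."""
    h = (t1 - t0) / n_steps
    y = np.float64(y0)
    low = high = 0
    for i in range(n_steps):
        t = t0 + i * h
        y32, h32 = np.float32(y), np.float32(h)
        k1 = f(np.float32(t), y32)
        k2 = f(np.float32(t + h), y32 + h32 * k1)
        err = abs(float(h) * float(k2 - k1) / 2.0)  # |Heun - Euler| estimate
        round_level = 100 * np.finfo(np.float32).eps * max(1.0, abs(float(y)))
        if err < round_level:
            # Truncation error is down at rounding level: redo in double
            # so that float32 rounding does not dominate the result.
            k1 = f(t, y)
            k2 = f(t + h, y + h * k1)
            y = y + h * (k1 + k2) / 2.0
            high += 1
        else:
            # Truncation error dominates anyway: keep the cheap step.
            y = np.float64(y32 + h32 * (k1 + k2) / np.float32(2.0))
            low += 1
    return y, low, high
```

On hardware where single-precision arithmetic is substantially faster (the vector and massively parallel machines the paper considers), steps counted in `low` are the source of the speedup.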
Domain Decomposition Based High Performance Parallel Computing
The study deals with the parallelization of finite element based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers, but parallel sparse direct solvers are not found to exhibit good scalability. Hence, the sparse direct solvers are parallelized using domain decomposition techniques. The highly efficient sparse direct solver PARDISO is used in this study. The scalability of both Newton and modified Newton algorithms is tested.
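The overall approach (reuse a direct factorization of each subdomain block inside an outer iteration) can be sketched with a multiplicative Schwarz loop. Here a precomputed dense inverse stands in for a reusable PARDISO factorization, and the test matrix, domain split, and sweep count are illustrative assumptions rather than the study's setup.

```python
import numpy as np

def schwarz_solve(A, b, domains, sweeps=50):
    """Multiplicative Schwarz iteration over overlapping subdomains.
    Each subdomain block is factorized once up front and reused every
    sweep (a dense inverse stands in for a sparse direct factorization
    such as PARDISO's), so subdomain solves, not the global direct
    solve, are the unit of parallel work."""
    solves = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in domains]
    x = np.zeros_like(b, dtype=float)
    for _ in range(sweeps):
        for idx, Ainv in solves:
            r = b - A @ x            # current global residual
            x[idx] += Ainv @ r[idx]  # local direct solve on the subdomain
    return x
```

For a nonlinear Navier-Stokes solve, a Newton or modified Newton outer loop would call such a routine with the Jacobian blocks refactorized (full Newton) or frozen across iterations (modified Newton), which is exactly the scalability trade-off the study tests.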