Improving Performance of Iterative Methods by Lossy Checkpointing
Iterative methods are commonly used approaches to solve large, sparse linear
systems, which are fundamental operations for many modern scientific
simulations. When the large-scale iterative methods are running with a large
number of ranks in parallel, they have to checkpoint their dynamic variables
periodically to tolerate unavoidable fail-stop errors, which requires fast I/O
systems and large storage space. Thus, significantly reducing the
checkpointing overhead is critical to improving the overall performance of
iterative methods. Our contribution is fourfold. (1) We propose a novel lossy
checkpointing scheme that can significantly improve the checkpointing
performance of iterative methods by leveraging lossy compressors. (2) We
formulate a lossy checkpointing performance model and theoretically derive an
upper bound on the extra number of iterations caused by the distortion of data
in lossy checkpoints, in order to guarantee a performance improvement under
the lossy checkpointing scheme. (3) We analyze the impact of lossy
checkpointing (i.e., the extra iterations required when restarting from lossy
checkpoint files) for multiple types of iterative methods. (4) We evaluate the lossy
checkpointing scheme with optimal checkpointing intervals in a high-performance
computing environment with 2,048 cores, using the well-known scientific
computation package PETSc and a state-of-the-art checkpoint/restart toolkit.
Experiments show that our optimized lossy checkpointing scheme can
significantly reduce the fault tolerance overhead for iterative methods by
23%-70% compared with traditional checkpointing and 20%-58% compared with
lossless-compressed checkpointing, in the presence of system failures.
Comment: 14 pages, 10 figures, HPDC'1
Radio interferometric imaging of spatial structure that varies with time and frequency
The spatial-frequency coverage of a radio interferometer is increased by
combining samples acquired at different times and observing frequencies.
However, astrophysical sources often contain complicated spatial structure that
varies within the time-range of an observation, or the bandwidth of the
receiver being used, or both. Image reconstruction algorithms can be designed
to model time and frequency variability in addition to the average intensity
distribution, and provide an improvement over traditional methods that ignore
all variability. This paper describes an algorithm designed for such
structures, and evaluates it in the context of reconstructing three-dimensional
time-varying structures in the solar corona from radio interferometric
measurements between 5 GHz and 15 GHz using existing telescopes such as the
EVLA and at angular resolutions better than that allowed by traditional
multi-frequency analysis algorithms.
Comment: 12 pages, 4 figures. SPIE Proceedings, Optical Engineering+Applications; Image Reconstruction from Incomplete Dat
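The modeling idea (solve for variability terms rather than only an average intensity) can be illustrated with a toy per-pixel model that is linear in frequency and time offsets. The paper's algorithm solves for such terms jointly with interferometric deconvolution; this schematic least-squares fit omits that step entirely, and the model form and names are assumptions for illustration.

```python
import numpy as np

def fit_varying_intensity(samples):
    """Least-squares fit of a per-pixel intensity model
        I(nu, t) = I0 + a*(nu - nu0) + b*(t - t0)
    to samples taken at different observing frequencies and times.
    The model is linear in its parameters, so a single lstsq call
    recovers the average intensity and its frequency/time slopes."""
    nu, t, I = (np.asarray(c, dtype=float) for c in zip(*samples))
    nu0, t0 = nu.mean(), t.mean()
    M = np.column_stack([np.ones_like(nu), nu - nu0, t - t0])
    coef, *_ = np.linalg.lstsq(M, I, rcond=None)
    return coef  # [I0, dI/dnu, dI/dt] about the reference (nu0, t0)
```

A method that ignores all variability amounts to fitting only the constant column of `M`, which biases `I0` whenever the true slopes are nonzero.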
A Moving Frame Algorithm for High Mach Number Hydrodynamics
We present a new approach to Eulerian computational fluid dynamics that is
designed to work at high Mach numbers encountered in astrophysical hydrodynamic
simulations. The Eulerian fluid conservation equations are solved in an
adaptive frame moving with the fluid where Mach numbers are minimized. The
moving frame approach uses a velocity decomposition technique to define local
kinetic variables while storing the bulk kinetic components in a smoothed
background velocity field that is associated with the grid velocity.
Gravitationally induced accelerations are added to the grid, thereby minimizing
the spurious heating problem encountered in cold gas flows. Separately tracking
local and bulk flow components allows thermodynamic variables to be accurately
calculated in both subsonic and supersonic regions. A main feature of the
algorithm, not possible in previous Eulerian implementations, is the
ability to resolve shocks and prevent spurious heating where both the preshock
and postshock Mach numbers are high. The hybrid algorithm combines the high
resolution shock capturing ability of the second-order accurate Eulerian TVD
scheme with a low-diffusion Lagrangian advection scheme. We have implemented a
cosmological code where the hydrodynamic evolution of the baryons is captured
using the moving frame algorithm while the gravitational evolution of the
collisionless dark matter is tracked using a particle-mesh N-body algorithm.
The MACH code is highly suited for simulating the evolution of the IGM where
accurate thermodynamic evolution is needed for studies of the Lyman alpha
forest, the Sunyaev-Zeldovich effect, and the X-ray background. Hydrodynamic
and cosmological tests are described and results presented. The current code is
fast, memory-friendly, and parallelized for shared-memory machines.
Comment: 19 pages, 5 figure
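The velocity decomposition can be illustrated in one dimension: smooth the velocity field to obtain the bulk (grid) component and keep the residual as the local component, so that local Mach numbers stay small even when the bulk flow is highly supersonic. The moving-average kernel and the function below are illustrative assumptions, not the paper's actual smoothing scheme.

```python
import numpy as np

def decompose_velocity(v, width=5):
    """Split a 1-D velocity field into a smoothed background component
    (which would be assigned to the moving grid) and a local residual.
    A simple moving average stands in for whatever smoothing kernel a
    real implementation would use; 'edge' padding avoids boundary
    artifacts in this sketch."""
    kernel = np.ones(width) / width
    pad = width // 2
    v_pad = np.pad(v, pad, mode="edge")
    v_bulk = np.convolve(v_pad, kernel, mode="valid")  # grid velocity
    v_local = v - v_bulk                               # local kinetic part
    return v_bulk, v_local
```

Thermodynamic quantities would then be computed from `v_local`, whose magnitude (and hence Mach number) is small, while `v_bulk` carries the large kinetic energy that would otherwise swamp the internal energy in a purely Eulerian update.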
Implementation of the LANS-alpha turbulence model in a primitive equation ocean model
This paper presents the first numerical implementation and tests of the
Lagrangian-averaged Navier-Stokes-alpha (LANS-alpha) turbulence model in a
primitive equation ocean model. The ocean model in which we work is the Los
Alamos Parallel Ocean Program (POP); we refer to POP and our implementation of
LANS-alpha as POP-alpha. Two versions of POP-alpha are presented: the full
POP-alpha algorithm is derived from the LANS-alpha primitive equations, but
requires a nested iteration that makes it too slow for practical simulations; a
reduced POP-alpha algorithm is proposed, which lacks the nested iteration and
is two to three times faster than the full algorithm. The reduced algorithm
does not follow from a formal derivation of the LANS-alpha model equations.
Despite this, simulations of the reduced algorithm are nearly identical to
those of the full algorithm, as judged by globally averaged temperature and
kinetic energy, and by snapshots of temperature and velocity fields. Both POP-alpha algorithms can
run stably with longer timesteps than standard POP.
Comparisons of the full and reduced POP-alpha algorithms are
made within an idealized test problem that captures some aspects of the
Antarctic Circumpolar Current, a problem in which baroclinic instability is
prominent. Both POP-alpha algorithms produce statistics that resemble
higher-resolution simulations of standard POP.
A linear stability analysis shows that both the full and reduced POP-alpha
algorithms benefit from the way the LANS-alpha equations take into account the
effects of the small scales on the large. Both algorithms (1) are stable; (2)
make the Rossby Radius effectively larger; and (3) slow down Rossby and gravity
waves.
Comment: Submitted to J. Computational Physics March 21, 200
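The smoothing at the heart of the alpha model is a Helmholtz inversion, u_s = (1 - alpha^2 * Laplacian)^{-1} u, relating the smoothed velocity to the rough one. For a periodic 1-D field this can be sketched with an FFT; the actual POP-alpha implementation applies the operation to primitive-equation velocity fields, and this standalone function is only an illustrative assumption.

```python
import numpy as np

def helmholtz_filter(u, alpha, L=2.0 * np.pi):
    """Apply (1 - alpha^2 * d^2/dx^2)^{-1} to a periodic 1-D field of
    length L via FFT.  In Fourier space the operator is diagonal, so
    each mode k is simply damped by 1 / (1 + (alpha*k)^2)."""
    n = u.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)  # angular wavenumbers
    return np.real(np.fft.ifft(np.fft.fft(u) / (1.0 + (alpha * k) ** 2)))
```

The damping factor 1/(1 + (alpha*k)^2) leaves scales much larger than alpha nearly untouched while suppressing scales below alpha, which is how the model accounts for the effect of the small scales on the large ones.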
The effect of FPU architecture on a dynamic precision algorithm for the solution of differential equations
Solution of Initial Value Problems (IVPs) is an important application in scientific computing. Methods for solving these problems use techniques for reducing the error and increasing the speed of the computation. This paper introduces a class of algorithms that dynamically reconfigure their operating parameters to reduce the computation time. By dynamically varying the precision of the arithmetic being performed, it is possible to obtain dramatic speedups on certain architectures when solving IVPs. This paper illustrates how various architectures affect a dynamic precision version of the Runge-Kutta-Fehlberg algorithm. It is shown that a speedup of over 30 percent is possible for both massively parallel processors and vector supercomputers.
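Dynamic precision selection can be sketched with an embedded pair: attempt each step in single precision, and fall back to double precision only when the local truncation error estimate approaches single-precision rounding level, so rounding never dominates. A simple Euler/Heun pair stands in for Runge-Kutta-Fehlberg here, and the switching rule is a hypothetical simplification rather than the paper's criterion.

```python
import numpy as np

def dynamic_precision_heun(f, y0, t0, t1, n_steps):
    """Fixed-step integrator for y' = f(t, y) using an embedded
    Euler/Heun pair, choosing the precision of each step dynamically:
    the step is attempted in float32 and redone in float64 only when
    the estimated truncation error is near float32 rounding level."""
    h = (t1 - t0) / n_steps
    y = np.float64(y0)
    low = high = 0
    for i in range(n_steps):
        t = t0 + i * h
        y32, h32 = np.float32(y), np.float32(h)
        k1 = f(np.float32(t), y32)
        k2 = f(np.float32(t + h), y32 + h32 * k1)
        err = abs(float(h) * float(k2 - k1) / 2.0)  # |Heun - Euler| estimate
        round_level = 100 * np.finfo(np.float32).eps * max(1.0, abs(float(y)))
        if err < round_level:
            # Truncation error is down at rounding level: redo in double
            # so that float32 rounding does not dominate the result.
            k1 = f(t, y)
            k2 = f(t + h, y + h * k1)
            y = y + h * (k1 + k2) / 2.0
            high += 1
        else:
            # Truncation error dominates anyway: keep the cheap step.
            y = np.float64(y32 + h32 * (k1 + k2) / np.float32(2.0))
            low += 1
    return y, low, high
```

On hardware where single-precision arithmetic is substantially faster (the vector and massively parallel machines the paper considers), steps counted in `low` are the source of the speedup.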
Domain Decomposition Based High Performance Parallel Computing
The study deals with the parallelization of finite element based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers, but parallel sparse direct solvers are not found to exhibit good scalability. Hence, the sparse direct solvers are parallelized using domain decomposition techniques. The highly efficient sparse direct solver PARDISO is used in this study. The scalability of both Newton and modified Newton algorithms is tested.
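The overall approach (reuse a direct factorization of each subdomain block inside an outer iteration) can be sketched with a multiplicative Schwarz loop. Here a precomputed dense inverse stands in for a reusable PARDISO factorization, and the test matrix, domain split, and sweep count are illustrative assumptions rather than the study's setup.

```python
import numpy as np

def schwarz_solve(A, b, domains, sweeps=50):
    """Multiplicative Schwarz iteration over overlapping subdomains.
    Each subdomain block is factorized once up front and reused every
    sweep (a dense inverse stands in for a sparse direct factorization
    such as PARDISO's), so subdomain solves, not the global direct
    solve, are the unit of parallel work."""
    solves = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in domains]
    x = np.zeros_like(b, dtype=float)
    for _ in range(sweeps):
        for idx, Ainv in solves:
            r = b - A @ x            # current global residual
            x[idx] += Ainv @ r[idx]  # local direct solve on the subdomain
    return x
```

For a nonlinear Navier-Stokes solve, a Newton or modified Newton outer loop would call such a routine with the Jacobian blocks refactorized (full Newton) or frozen across iterations (modified Newton), which is exactly the scalability trade-off the study tests.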