
    Improving Performance of Iterative Methods by Lossy Checkpointing

    Iterative methods are widely used to solve the large, sparse linear systems that underlie many modern scientific simulations. When large-scale iterative methods run with many ranks in parallel, they must checkpoint their dynamic variables periodically to survive unavoidable fail-stop errors, which demands fast I/O systems and large storage space. Significantly reducing the checkpointing overhead is therefore critical to improving the overall performance of iterative methods. Our contribution is fourfold. (1) We propose a novel lossy checkpointing scheme that can significantly improve the checkpointing performance of iterative methods by leveraging lossy compressors. (2) We formulate a lossy checkpointing performance model and theoretically derive an upper bound on the number of extra iterations caused by the distortion of data in lossy checkpoints, in order to guarantee a performance improvement under the lossy checkpointing scheme. (3) We analyze the impact of lossy checkpointing (i.e., the number of extra iterations caused by lossy checkpoint files) for multiple types of iterative methods. (4) We evaluate the lossy checkpointing scheme with optimal checkpointing intervals in a high-performance computing environment with 2,048 cores, using the well-known scientific computation package PETSc and a state-of-the-art checkpoint/restart toolkit. Experiments show that our optimized lossy checkpointing scheme can reduce the fault tolerance overhead of iterative methods by 23%~70% compared with traditional checkpointing and by 20%~58% compared with lossless-compressed checkpointing, in the presence of system failures.
    Comment: 14 pages, 10 figures, HPDC'1
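The tradeoff this abstract describes can be illustrated with Young's first-order formula for the optimal checkpoint interval, sqrt(2 * C * MTBF): a cheaper (lossy) checkpoint both shortens each write and shrinks the optimal interval, reducing the work lost per failure. A minimal sketch, with hypothetical parameter values not taken from the paper:

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# Hypothetical parameters (not from the paper):
mtbf = 24 * 3600.0            # one system failure per day, on average
full_ckpt = 120.0             # seconds to write an uncompressed checkpoint
lossy_ckpt = full_ckpt / 10   # assume lossy compression shrinks the write ~10x

t_full = young_interval(full_ckpt, mtbf)
t_lossy = young_interval(lossy_ckpt, mtbf)
# A cheaper checkpoint permits more frequent checkpoints, so less work
# is lost when a failure strikes -- provided the extra iterations caused
# by lossy restarts (bounded in the paper) stay small.
print(f"interval (traditional): {t_full:.0f} s")
print(f"interval (lossy):       {t_lossy:.0f} s")
```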

    Radio interferometric imaging of spatial structure that varies with time and frequency

    The spatial-frequency coverage of a radio interferometer is increased by combining samples acquired at different times and observing frequencies. However, astrophysical sources often contain complicated spatial structure that varies within the time range of an observation, or within the bandwidth of the receiver being used, or both. Image reconstruction algorithms can be designed to model time and frequency variability in addition to the average intensity distribution, providing an improvement over traditional methods that ignore all variability. This paper describes an algorithm designed for such structures and evaluates it in the context of reconstructing three-dimensional time-varying structures in the solar corona from radio interferometric measurements between 5 GHz and 15 GHz, using existing telescopes such as the EVLA and at angular resolutions better than that allowed by traditional multi-frequency analysis algorithms.
    Comment: 12 pages, 4 figures. SPIE Proceedings, Optical Engineering+Applications; Image Reconstruction from Incomplete Dat
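A toy illustration of the frequency-variability idea (not the paper's algorithm): represent one pixel's spectrum as a Taylor series in frequency about a reference frequency and fit the coefficients from samples across the band. All values below are synthetic.

```python
import numpy as np

# Model one pixel's intensity as a Taylor series in fractional frequency
# offset about nu0 = 10 GHz, fitted from samples across 5-15 GHz.
nu = np.linspace(5e9, 15e9, 20)
nu0 = 10e9
x = (nu - nu0) / nu0                      # fractional frequency offset

# Synthetic power-law source, I(nu) = I0 * (nu/nu0)^alpha:
I0, alpha = 2.0, -0.7
I = I0 * (nu / nu0) ** alpha

coeffs = np.polyfit(x, I, deg=2)          # quadratic Taylor expansion
I0_fit = np.polyval(coeffs, 0.0)          # intensity at the reference
print(I0_fit)                             # close to the true I0 = 2.0
```

A method that fits only the average intensity would instead smear this spectral structure into the image; modeling the variability recovers both the reference-frequency intensity and the spectral slope.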

    A Moving Frame Algorithm for High Mach Number Hydrodynamics

    We present a new approach to Eulerian computational fluid dynamics that is designed to work at the high Mach numbers encountered in astrophysical hydrodynamic simulations. The Eulerian fluid conservation equations are solved in an adaptive frame moving with the fluid, where Mach numbers are minimized. The moving frame approach uses a velocity decomposition technique to define local kinetic variables while storing the bulk kinetic components in a smoothed background velocity field that is associated with the grid velocity. Gravitationally induced accelerations are added to the grid, thereby minimizing the spurious heating problem encountered in cold gas flows. Separately tracking local and bulk flow components allows thermodynamic variables to be accurately calculated in both subsonic and supersonic regions. A key feature of the algorithm, not possible in previous Eulerian implementations, is the ability to resolve shocks and prevent spurious heating where both the preshock and postshock Mach numbers are high. The hybrid algorithm combines the high-resolution shock-capturing ability of the second-order accurate Eulerian TVD scheme with a low-diffusion Lagrangian advection scheme. We have implemented a cosmological code in which the hydrodynamic evolution of the baryons is captured using the moving frame algorithm while the gravitational evolution of the collisionless dark matter is tracked using a particle-mesh N-body algorithm. The MACH code is highly suited for simulating the evolution of the IGM, where accurate thermodynamic evolution is needed for studies of the Lyman alpha forest, the Sunyaev-Zeldovich effect, and the X-ray background. Hydrodynamic and cosmological tests are described and results presented. The current code is fast, memory-friendly, and parallelized for shared-memory machines.
    Comment: 19 pages, 5 figure
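The velocity decomposition can be sketched in one dimension: subtract a smoothed background field from the total velocity so the solver only sees small local Mach numbers. A simple moving average stands in here for the paper's smoothed background/grid velocity; the setup is illustrative, not the MACH implementation.

```python
import numpy as np

def decompose_velocity(v, width=5):
    """Split a 1-D velocity field into a smoothed bulk component and a
    local residual. Sketch only: a moving average stands in for the
    paper's smoothed background velocity field tied to the grid."""
    kernel = np.ones(width) / width
    pad = width // 2
    v_bulk = np.convolve(np.pad(v, pad, mode="edge"), kernel, mode="valid")
    return v_bulk, v - v_bulk

# A highly supersonic bulk flow carrying small local fluctuations:
x = np.linspace(0.0, 1.0, 200)
cs = 1.0                                    # sound speed (code units)
v = 50.0 * cs + 0.1 * np.sin(20 * np.pi * x)
v_bulk, v_local = decompose_velocity(v)

print(np.abs(v).max() / cs)        # total Mach number, ~50
print(np.abs(v_local).max() / cs)  # local Mach number, well below 1
```

Because thermal energy is computed from the small local component rather than the huge bulk component, truncation error in the bulk kinetic energy cannot masquerade as heat, which is the spurious-heating problem the abstract refers to.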

    Implementation of the LANS-alpha turbulence model in a primitive equation ocean model

    This paper presents the first numerical implementation and tests of the Lagrangian-averaged Navier-Stokes-alpha (LANS-alpha) turbulence model in a primitive equation ocean model. The ocean model in which we work is the Los Alamos Parallel Ocean Program (POP); we refer to POP with our implementation of LANS-alpha as POP-alpha. Two versions of POP-alpha are presented: the full POP-alpha algorithm is derived from the LANS-alpha primitive equations, but requires a nested iteration that makes it too slow for practical simulations; a reduced POP-alpha algorithm is proposed, which lacks the nested iteration and is two to three times faster than the full algorithm. The reduced algorithm does not follow from a formal derivation of the LANS-alpha model equations. Despite this, simulations with the reduced algorithm are nearly identical to those with the full algorithm, as judged by globally averaged temperature and kinetic energy and by snapshots of temperature and velocity fields. Both POP-alpha algorithms can run stably with longer timesteps than standard POP. Comparisons of the full and reduced POP-alpha algorithms are made within an idealized test problem that captures some aspects of the Antarctic Circumpolar Current, a problem in which baroclinic instability is prominent. Both POP-alpha algorithms produce statistics that resemble higher-resolution simulations of standard POP. A linear stability analysis shows that both the full and reduced POP-alpha algorithms benefit from the way the LANS-alpha equations take into account the effects of the small scales on the large. Both algorithms (1) are stable; (2) make the Rossby radius effectively larger; and (3) slow down Rossby and gravity waves.
    Comment: Submitted to J. Computational Physics March 21, 200
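The distinctive ingredient of LANS-alpha is advection by a Helmholtz-smoothed velocity, u = (1 - alpha^2 Laplacian)^{-1} v, which damps small scales while leaving large scales nearly untouched. A minimal 1-D periodic sketch of that filter via FFT (illustrative only; the POP-alpha implementation works in primitive-equation variables on an ocean grid):

```python
import numpy as np

# Helmholtz smoothing: u = (1 - alpha^2 * Laplacian)^{-1} v,
# applied spectrally on a 1-D periodic domain of unit length.
n, alpha = 128, 0.05
k = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # wavenumbers
v = np.random.default_rng(0).standard_normal(n)
u = np.real(np.fft.ifft(np.fft.fft(v) / (1 + (alpha * k) ** 2)))
# High wavenumbers are damped by the factor 1/(1 + (alpha*k)^2),
# modeling the effect of the small scales on the large; the mean
# (k = 0) component passes through unchanged.
```

Scales much larger than alpha see the filter as nearly the identity, which is why the model alters Rossby and gravity wave speeds only at the small-scale end of the spectrum.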

    Domain Decomposition Based High Performance Parallel Computing

    The study deals with the parallelization of finite-element-based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. Although the performance of sparse direct solvers has improved significantly, parallel sparse direct solvers are not found to exhibit good scalability. Hence, parallelization is achieved through domain decomposition, with the highly efficient sparse direct solver PARDISO applied within each subdomain. The scalability of both Newton and modified Newton algorithms is tested.
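The structure described above can be sketched with a multiplicative Schwarz iteration: the global system is split into overlapping subdomains, each solved by a direct method. This is a minimal 1-D Poisson stand-in for the study's finite element Navier-Stokes systems, with NumPy's dense solver playing the role of PARDISO:

```python
import numpy as np

# 1-D Poisson system (-u'' = f discretized) as a stand-in for the
# finite-element Navier-Stokes systems; NumPy's dense solver stands
# in for the sparse direct solver PARDISO used in the study.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)

# Two overlapping subdomains, solved alternately (multiplicative Schwarz):
i1, i2 = slice(0, 60), slice(40, n)
u = np.zeros(n)
for _ in range(50):
    r = f - A @ u                                 # global residual
    u[i1] += np.linalg.solve(A[i1, i1], r[i1])    # direct solve, subdomain 1
    r = f - A @ u
    u[i2] += np.linalg.solve(A[i2, i2], r[i2])    # direct solve, subdomain 2

u_direct = np.linalg.solve(A, f)
print(np.linalg.norm(u - u_direct))   # shrinks toward the global direct solve
```

In a production code each subdomain matrix would be factored once by the direct solver and the factorization reused across iterations and Newton steps, which is where the per-subdomain direct-solver efficiency pays off.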