284 research outputs found

    A Combined MPI-CUDA Parallel Solution of Linear and Nonlinear Poisson-Boltzmann Equation

    Optimal convolution SOR acceleration of waveform relaxation with application to semiconductor device simulation

    In this paper we describe a novel generalized SOR (successive overrelaxation) algorithm for accelerating the convergence of the dynamic iteration method known as waveform relaxation. A new convolution SOR algorithm is presented, along with a theorem for determining the optimal convolution SOR parameter. Both analytic and experimental results are given to demonstrate that the convolution SOR algorithm converges substantially faster than the more obvious frequency-independent waveform SOR algorithm. Finally, to demonstrate the general applicability of this new method, it is used to solve the differential-algebraic system generated by spatial discretization of the time-dependent semiconductor device equations.
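
    For orientation, the classical stationary SOR iteration for a linear system A x = b, which the abstract's waveform and convolution variants generalize, can be sketched as follows. This is not the paper's convolution SOR algorithm; the function name `sor_solve`, the default `omega`, and the stopping rule are assumptions made for illustration.

```python
def sor_solve(A, b, omega=1.5, tol=1e-10, max_iter=10_000):
    """Classical (stationary) SOR for A x = b, with A given as a list of rows.

    Illustrative sketch only: names and defaults are assumptions,
    not taken from the paper.
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Gauss-Seidel value, using the newest available entries of x.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs_value = (b[i] - sigma) / A[i][i]
            # Over-relaxation: blend the old value with the Gauss-Seidel update.
            new_value = (1.0 - omega) * x[i] + omega * gs_value
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, iteration
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [3.0, 2.0, 3.0]
    solution, iterations = sor_solve(A, b, omega=1.1)
    print(solution, iterations)
```

    Setting `omega=1.0` reduces the update to plain Gauss-Seidel; for symmetric positive definite matrices the iteration converges for any `omega` strictly between 0 and 2.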

    Distributing the Kalman Filter for Large-Scale Systems

    This paper derives a distributed Kalman filter to estimate a sparsely connected, large-scale, $n$-dimensional dynamical system monitored by a network of $N$ sensors. Local Kalman filters are implemented on the $n_l$-dimensional sub-systems (where $n_l \ll n$) that are obtained after spatially decomposing the large-scale system. The resulting sub-systems overlap, which, along with an assimilation procedure on the local Kalman filters, preserves an $L$th order Gauss-Markovian structure of the centralized error processes. The information loss due to the $L$th order Gauss-Markovian approximation is controllable, as it can be characterized by a divergence that decreases as $L$ increases. The order of the approximation, $L$, leads to a lower bound on the dimension of the sub-systems, hence providing a criterion for sub-system selection. The assimilation procedure is carried out on the local error covariances with a distributed iterate collapse inversion (DICI) algorithm that we introduce. The DICI algorithm computes the (approximated) centralized Riccati and Lyapunov equations iteratively with only local communication and low-order computation. We fuse the observations that are common among the local Kalman filters using bipartite fusion graphs and consensus averaging algorithms. The proposed algorithm achieves full distribution of the Kalman filter that is coherent with the centralized Kalman filter with an $L$th order Gauss-Markovian structure on the centralized error processes. Nowhere is storage, communication, or computation of $n$-dimensional vectors and matrices needed; only $n_l$-dimensional vectors and matrices ($n_l \ll n$) are communicated or used in the computation at the sensors.
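
    The consensus averaging mentioned in the abstract can be illustrated with a generic textbook scheme in which each sensor repeatedly nudges its value toward those of its neighbors, using only local communication. The sketch below is not the paper's fusion procedure; the function name `consensus_average`, the step size, and the example graph are assumptions chosen for illustration.

```python
def consensus_average(values, neighbors, step=0.3, iterations=200):
    """Synchronous consensus averaging on an undirected graph.

    values:    initial scalar held by each sensor
    neighbors: neighbors[i] lists the sensors that sensor i can talk to
    step:      assumed step size; it should stay below 1 / (maximum degree)

    Illustrative sketch only, not the algorithm from the paper.
    """
    x = list(values)
    for _ in range(iterations):
        # Every sensor updates from the previous round's values only.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


if __name__ == "__main__":
    # Four sensors connected in a line: 0 - 1 - 2 - 3.
    line_graph = [[1], [0, 2], [1, 3], [2]]
    print(consensus_average([1.0, 2.0, 3.0, 6.0], line_graph))
```

    On a connected undirected graph every sensor's value approaches the network-wide mean, and the sum of all values is preserved at each round because the exchanges between neighbors are symmetric.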

    Improving Performance of Iterative Methods by Lossy Checkpointing

    Iterative methods are commonly used to solve large, sparse linear systems, which are fundamental operations in many modern scientific simulations. When large-scale iterative methods run with a large number of ranks in parallel, they have to checkpoint the dynamic variables periodically in case of unavoidable fail-stop errors, requiring fast I/O systems and large storage space. Significantly reducing the checkpointing overhead is therefore critical to improving the overall performance of iterative methods. Our contribution is fourfold. (1) We propose a novel lossy checkpointing scheme that can significantly improve the checkpointing performance of iterative methods by leveraging lossy compressors. (2) We formulate a lossy checkpointing performance model and derive theoretically an upper bound for the extra number of iterations caused by the distortion of data in lossy checkpoints, in order to guarantee the performance improvement under the lossy checkpointing scheme. (3) We analyze the impact of lossy checkpointing (i.e., the extra number of iterations caused by lossy checkpointing files) for multiple types of iterative methods. (4) We evaluate the lossy checkpointing scheme with optimal checkpointing intervals on a high-performance computing environment with 2,048 cores, using the well-known scientific computation package PETSc and a state-of-the-art checkpoint/restart toolkit. Experiments show that our optimized lossy checkpointing scheme can significantly reduce the fault tolerance overhead for iterative methods by 23% to 70% compared with traditional checkpointing and by 20% to 58% compared with lossless-compressed checkpointing, in the presence of system failures. Comment: 14 pages, 10 figures, HPDC'1

    A self-gravity module for the PLUTO code

    We present a novel implementation of an iterative solver for the solution of the Poisson equation in the PLUTO code for astrophysical fluid dynamics. Our solver relies on a relaxation method in which convergence is sought as the steady-state solution of a parabolic equation, whose time-discretization is governed by the Runge-Kutta-Legendre (RKL) method. Our findings indicate that the RKL-based Poisson solver, which is both fully parallel and rapidly convergent, has the potential to serve as a practical alternative to conventional iterative solvers such as the Gauss-Seidel (GS) and successive over-relaxation (SOR) methods. Additionally, it can mitigate some of the drawbacks of these traditional techniques. We incorporate our algorithm into a multigrid solver to provide a simple and efficient gravity solver that can be used to obtain the gravitational potentials in self-gravitational hydrodynamics. We test our implementation against a broad range of standard self-gravitating astrophysical problems designed to examine different aspects of the code. We demonstrate that the results agree closely with the analytical predictions (when available) and with the findings of similar previous studies. Comment: Submitted to ApJS. Comments are welcome
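
    The idea of reaching a Poisson solution as the steady state of a parabolic equation can be sketched in one dimension: march du/dtau = u'' - f in pseudo-time until the residual vanishes. The sketch below uses plain forward Euler stepping rather than the Runge-Kutta-Legendre scheme of the paper, and the function name `relax_poisson_1d`, the grid size, and the tolerance are assumptions made for illustration.

```python
def relax_poisson_1d(f, n=21, tol=1e-10, max_steps=100_000):
    """Solve u'' = f on [0, 1] with u(0) = u(1) = 0 by pseudo-time relaxation.

    Marches du/dtau = u'' - f with forward Euler until the residual is small.
    Illustrative sketch only: this is NOT the RKL scheme from the paper.
    """
    h = 1.0 / (n - 1)
    dtau = 0.4 * h * h  # below the forward Euler stability limit h^2 / 2
    u = [0.0] * n       # boundary entries stay at zero throughout
    for step in range(1, max_steps + 1):
        residual = [0.0] * n
        for i in range(1, n - 1):
            laplacian = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
            residual[i] = laplacian - f(i * h)
        for i in range(1, n - 1):
            u[i] += dtau * residual[i]
        if max(abs(r) for r in residual) < tol:
            return u, step
    return u, max_steps


if __name__ == "__main__":
    # With f = -2 the exact solution is u(x) = x * (1 - x).
    u, steps = relax_poisson_1d(lambda x: -2.0)
    print(u[10], steps)
```

    The stability limit on `dtau` forces many small steps with forward Euler, which is the kind of cost that super-time-stepping schemes such as RKL are designed to reduce.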

    An Objective Analysis Technique for Constructing Three-Dimensional Urban-Scale Wind Fields

    An objective analysis procedure for generating mass-consistent, urban-scale three-dimensional wind fields is presented together with a comparison against existing techniques. The algorithm employs terrain-following coordinates and variable vertical grid spacing. Initial estimates of the velocity field are developed by interpolating surface and upper-level wind measurements. A local terrain adjustment technique, involving solution of the Poisson equation, is used to establish the horizontal components of the surface field. Vertical velocities are developed from successive solutions of the continuity equation, followed by an iterative procedure that reduces anomalous divergence in the complete field. Major advantages of the procedure are that it is computationally efficient and allows boundary values to adjust in response to changes in the interior flow. The method has been successfully tested using field measurements and problems with known analytic solutions.