71 research outputs found

    Asynchronous and corrected-asynchronous numerical solutions of parabolic PDEs on MIMD multiprocessors

    A major problem in achieving significant speed-up on parallel machines is the overhead involved in synchronizing the concurrent processes. Removing the synchronization constraint has the potential to speed up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential to be extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared-memory, multi-user Sequent Balance machine. Numerical results for one- and two-dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time: in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS scheme may reach 90 percent or more, but it provides accurate results only for steady-state values. The CA scheme, on the other hand, is less efficient, but provides more accurate results for intermediate (non-steady-state) values.
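
    As a point of reference for the Euler scheme mentioned in this abstract, the following is a minimal sequential sketch of a forward-Euler, central-difference update for the one-dimensional heat equation u_t = alpha * u_xx. It is not the authors' AS or CA scheme: the function name `heat_euler_1d`, its parameters, and the fixed (Dirichlet) boundary treatment are assumptions made for illustration. Every interior point is updated from the same previous time level, which is the synchronous behaviour that the asynchronous variants relax.

```python
import numpy as np


def heat_euler_1d(u0, alpha, dx, dt, steps):
    """Advance u_t = alpha * u_xx by `steps` forward-Euler time steps.

    Hypothetical helper for illustration only. The two end values of
    `u0` are treated as fixed Dirichlet boundary values.
    """
    r = alpha * dt / dx**2
    # Standard stability bound for this explicit scheme in one dimension.
    if r > 0.5:
        raise ValueError("unstable step: alpha*dt/dx**2 must be <= 0.5")
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(steps):
        # The right-hand side is evaluated entirely from the old time
        # level before the assignment, so this is a synchronous update.
        u[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 51)
    u = heat_euler_1d(np.sin(np.pi * x), alpha=1.0, dx=x[1] - x[0], dt=1e-4, steps=1000)
    print(u.max())
```

    With the initial profile sin(pi x) the exact solution decays as exp(-pi^2 * alpha * t), which gives a simple check on the discretisation.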

    Comparative Study of the Execution Time of Parallel Heat Equation on CPU and GPU

    Parallelization has become a universal technique for shortening the execution time of compute-intensive scientific simulations. It consists of bringing together the power of several thousand processors to perform complex calculations at high speed. The choice of the runtime environment used to execute parallel programs significantly influences the execution time. For this reason, this article aims to demonstrate the impact of computing architectures on the performance of parallel implementations. To this end, we implemented the heat equation on the CUDA platform and compared the results with those of the SkelGIS implementation from the literature. Through the results of the experiments, we demonstrated that the CUDA implementation on a graphics processing unit (GPU) is almost 100 times faster for very large meshes than the other implementations.

    Efficient Numerical Solution of Large Scale Algebraic Matrix Equations in PDE Control and Model Order Reduction

    Matrix Lyapunov and Riccati equations are an important tool in mathematical systems theory. They are the key ingredients in balancing-based model order reduction techniques and linear quadratic regulator problems. For small and moderately sized problems, these equations are solved by techniques with at least cubic complexity, which prohibits their use in large-scale applications. Around the year 2000, solvers for large-scale problems were introduced. The basic idea there is to compute a low-rank decomposition of the square, dense solution matrix and in turn reduce the memory and computational complexity of the algorithms. In this thesis, efficiency-enhancing techniques for the low-rank alternating directions implicit iteration based solution of large-scale matrix equations are introduced and discussed. The applicability in the context of real-world systems is also demonstrated. The thesis is structured in seven central chapters. After the introduction, chapter 2 introduces the basic concepts and notation needed as fundamental tools for the remainder of the thesis. The next chapter then introduces a collection of test examples, ranging from easily scalable academic test systems to badly conditioned technical applications, which are used to demonstrate the features of the solvers. Chapters four and five describe the basic solvers and the modifications made to make them applicable to an even larger class of problems. The following two chapters treat the application of the solvers in the context of model order reduction and linear quadratic optimal control of PDEs. The final chapter then presents the extensive numerical testing undertaken with the solvers proposed in the prior chapters. Some conclusions and an appendix complete the thesis.
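
    The low-rank idea described in this abstract can be written down compactly. The notation below (A, B, X, Z, n, k) is a common textbook convention and an assumption here, not necessarily the notation used in the thesis.

```latex
% Continuous-time Lyapunov equation for the square, dense unknown X:
\[
  A X + X A^{T} = -B B^{T}, \qquad A, X \in \mathbb{R}^{n \times n}.
\]
% Low-rank approach: compute a thin factor Z instead of X itself,
\[
  X \approx Z Z^{T}, \qquad Z \in \mathbb{R}^{n \times k}, \quad k \ll n,
\]
% so that storage drops from n^2 entries for X to n k entries for Z.
```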

    Algebraic, Block and Multiplicative Preconditioners based on Fast Tridiagonal Solves on GPUs

    This thesis contributes to the field of sparse linear algebra, graph applications, and preconditioners for Krylov iterative solvers of sparse linear equation systems by providing a (block) tridiagonal solver library, a generalized sparse matrix-vector implementation, a linear forest extraction, and a multiplicative preconditioner based on tridiagonal solves. The tridiagonal library, which supports (scaled) partial pivoting, outperforms cuSPARSE's tridiagonal solver by a factor of five while completely utilizing the available GPU memory bandwidth. For the performance-optimized solution of multiple right-hand sides, the explicit factorization of the tridiagonal matrix can be computed. The extraction of a weighted linear forest (a union of disjoint paths) from a general graph is used to build algebraic (block) tridiagonal preconditioners and deploys the generalized sparse matrix-vector implementation of this thesis for preconditioner construction. During linear forest extraction, a new parallel bidirectional scan pattern, which can operate on doubly linked list structures, identifies the path ID and the position of a vertex. The algebraic preconditioner construction is also used to build more advanced preconditioners, which contain multiple tridiagonal factors, based on generalized ILU factorizations. Additionally, other preconditioners based on tridiagonal factors are presented and evaluated in comparison to ILU and ILU incomplete sparse approximate inverse preconditioners (ILU-ISAI) for the solution of large sparse linear equation systems from the Sparse Matrix Collection. For all problems presented in this thesis, an efficient parallel algorithm and its CUDA implementation for single-GPU systems is provided.
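
    To make concrete what a tridiagonal solve is, here is a minimal sequential sketch of the classical Thomas algorithm. It is not the thesis's GPU library: it performs no pivoting (so it assumes a well-conditioned system, for example a diagonally dominant one), and the function name `thomas_solve` and its argument layout are assumptions made for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    Hypothetical helper for illustration. All four sequences have
    length n; lower[0] and upper[n - 1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # System [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] x = [1, 0, 1]
    print(thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0]))
```

    Each step of the forward sweep depends on the previous one, which is why parallel GPU solvers such as the one described here need different algorithmic formulations.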

    Lectures on Computational Numerical Analysis of Partial Differential Equations

    From Chapter 1: The purpose of these lectures is to present a set of straightforward numerical methods with applicability to essentially any problem associated with a partial differential equation (PDE) or system of PDEs independent of type, spatial dimension or form of nonlinearity.

    The Investigation of Efficiency of Physical Phenomena Modelling Using Differential Equations on Distributed Systems

    This work is dedicated to the development of mathematical modelling software. In this dissertation, numerical methods and algorithms are investigated in the context of software development. When applying a numerical method, it is important to take into account the limited computer resources, the architecture of these resources, and how the methods affect software robustness. The three main aspects of this investigation are that a software implementation must be efficient, must be robust, and must be able to utilize specific hardware resources. The hardware specificity in this work is related to distributed computations of different types: a single CPU with multiple cores, multiple CPUs with multiple cores, and a highly parallel multithreaded GPU device. The investigation is done in three directions: GPU usage for 3D FDTD calculations, use of the FVM to implement efficient calculations for a very specific heat transfer problem, and development of special software techniques for a specific bacterial self-organization problem in which the results are sensitive to numerical methods, initial data, and even computer round-off errors. All these directions are dedicated to creating correct technological components that make a software implementation robust and efficient. A time prediction model for 3D FDTD calculations is proposed, which makes it possible to evaluate the efficiency of different GPUs. A reasonable speedup on the GPU compared to the CPU is obtained. For the FVM implementation, the OpenFOAM open source software is selected as the basis for the calculations, and a few algorithms and their modifications are proposed to solve efficiency issues. The FVM parallel solver is implemented, analyzed, and adapted to the heterogeneous cluster Vilkas. To create robust software for the simulation of bacterial self-organization, mathematically robust methods are applied, the results are analyzed, and the algorithm is modified for parallel computations.

    A Variable-Structure Variable-Order Simulation Paradigm for Power Electronic Circuits

    Solid-state power converters are used in a rapidly growing number of applications, including variable-speed motor drives for hybrid electric vehicles and industrial applications, battery energy storage systems, and the interfacing of renewable energy sources and control of power flow in electric power systems. The desire for higher power densities and improved efficiencies necessitates the accurate prediction of switching transients and losses that, historically, have been categorized as conduction and switching losses. In the vast majority of analyses, the power semiconductors (diodes, transistors) are represented using simplified or empirical models. Conduction losses are calculated as the product of circuit-dependent currents and on-state voltage drops. Switching losses are estimated using approximate voltage-current waveforms with empirically derived turn-on and turn-off times.