65 research outputs found

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.

    Parallelization of implicit finite difference schemes in computational fluid dynamics

    Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics because they are subject to less stringent stability bounds than explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences, so efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations, and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices, is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.
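
The recurrences this abstract refers to can be seen in the scalar Thomas algorithm for a tridiagonal system. The following is a minimal sketch, not the authors' method: the function name `thomas_solve` is hypothetical, no pivoting is done, and the block tri-diagonal case would replace each scalar division by a small matrix solve.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: step i needs the result of step i - 1 (a recurrence).
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: step i needs the result of step i + 1.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both loops carry a dependency from one index to the next, which is why a single line solve does not parallelize directly and why partitioning and scheduling across many independent lines matters in three dimensions.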

    Experimental Study of a Parallel Iterative Solver for Markov Chain Modeling

    This paper presents the results of a preliminary experimental investigation of the performance of a stationary iterative method based on a block staircase splitting for solving singular systems of linear equations arising in Markov chain modelling. From the experiments presented, we can deduce that the method is well suited for solving block banded or, more generally, localized systems in a parallel computing environment. The parallel implementation has been benchmarked using several Markovian models.

    Efficient GPU implementation of a Boltzmann‑Schrödinger‑Poisson solver for the simulation of nanoscale DG MOSFETs

    81–102, 2019) describes an efficient and accurate solver for nanoscale DG MOSFETs through a deterministic Boltzmann-Schrödinger-Poisson model with seven electron–phonon scattering mechanisms on a hybrid parallel CPU/GPU platform. The transport computational phase, i.e. the time integration of the Boltzmann equations, was ported to the GPU using CUDA extensions, but the computation of the system’s eigenstates, i.e. the solution of the Schrödinger-Poisson block, was parallelized only using OpenMP due to its complexity. This work fills the gap by describing a GPU port of the solver for the Schrödinger-Poisson block. The new proposal implements on the GPU a Scheduled Relaxation Jacobi method to solve the sparse linear systems that arise from the 2D Poisson equation. The 1D Schrödinger equation is solved on the GPU by adapting a multi-section iteration and the Newton-Raphson algorithm to approximate the energy levels, and the Inverse Power Iterative Method is used to approximate the wave vectors. We want to stress that this solver for the Schrödinger-Poisson block can be thought of as a module independent of the transport phase (Boltzmann) and can be used for solvers using different levels of description for the electrons; therefore, it is of particular interest because it can be adapted to other macroscopic, hence faster, solvers for confined devices exploited at industrial level. Project PID2020-117846GB-I00 funded by the Spanish Ministerio de Ciencia e Innovación. Project A-TIC-344-UGR20 funded by the European Regional Development Fund.
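
The inverse power iteration named in this abstract can be sketched on a small model problem. This is an illustrative CPU-only sketch under stated assumptions, not the paper's GPU solver: the matrix is a hypothetical 1D finite-difference operator, `shift` stands in for an energy estimate obtained elsewhere, and the names `solve_dense`, `inverse_iteration`, and `laplacian_1d` are invented for this example.

```python
import math

def solve_dense(a, b):
    """Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def inverse_iteration(h, shift, iters=50):
    """Approximate the eigenpair of symmetric h whose eigenvalue is nearest shift."""
    n = len(h)
    shifted = [[h[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = [1.0] * n  # assumed start vector; must overlap the wanted eigenvector
    for _ in range(iters):
        y = solve_dense(shifted, x)            # y = (h - shift*I)^-1 x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    hx = [sum(h[i][j] * x[j] for j in range(n)) for i in range(n)]
    energy = sum(xi * hxi for xi, hxi in zip(x, hx))  # Rayleigh quotient
    return energy, x

def laplacian_1d(n):
    """Model operator: the tridiagonal matrix with 2 on and -1 off the diagonal."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

Calling `inverse_iteration(laplacian_1d(5), 0.2)` returns an estimate of the lowest eigenvalue of the model operator together with a unit-norm eigenvector. The closer the shift is to the target eigenvalue, the faster the iteration converges, which is why a separate root-finding stage for the energy levels pairs naturally with it.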

    NUMERICAL INVESTIGATION AND PARALLEL COMPUTING FOR THERMAL TRANSPORT MECHANISM DURING NANOMACHINING

    Nano-scale machining, or nanomachining, is a hybrid process in which the total thermal energy necessary to remove atoms from a work-piece surface is applied from external sources. In the current study, this energy is applied from two sources: (1) localized energy from a laser beam focused to a micron-scale spot to preheat the work-piece, and (2) a high-precision electron beam emitted from the tips of carbon nanotubes to remove material via evaporation/sublimation. Macro-to-nano scale heat transfer models are discussed to assess how well they capture, and can be applied to predict, the transient heat transfer mechanism required for nanomachining. The thermal transport mechanism during nano-scale machining involves both phonons (lattice vibrations) and electrons; it is modeled using a parabolic two-step (PTS) model, which accounts for the time lag between these energy carriers. A numerical algorithm is developed for the solution of the PTS model based on explicit and implicit finite-difference methods. Since numerical simulation of nanomachining involves a high computational cost in terms of wall clock time, the performance of a wide range of numerical techniques has been compared to devise an efficient solution procedure. Gauss-Seidel (GS), successive over-relaxation (SOR), conjugate gradient (CG), δ-form Douglas-Gunn time splitting, and other methods have been compared in terms of computational cost. Use of the Douglas-Gunn time splitting in the solution of 3D time-dependent heat transport equations appears to be optimal, especially as problem size (number of spatial grid points and/or required number of time steps) becomes large. Parallel computing is implemented to further reduce the wall clock time required for the complete simulation of the nanomachining process. Domain decomposition with inter-processor communication using Message Passing Interface (MPI) libraries is adopted for parallel computing. Performance tuning has been implemented for efficient parallelization by overlapping communication with computation. Numerical solutions for laser and electron-beam sources with different Gaussian distributions are presented. Performance of the parallel code is tested on four distinct computer cluster architectures. Results obtained for the laser source agree well with available experimental data in the literature. The results for the electron-beam source are self-consistent; nevertheless, they need to be validated experimentally.

    Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction

    Many sparse matrix computations can be sped up if the matrix is first reordered. Reordering was originally developed for direct methods, but it has recently become popular for improving the cache locality of parallel iterative solvers, since reordering the matrix to reduce bandwidth and wavefront can improve the locality of reference of sparse matrix-vector multiplication (SpMV), the key kernel in iterative solvers. In this paper, we present the first parallel implementations of two widely used reordering algorithms: Reverse Cuthill-McKee (RCM) and Sloan. On 16 cores of the Stampede supercomputer, our parallel RCM is 5.56 times faster on average than a state-of-the-art sequential implementation of RCM in the HSL library. Sloan is significantly more constrained than RCM, but our parallel implementation achieves an average speedup of 2.88X over sequential HSL-Sloan. Reordering the matrix using our parallel RCM and then performing 100 SpMV iterations is twice as fast as using HSL-RCM and then performing the SpMV iterations; it is also 1.5 times faster than performing the SpMV iterations without reordering the matrix.
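
For readers unfamiliar with RCM, the sequential idea can be sketched as a breadth-first traversal that visits neighbours in order of increasing degree and then reverses the visit order. This is a simplified illustration, not the parallel algorithm of the paper or the HSL code: the start node is chosen by minimum degree rather than by a pseudo-peripheral search, and the names `reverse_cuthill_mckee` and `bandwidth` are hypothetical.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """adj maps each node to its neighbours in an undirected graph."""
    degree = {v: len(set(adj[v])) for v in adj}
    visited = set()
    order = []
    # Handle every connected component, starting from a low-degree node.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Enqueue unvisited neighbours by increasing degree.
            for w in sorted(set(adj[v]) - visited, key=lambda u: (degree[u], u)):
                visited.add(w)
                queue.append(w)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order

def bandwidth(adj, order):
    """Largest distance between the positions of two adjacent nodes."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[v] - pos[w]) for v in adj for w in adj[v]), default=0)
```

On a path graph whose nodes are labelled out of sequence, for example `{0: [3], 3: [0, 1], 1: [3, 4], 4: [1, 2], 2: [4]}`, comparing `bandwidth` for the natural label order with `bandwidth` for the RCM order shows the reduction that benefits SpMV locality.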

    Applications in GNSS water vapor tomography

    Algebraic reconstruction algorithms are iterative algorithms that are used in many areas, including medicine, seismology and meteorology. These algorithms are known to be highly computationally intensive. This may be especially troublesome for real-time applications or when they are run on conventional low-cost personal computers. One of these real-time applications is the reconstruction of water vapor images from Global Navigation Satellite System (GNSS) observations. The parallelization of algebraic reconstruction algorithms has the potential to significantly reduce the required resources, making it possible to obtain valid solutions in time to be used for nowcasting and forecasting weather models. The main objective of this dissertation was to present and analyse diverse shared memory libraries and techniques on CPU and GPU for algebraic reconstruction algorithms. It was concluded that parallelization pays off compared with sequential implementations. Overall, the GPU implementations were found to be only slightly faster than the CPU implementations, depending on the size of the problem being studied. A secondary objective was to develop software to perform the GNSS water vapor reconstruction using the implemented parallel algorithms. This software was developed successfully and a variety of tests were carried out with both synthetic and real data; the preliminary results were satisfactory. This dissertation was written in the Space & Earth Geodetic Analysis Laboratory (SEGAL) and was carried out in the framework of the Structure of Moist convection in high-resolution GNSS observations and models (SMOG) (PTDC/CTE-ATM/119922/2010) project funded by FCT (Fundação para a Ciência e a Tecnologia).
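
As a rough illustration of what an algebraic reconstruction iteration does, the classical row-action scheme (Kaczmarz's method) projects the current estimate onto one equation of the system at a time. This is a generic sequential sketch under assumptions, not the dissertation's parallel CPU/GPU code: the function name `kaczmarz` and the parameters `sweeps` and `relax` are hypothetical, and the tiny dense system stands in for a real tomography matrix.

```python
def kaczmarz(a, b, sweeps=200, relax=1.0):
    """Row-action ART for a consistent linear system a x = b."""
    n = len(a[0])
    x = [0.0] * n  # start from the zero image
    for _ in range(sweeps):
        for row, bi in zip(a, b):
            norm2 = sum(v * v for v in row)
            if norm2 == 0.0:
                continue  # an empty row carries no information
            # Residual of this single equation for the current estimate.
            resid = bi - sum(r * xi for r, xi in zip(row, x))
            step = relax * resid / norm2
            # Move the estimate toward the hyperplane of this equation.
            x = [xi + step * r for xi, r in zip(x, row)]
    return x
```

Each row update depends on the estimate left by the previous one, so the inner loop is inherently sequential; parallel variants typically restructure it, for instance by updating from blocks of rows at once, and that restructuring is where the choice of CPU or GPU technique comes in.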

    Semiannual report, 1 October 1990 - 31 March 1991

    Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized.