6 research outputs found

    Sparse Systems Solving on GPUs with GMRES

    Scientific applications very often rely on solving one or more linear systems. When matrices are sparse, iterative methods are preferred to direct ones. Nevertheless, the values of the nonzero elements and their distribution (i.e. the sketch of the matrix) greatly influence the efficiency of those methods (in terms of computation time, number of iterations, result precision) or simply prevent convergence. Among iterative methods, GMRES is often chosen when dealing with general nonsymmetric matrices. Indeed, its convergence is very fast and more stable than that of the biconjugate gradient method. Furthermore, it is mainly based on mathematical operations (matrix-vector and dot products, norms, ...) that can be heavily parallelized, and is thus a good candidate for implementing a sparse-system solver on Graphics Processing Units (GPUs). This paper presents a GMRES method for such an architecture. It is based on the modified Gram-Schmidt approach and is very similar to that of Sparselib. Our version uses restarting and a very basic preconditioning. For its implementation, we have based our code on CUBLAS and SpMV libraries, in order to achieve good performance whatever the matrix sizes and sketches are. Our experiments exhibit encouraging results in the comparison between Central Processing Unit (CPU) and GPU executions in double precision, obtaining a speedup ranging from 8 up to 23 for a large variety of problems
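
As an illustration of the ingredients named in this abstract (restarting, modified Gram-Schmidt, a very basic preconditioner), here is a minimal CPU sketch in NumPy. It is not the authors' GPU code: the function name `gmres_restarted`, the choice of a Jacobi (diagonal) preconditioner, and the use of a dense least-squares solve in place of Givens rotations are all assumptions made for brevity.

```python
import numpy as np

def gmres_restarted(A, b, restart=20, tol=1e-10, max_restarts=50):
    """Restarted GMRES with modified Gram-Schmidt (illustrative sketch).

    Assumption: left Jacobi preconditioning, i.e. we solve D^-1 A x = D^-1 b
    where D is the diagonal of A (which must have no zero entries).
    """
    n = b.shape[0]
    inv_diag = 1.0 / np.diag(A)

    def apply_op(v):
        # Preconditioned matrix-vector product D^-1 (A v).
        return inv_diag * (A @ v)

    rhs = inv_diag * b
    rhs_norm = np.linalg.norm(rhs)
    x = np.zeros(n)
    if rhs_norm == 0.0:
        return x

    m = min(restart, n)
    for _ in range(max_restarts):
        r = rhs - apply_op(x)
        beta = np.linalg.norm(r)
        if beta <= tol * rhs_norm:
            break
        V = np.zeros((n, m + 1))   # orthonormal Krylov basis
        H = np.zeros((m + 1, m))   # upper Hessenberg matrix
        V[:, 0] = r / beta
        k = 0
        for j in range(m):
            w = apply_op(V[:, j])
            # Modified Gram-Schmidt: orthogonalize against one basis vector
            # at a time, using the already-updated w.
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            k = j + 1
            if H[j + 1, j] < 1e-14:   # Krylov space exhausted
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Minimize ||beta * e1 - H y|| over the k-dimensional Krylov space.
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        x = x + V[:, :k] @ y
    return x
```

The dominant operations here (matrix-vector products, dot products, norms, vector updates) are the ones the abstract identifies as heavily parallelizable; in a GPU version they would map onto sparse matrix-vector and BLAS-style kernels.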

    Accelerating induction machine finite-element simulation with parallel processing

    Finite element analysis used for detailed electromagnetic analysis and design of electric machines is computationally intensive. A means of accelerating two-dimensional transient finite element analysis, required for induction machine modeling, is explored using graphics processing units (GPUs) for parallel processing. GPUs, widely used for image processing, can provide faster computation times than CPUs alone because of the thousands of small processors they comprise. Computations suitable for parallel processing on GPUs are those that can be decomposed into independent subsections that are computed in parallel and then reassembled. The steps and components of the transient finite element simulation are analyzed to determine whether using GPUs for calculations can speed up the simulation. The dominant steps of the finite element simulation are preconditioner formation, computation of the sparse iterative solution, and matrix-vector multiplication for magnetic flux density calculation. Due to the sparsity of the finite element problem, GPU implementation of the sparse iterative solution did not result in faster computation times. The dominant speed-up achieved using the GPUs resulted from matrix-vector multiplication. Simulation results are presented for a benchmark transient eddy current problem with nonlinear magnetic material and a transient linear induction machine problem with linear magnetic material. The finite element analysis program is implemented in MATLAB R2014a to compare sparse matrix format computations with readily available GPU matrix and vector formats and Compute Unified Device Architecture (CUDA) functions linked to MATLAB. Overall, the hybrid CPU/GPU implementation computed the finite element solution 1.2-3.5 times faster than the CPU-only implementation. 
The variation in speed-up depends on the sparsity and number of unknowns of the problem

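    Use of MIC architectures for accelerating numerical solutions in electromagnetics (original title: Uso de arquitecturas MIC para la aceleración de soluciones numéricas en electromagnetismo)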

    Improving the efficiency of computational resources for solving electromagnetic problems is a complex subject of great interest. The appearance over the past decade of GPUs (Graphics Processing Units) and Xeon Phi coprocessor boards on the lists of top-performing supercomputers has led researchers to try to make the most of these new technologies. The main objective of this thesis is to improve the efficiency of the MoM (Method of Moments) by parallelizing some of its algorithms on processors with the Intel MIC (Many Integrated Core) architecture. For this purpose, an electromagnetic problem is modeled using the SIE-MoM (Surface Integral Equation-Method of Moments) methodology, and new algorithms are developed for execution on Intel Xeon Phi coprocessor cards. 
A comparison of computation times on Intel Xeon Phi cards and Intel Xeon CPUs indicates that the Intel MIC architecture could be suitable for electromagnetic simulations as a complement to CPUs

    Real-time stress analysis of three-dimensional boundary element problems with continuously updating geometry

    Computational design of mechanical components is an iterative process that involves multiple stress analysis runs; this can be time consuming and expensive. Significant improvements in the efficiency of this process can be made by increasing the level of interactivity. One approach is through real-time re-analysis of models with continuously updating geometry. In this work the boundary element method is used to realise this vision. Three primary areas need to be considered to accelerate the re-solution of boundary element problems. These are re-meshing the model, updating the boundary element system of equations and re-solution of the system. Once the initial model has been constructed and solved, the user may apply geometric perturbations to parts of the model. A new re-meshing algorithm accommodates these changes in geometry whilst retaining as much of the existing mesh as possible. This allows the majority of the previous boundary element system of equations to be re-used for the new analysis. Efficiency is achieved during re-integration by applying a reusable intrinsic sample point (RISP) integration scheme with a 64-bit single precision code. Parts of the boundary element system that have not been updated are retained by the re-analysis and integrals that multiply zero boundary conditions are suppressed. For models with fewer than 10,000 degrees of freedom, the re-integration algorithm performs up to five times faster than a standard integration scheme with less than 0.15% reduction in the L_2-norm accuracy of the solution vector. The method parallelises easily and an additional six times speed-up can be achieved on eight processors over the serial implementation. The performance of a range of direct, iterative and reduction-based linear solvers has been compared for solving the boundary element system, with the iterative generalised minimal residual (GMRES) solver providing the fastest convergence rate and the most accurate result. 
Further time savings are made by preconditioning the updated system with the LU decomposition of the original system. Using these techniques, near real-time analysis can be achieved for three-dimensional simulations; for two-dimensional models such real-time performance has already been demonstrated
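
The idea of preconditioning the updated system with the LU decomposition of the original one can be sketched as follows. This is a small dense illustration using SciPy, not the thesis code: the helper name `solve_updated` and the default GMRES tolerances are assumptions.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, gmres

def solve_updated(A_updated, b, lu_original):
    """Solve A_updated x = b with GMRES, preconditioned by the LU factors
    of the original (unperturbed) matrix.

    lu_original is the (lu, piv) pair returned by scipy.linalg.lu_factor.
    """
    n = b.shape[0]
    # Applying the preconditioner costs one forward and one backward
    # substitution with the factors that were computed once, up front.
    M = LinearOperator(
        (n, n),
        matvec=lambda v: lu_solve(lu_original, v),
        dtype=A_updated.dtype,
    )
    x, info = gmres(A_updated, b, M=M)   # info == 0 means converged
    return x, info
```

When only a small part of the matrix changes, the original inverse is close to the updated one, so the preconditioned system is close to the identity and GMRES typically needs few iterations. A possible usage pattern is to call `lu_factor` once on the initial matrix and pass the result to `solve_updated` after each geometric perturbation.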

    Investigation of general-purpose computing on graphics processing units and its application to the finite element analysis of electromagnetic problems

    In this dissertation, the hardware and API architectures of GPUs are investigated, and the corresponding acceleration techniques are applied to the traditional frequency-domain finite element method (FEM), the element-level time-domain methods, and the nonlinear discontinuous Galerkin method. First, the assembly and the solution phases of the FEM are parallelized and mapped onto the granular GPU processors. Efficient parallelization strategies for the finite element matrix assembly on a single GPU and on multiple GPUs are proposed. The parallelization strategies for the finite element matrix solution, in conjunction with parallelizable preconditioners, are investigated to reduce the total solution time. Second, the element-level dual-field domain decomposition (DFDD-ELD) method is parallelized on GPU. The element-level algorithms treat each finite element as a subdomain, where the elements march the fields in time by exchanging fields and fluxes on the element boundary interfaces with the neighboring elements. The proposed parallelization framework is readily applicable to similar element-level algorithms; its application to the discontinuous Galerkin time-domain (DGTD) methods shows good acceleration results. Third, the element-level parallelization framework is further adapted to the acceleration of a nonlinear DGTD algorithm, which has potential applications in the field of optics. The proposed nonlinear DGTD algorithm describes the third-order instantaneous nonlinear effect between the electromagnetic field and the medium permittivity. The Newton-Raphson method is incorporated to reduce the number of nonlinear iterations through its quadratic convergence. Various nonlinear examples are presented to show the different Kerr effects observed through the third-order nonlinearity. With the acceleration using MPI+GPU under large cluster environments, the solution times for the various linear and nonlinear examples are significantly reduced

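    An extensible parallel finite element analysis platform for structural engineering (original title: Yapı mühendisliği için genişletilebilir parelel sonlu elemanlar çözümleme platformu)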

    TÜBİTAK MAG Project, 01.09.2012. Parallel computing systems have become more affordable and available as a consequence of recent developments in computer technology. Many institutions and engineers, however, cannot utilize the parallel computer hardware already available to them because of shortcomings in the structural analysis software they use. Thus, one of the main objectives of this project is to present a way to utilize existing parallel computing hardware at no additional cost and to considerably reduce analysis times by parallelizing the finite element analysis techniques most frequently used in structural engineering. In this project, significant effort was spent on the main analysis methods of the finite element method, such as linear static, nonlinear static, and linear and nonlinear time history analysis. As parallel solution techniques for linear systems of equations, two different approaches, global and substructure-based, were implemented and their performance was tested with several structural models. Likewise, for time history analysis of structures, both implicit and explicit time integration techniques were implemented and their parallel efficiency was tested. Parallel nonlinear time history analysis algorithms were also implemented using the explicit integration technique. One of the main problems in developing computational mechanics software is the difficulty of getting third parties other than the developers to use and further develop it. For this reason, most academic software is used by only a few researchers. Thus, the other important target of this project is to create an expandable software structure so that the framework can easily be used and further developed by other researchers. 
For this reason, an object-oriented data structure was carefully designed for such analysis software, and with the help of state-of-the-art 'plug-in' technology, external programs can easily be added to the analysis engine and used without any problems. In order to validate the extensibility of the developed analysis framework, finite elements and analysis methods for heat transfer problems were developed and added to the framework as plug-ins. As a final step, the use of GPGPUs in finite element analysis was examined by developing several analysis methods. Even though direct sparse matrix solvers did not achieve fast solution times compared with multi-core CPUs, significant reductions in solution time were obtained for dense matrix operations and explicit time integration methods