17 research outputs found

    Parallel computation of 3-D electromagnetic scattering using finite elements

    Full text link
    The finite element method (FEM) with local absorbing boundary conditions has been recently applied to compute electromagnetic scattering from large 3-D geometries. In this paper, we present details pertaining to code implementation and optimization. Various types of sparse matrix storage schemes are discussed and their performance is examined in terms of vectorization and net storage requirements. The system of linear equations is solved using a preconditioned biconjugate gradient (BCG) algorithm and a fairly detailed study of existing point and block preconditioners (diagonal and incomplete LU) is carried out. A modified ILU preconditioning scheme is also introducted which works better than the traditional version for our matrix systems. The parallelization of the iterative sparse solver and the matrix generation/assembly as implemented on the KSR1 multiprocessor is described and the interprocessor communication patterns are analysed in detail. Near-linear speed-up is obtained for both the iterative solver and the matrix generation/assembly phases. Results are presented for a problem having 224,476 unknowns and validated by comparison with measured data.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/50413/1/1660070504_ftp.pd

    Parallel Algorithms for Forward and Back Substitution in Linear Algebraic Equations of Finite Element Method, Journal of Telecommunications and Information Technology, 2019, nr 4

    Get PDF
    This paper considers several algorithms for parallelizing the procedure of forward and back substitution for high-order symmetric sparse matrices on multi-core computers with shared memory. It compares the proposed approaches for various finite-element problems of structural mechanics which generate sparse matrices of different structure

    Parallel implementation of the finite element method on shared memory multiprocessors

    Get PDF
    PhD ThesisThe work presented in this thesis concerns parallel methods for finite element analysis. The research has been funded by British Gas and some of the presented material involves work on their software. Practical problems involving the finite element method can use a large amount of processing power and the execution times can be very large. It is consequently important to investigate the possibilities for the parallel implementation of the method. The research has been carried out on an Encore Multimax, a shared memory multiprocessor with 14 identical CPU's. We firstly experimented on autoparallelising a large British Gas finite element program (GASP4) using Encore's parallelising Fortran compiler (epf). The par- allel program generated by epj proved not to be efficient. The main reasons are the complexity of the code and small grain parallelism. Since the program is hard to analyse for the compiler at high levels, only small grain parallelism has been inserted automatically into the code. This involves a great deal of low level syn- chronisations which produce large overheads and cause inefficiency. A detailed analysis of the autoparallelised code has been made with a view to determining the reasons for the inefficiency. Suggestions have also been made about writing programs such that they are suitable for efficient autoparallelisation. The finite element method consists of the assembly of a stiffness matrix and the solution of a set of simultaneous linear equations. A sparse representation of the stiffness matrix has been used to allow experimentation on large problems. Parallel assembly techniques for the sparse representation have been developed. Some of these methods have proved to be very efficient giving speed ups that are near ideal. For the solution phase, we have used the preconditioned conjugate gradient method (PCG). An incomplete LU factorization ofthe stiffness matrix with no fill- in (ILU(O)) has been found to be an effective preconditioner. The factors can be obtained at a low cost. We have parallelised all the steps of the PCG method. The main bottleneck is the triangular solves (preconditioning operations) at each step. Two parallel methods of triangular solution have been implemented. One is based on level scheduling (row-oriented parallelism) and the other is a new approach called independent columns (column-oriented parallelism). The algorithms have been tested for row and red-black orderings of the nodal unknowns in the finite element meshes considered. The best speed ups obtained are 7.29 (on 12 processors) for level scheduling and 7.11 (on 12 processors) for independent columns. Red-black ordering gives rise to better parallel performance than row ordering in general. An analysis of methods for the improvement of the parallel efficiency has been made.British Ga

    Application of HPC in eddy current electromagnetic problem solution

    Get PDF
    As engineering problems are becoming more and more advanced, the size of an average model solved by partial differential equations is rapidly growing and, in order to keep simulation times within reasonable bounds, both faster computers and more efficient software implementations are needed. In the first part of this thesis, the full potential of simulation software has been exploited through high performance parallel computing techniques. In particular, the simulation of induction heating processes is accomplished within reasonable solution times, by implementing different parallel direct solvers for large sparse linear system, in the solution process of a commercial software. The performance of such library on shared memory systems has been remarkably improved by implementing a multithreaded version of MUMPS (MUltifrontal Massively Parallel Solver) library, which have been tested on benchmark matrices arising from typical induction heating process simulations. A new multithreading approach and a low rank approximation technique have been implemented and developed by MUMPS team in Lyon and Toulouse. In the context of a collaboration between MUMPS team and DII-University of Padova, a preliminary version of such functionalities could be tested on induction heating benchmark problems, and a substantial reduction of the computational cost and memory requirements could be achieved. In the second part of this thesis, some examples of design methodology by virtual prototyping have been described. Complex multiphysics simulations involving electromagnetic, circuital, thermal and mechanical problems have been performed by exploiting parallel solvers, as developed in the first part of this thesis. Finally, multiobjective stochastic optimization algorithms have been applied to multiphysics 3D model simulations in search of a set of improved induction heating device configurations

    Accelerating advanced preconditioning methods on hybrid architectures

    Get PDF
    Un gran número de problemas, en diversas áreas de la ciencia y la ingeniería, involucran la solución de sistemas dispersos de ecuaciones lineales de gran escala. En muchos de estos escenarios, son además un cuello de botella desde el punto de vista computacional, y por esa razón, su implementación eficiente ha motivado una cantidad enorme de trabajos científicos. Por muchos años, los métodos directos basados en el proceso de la Eliminación Gaussiana han sido la herramienta de referencia para resolver dichos sistemas, pero la dimensión de los problemas abordados actualmente impone serios desafíos a la mayoría de estos algoritmos, considerando sus requerimientos de memoria, su tiempo de cómputo y la complejidad de su implementación. Propulsados por los avances en las técnicas de precondicionado, los métodos iterativos se han vuelto más confiables, y por lo tanto emergen como alternativas a los métodos directos, ofreciendo soluciones de alta calidad a un menor costo computacional. Sin embargo, estos avances muchas veces son relativos a un problema específico, o dotan a los precondicionadores de una complejidad tal, que su aplicación en diversos problemas se vuelve poco práctica en términos de tiempo de ejecución y consumo de memoria. Como respuesta a esta situación, es común la utilización de estrategias de Computación de Alto Desempeño, ya que el desarrollo sostenido de las plataformas de hardware permite la ejecución simultánea de cada vez más operaciones. Un claro ejemplo de esta evolución son las plataformas compuestas por procesadores multi-núcleo y aceleradoras de hardware como las Unidades de Procesamiento Gráfico (GPU). Particularmente, las GPU se han convertido en poderosos procesadores paralelos, capaces de integrar miles de núcleos a precios y consumo energético razonables.Por estas razones, las GPU son ahora una plataforma de hardware de gran importancia para la ciencia y la ingeniería, y su uso eficiente es crucial para alcanzar un buen desempeño en la mayoría de las aplicaciones. Esta tesis se centra en el uso de GPUs para acelerar la solución de sistemas dispersos de ecuaciones lineales usando métodos iterativos precondicionados con técnicas modernas. En particular, se trabaja sobre ILUPACK, que ofrece implementaciones de los métodos iterativos más importantes, y presenta un interesante y moderno precondicionador de tipo ILU multinivel. En este trabajo, se desarrollan versiones del precondicionador y de los métodos incluidos en el paquete, capaces de explotar el paralelismo de datos mediante el uso de GPUs sin afectar las propiedades numéricas del precondicionador. Además, se habilita y analiza el uso de las GPU en versiones paralelas existentes, basadas en paralelismo de tareas para plataformas de memoria compartida y distribuida. Los resultados obtenidos muestran una sensible mejora en el tiempo de ejecución de los métodos abordados, así como la posibilidad de resolver problemas de gran escala de forma eficiente

    Doctor of Philosophy

    Get PDF
    dissertationSparse matrix codes are found in numerous applications ranging from iterative numerical solvers to graph analytics. Achieving high performance on these codes has however been a significant challenge, mainly due to array access indirection, for example, of the form A[B[i]]. Indirect accesses make precise dependence analysis impossible at compile-time, and hence prevent many parallelizing and locality optimizing transformations from being applied. The expert user relies on manually written libraries to tailor the sparse code and data representations best suited to the target architecture from a general sparse matrix representation. However libraries have limited composability, address very specific optimization strategies, and have to be rewritten as new architectures emerge. In this dissertation, we explore the use of the inspector/executor methodology to accomplish the code and data transformations to tailor high performance sparse matrix representations. We devise and embed abstractions for such inspector/executor transformations within a compiler framework so that they can be composed with a rich set of existing polyhedral compiler transformations to derive complex transformation sequences for high performance. We demonstrate the automatic generation of inspector/executor code, which orchestrates code and data transformations to derive high performance representations for the Sparse Matrix Vector Multiply kernel in particular. We also show how the same transformations may be integrated into sparse matrix and graph applications such as Sparse Matrix Matrix Multiply and Stochastic Gradient Descent, respectively. The specific constraints of these applications, such as problem size and dependence structure, necessitate unique sparse matrix representations that can be realized using our transformations. Computations such as Gauss Seidel, with loop carried dependences at the outer most loop necessitate different strategies for high performance. Specifically, we organize the computation into level sets or wavefronts of irregular size, such that iterations of a wavefront may be scheduled in parallel but different wavefronts have to be synchronized. We demonstrate automatic code generation of high performance inspectors that do explicit dependence testing and level set construction at runtime, as well as high performance executors, which are the actual parallelized computations. For the above sparse matrix applications, we automatically generate inspector/executor code comparable in performance to manually tuned libraries

    JTIT

    Get PDF
    kwartalni
    corecore