117 research outputs found

    Numerical aspects of the PUFEM for efficient solution of Helmholtz problems

    Conventional finite element methods (FEM) have been used for many years for the solution of harmonic wave problems. To ensure accurate simulation, each wavelength is discretised into around ten nodal points, with the finite element mesh being updated for each frequency to maintain adequate resolution of the wave pattern. This technique works well when the wavelength is long or the model domain is small. However, when the converse applies and the wavelength is small or the domain of interest is large, the finite element mesh requires a large number of elements, and the procedure becomes computationally expensive and impractical. The principal objective of this work is to accurately model two-dimensional Helmholtz problems with the Partition of Unity Finite Element Method (PUFEM). This is achieved by applying the plane wave basis decomposition to the wave field. The resulting enriched elements relax the traditional requirement of around ten nodal points per wavelength and therefore allow Helmholtz wave problems to be solved without refining the mesh of the computational domain at each frequency. Various numerical aspects affecting the efficiency of the PUFEM are analysed in order to improve its potential. The accuracy and effectiveness of the method are investigated by comparing solutions for selected problems with available analytical solutions or with high-resolution numerical solutions obtained using conventional finite elements. First, the use of plane waves or cylindrical waves in the enrichment process is assessed for wave scattering problems involving a rigid circular cylinder in both the near field and the far field. In the far field, the cylindrical waves proved to be more effective in reducing the computational effort. However, because plane waves are simpler to integrate analytically over straight-edged elements during the finite element assembly process, they are retained for the remainder of the thesis.
The analysed numerical aspects, which may affect the PUFEM performance, include the conjugated and unconjugated weighting, the geometry description, the use of non-reflecting boundary conditions, and the h-, p- and q-convergence. To speed up the element assembly process at high wave numbers, an exact integration procedure is implemented. The PUFEM is also assessed on multiple scattering problems, involving sets of circular cylinders, and on exterior wave problems presenting singularities in the geometry of the scatterer. Elements that are large and small in comparison to the wavelength are used with both constant and variable numbers of enriching plane waves. Last, the system resulting from the PUFEM is solved iteratively using a preconditioner based on incomplete lower-upper (LU) factorisation. To further enhance the efficiency of the iterative solution, the resulting system is solved in the wavelet domain. Overall, compared to the FEM, the PUFEM leads to a drastic reduction of the total number of degrees of freedom required to solve a wave problem. It also performs very well when large elements, compared to the wavelength, are used with high numbers of enriching plane waves, rather than small elements with low numbers of plane waves. Because geometric details must be described, it is practical to use both large and small elements. In this case, to keep the conditioning within acceptable limits, it is necessary to vary the number of enriching plane waves with the element size.
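To make the resolution argument concrete, the following is a minimal sketch and is not taken from the thesis. It estimates the node count of a conventional mesh on a square domain at ten nodes per wavelength, and evaluates a set of enriching plane waves whose directions are evenly spaced on the unit circle. The function names, the square-domain assumption, and the even spacing of directions are illustrative assumptions.

```python
import cmath
import math


def fem_nodes_2d(domain_length, wavelength, nodes_per_wavelength=10):
    """Rough node count for a square domain meshed at a fixed resolution.

    Assumes a uniform grid; real meshes differ, so this is an order-of-magnitude
    estimate only.
    """
    per_side = math.ceil(nodes_per_wavelength * domain_length / wavelength) + 1
    return per_side ** 2


def plane_wave_basis(k, q, x, y):
    """Values at (x, y) of q plane waves exp(i k d_j . x).

    The directions d_j are assumed to be evenly spaced on the unit circle.
    """
    values = []
    for j in range(q):
        theta = 2.0 * math.pi * j / q
        phase = k * (x * math.cos(theta) + y * math.sin(theta))
        values.append(cmath.exp(1j * phase))
    return values


if __name__ == "__main__":
    # Halving the wavelength roughly quadruples the conventional node count.
    for wavelength in (0.5, 0.25):
        print(wavelength, fem_nodes_2d(2.0, wavelength))
    # Eight enrichment functions evaluated at one point for wavelength 0.5.
    print(plane_wave_basis(k=2.0 * math.pi / 0.5, q=8, x=0.3, y=-0.7))
```

On a domain of side 2, halving the wavelength from 0.5 to 0.25 raises this estimate from 1681 to 6561 nodes, which illustrates why a fixed number of nodes per wavelength becomes expensive at high frequency.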

    Investigation of general-purpose computing on graphics processing units and its application to the finite element analysis of electromagnetic problems

    In this dissertation, the hardware and API architectures of GPUs are investigated, and the corresponding acceleration techniques are applied to the traditional frequency-domain finite element method (FEM), the element-level time-domain methods, and the nonlinear discontinuous Galerkin method. First, the assembly and the solution phases of the FEM are parallelized and mapped onto the granular GPU processors. Efficient parallelization strategies for the finite element matrix assembly on a single GPU and on multiple GPUs are proposed. The parallelization strategies for the finite element matrix solution, in conjunction with parallelizable preconditioners, are investigated to reduce the total solution time. Second, the element-level dual-field domain decomposition (DFDD-ELD) method is parallelized on GPU. The element-level algorithms treat each finite element as a subdomain, where the elements march the fields in time by exchanging fields and fluxes on the element boundary interfaces with the neighboring elements. The proposed parallelization framework is readily applicable to similar element-level algorithms; its application to the discontinuous Galerkin time-domain (DGTD) methods shows good acceleration results. Third, the element-level parallelization framework is further adapted to the acceleration of a nonlinear DGTD algorithm, which has potential applications in the field of optics. The proposed nonlinear DGTD algorithm describes the third-order instantaneous nonlinear effect between the electromagnetic field and the medium permittivity. The Newton-Raphson method is incorporated to reduce the number of nonlinear iterations through its quadratic convergence. Various nonlinear examples are presented to show the different Kerr effects observed through the third-order nonlinearity. With the acceleration using MPI+GPU under large cluster environments, the solution times for the various linear and nonlinear examples are significantly reduced.
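The quadratic convergence of Newton-Raphson mentioned above can be illustrated on a scalar toy problem. This is a sketch under stated assumptions and not the dissertation's DGTD algorithm: it assumes a pointwise Kerr-type relation D = (eps_lin + chi3 * E^2) * E and solves for E given D. The names `solve_kerr_field`, `eps_lin`, and `chi3`, and the parameter values, are hypothetical.

```python
def solve_kerr_field(d, eps_lin=1.0, chi3=0.1, tol=1e-12, max_iter=50):
    """Solve eps_lin * e + chi3 * e**3 = d for e with Newton-Raphson.

    Returns the solution and the residual magnitude recorded at each iteration.
    """
    e = d / eps_lin  # start from the linear-medium solution
    history = []
    for _ in range(max_iter):
        residual = eps_lin * e + chi3 * e ** 3 - d
        history.append(abs(residual))
        if abs(residual) < tol:
            break
        # Derivative of the residual with respect to e.
        slope = eps_lin + 3.0 * chi3 * e ** 2
        e -= residual / slope
    return e, history


if __name__ == "__main__":
    field, residuals = solve_kerr_field(2.0)
    print(field)
    for r in residuals:
        print(f"{r:.3e}")
```

Printing the residual history shows the number of correct digits roughly doubling per step once the iterate is close to the root, which is the behaviour that keeps the number of nonlinear iterations small.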

    The Unified-FFT Method for Fast Solution of Integral Equations as Applied to Shielded-Domain Electromagnetics

    Electromagnetic (EM) solvers are widely used within computer-aided design (CAD) to improve and ensure the success of circuit designs. Unfortunately, due to the complexity of Maxwell's equations, they are often computationally expensive. While considerable progress has been made in the realm of speed-enhanced EM solvers, these fast solvers generally achieve their results through methods that introduce additional error components by way of geometric approximations, sparse-matrix approximations, multilevel decomposition of interactions, and more. This work introduces a new method, Unified-FFT (UFFT). A derivative of the method of moments, UFFT scales as O(N log N) and achieves fast analysis by the unique combination of FFT-enhanced matrix fill operations (MFO) with FFT-enhanced matrix solve operations (MSO). In this work, two versions of UFFT are developed, UFFT-Precorrected (UFFT-P) and UFFT-Grid Totalizing (UFFT-GT). UFFT-P uses precorrected FFT for MSO and allows the use of basis functions that do not conform to a regular grid. UFFT-GT uses conjugate gradient FFT for MSO and features the capability of reducing the error of the solution down to machine precision. The main contribution of UFFT-P is a fast solver that utilizes FFT for both MFO and MSO. It is demonstrated in this work not only to provide simulation results for large problems considerably faster than state-of-the-art commercial tools, but also to be capable of simulating geometries that are too complex for conventional simulation. In UFFT-P these benefits come at the expense of a minor penalty to accuracy. UFFT-GT contains further contributions, as it demonstrates that such a fast solver can be accurate to numerical precision as compared to a full, direct analysis. It is shown to provide even more algorithmic efficiency and faster performance than UFFT-P.
UFFT-GT makes an additional contribution in that it is developed not only for planar geometries, but also for the case of multilayered dielectrics and metallization. This functionality is particularly useful for multilayered printed circuit boards (PCBs) and integrated circuits (ICs). Finally, UFFT-GT contributes a 3D planar solver, which allows current to be discretized in the z-direction. This allows similarly fast and accurate simulation with the inclusion of some 3D features, such as vias connecting metallization planes.
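The O(N log N) scaling of FFT-based solvers rests on a standard fact: multiplying by a circulant matrix is a circular convolution, so it can be done with FFTs instead of a dense product. The sketch below shows only that building block, as used inside iterative schemes such as conjugate gradient FFT. It is not the UFFT algorithm, and the function names and the small radix-2 FFT are illustrative assumptions.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec_fft(first_column, x):
    """y = C x for the circulant matrix C with the given first column, via FFT."""
    n = len(x)
    spectrum = [a * b for a, b in zip(fft(first_column), fft(x))]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v / n for v in fft(spectrum, inverse=True)]


def circulant_matvec_dense(first_column, x):
    """Reference O(N^2) product using C[i][j] = first_column[(i - j) mod n]."""
    n = len(x)
    return [sum(first_column[(i - j) % n] * x[j] for j in range(n))
            for i in range(n)]


if __name__ == "__main__":
    column = [4.0, 1.0, 0.0, 1.0]
    vector = [1.0, 2.0, 3.0, 4.0]
    print(circulant_matvec_dense(column, vector))
    print([v.real for v in circulant_matvec_fft(column, vector)])
```

An iterative solver needs only this matrix-vector product at each step, so replacing the dense product with the FFT version reduces the per-iteration cost from O(N^2) to O(N log N) when the operator has this convolution structure.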

    Multi-solver schemes for electromagnetic modeling of large and complex objects

    The work in this dissertation primarily focuses on the development of numerical algorithms for electromagnetic modeling of large and complex objects. First, a GPU-accelerated multilevel fast multipole algorithm (MLFMA) is presented to improve the efficiency of the traditional MLFMA by taking advantage of GPU hardware advancement. The proposed hierarchical parallelization strategy ensures a high computational throughput for the GPU calculation. The resulting OpenMP-based multi-GPU implementation is capable of solving real-life problems with over one million unknowns with a remarkable speedup. The radar cross sections (RCS) of a few benchmark objects are calculated to demonstrate the accuracy of the solution. The results are compared with those from the CPU-based MLFMA and measurements. The capability and efficiency of the presented method are analyzed through the examples of a sphere, an aircraft, and a missile-like object. Compared with the 8-threaded CPU-based MLFMA, the OpenMP-CUDA-MLFMA method can achieve from 5 to 20 times total speedup. Second, an efficient and accurate finite element--boundary integral (FE-BI) method is proposed for solving electromagnetic scattering and radiation problems. A mixed testing scheme, in which the Rao-Wilton-Glisson and the Buffa-Christiansen functions are both employed as the testing functions, is first presented to improve the accuracy of the FE-BI method. An efficient absorbing boundary condition (ABC)-based preconditioner is then proposed to accelerate the convergence of the iterative solution. To further improve the efficiency of the total computation, a GPU-accelerated MLFMA is applied to the iterative solution. The RCSs of several benchmark objects are calculated to demonstrate the numerical accuracy of the solution and also to show that the proposed method not only is free of interior resonance corruption, but also has a better convergence than the conventional FE-BI methods. 
The capability and efficiency of the proposed method are analyzed through several numerical examples, including a large dielectric-coated sphere, a partial human body, and a coated missile-like object. Compared with the 8-threaded CPU-based algorithm, the GPU-accelerated FE-BI-MLFMA algorithm can achieve a total speedup of up to 25.5 times. Third, a multi-solver (MS) scheme based on the combined field integral equation (CFIE) is proposed. In this scheme, an object is decomposed into multiple bodies based on its material properties and geometry. To model bodies with complicated materials, the FE-BI method is applied. To model bodies with homogeneous or conducting materials, the method of moments is employed. Specifically, three solvers are integrated in this multi-solver scheme: the FE-BI(CFIE) for inhomogeneous objects, the CFIE for dielectric objects, and the CFIE for conducting objects. A mixed testing scheme that utilizes both the Rao-Wilton-Glisson and the Buffa-Christiansen functions is adopted to obtain good accuracy from the proposed multi-solver algorithm. In the iterative solution of the combined system, the MLFMA is applied to accelerate computation and reduce memory costs, and an ABC-based preconditioner is employed to speed up the convergence. In the numerical examples, the individual solvers are first demonstrated to be well conditioned and highly accurate. Then the validity of the proposed multi-solver scheme is demonstrated and its capability is shown by solving scattering problems of electrically large missile-like objects. Fourth, an MS scheme based on the Robin transmission condition (RTC) is proposed. Unlike the FE-BI method, which applies BI equations to truncate the FE domain, this proposed multi-solver scheme employs both FE and BI equations to model an object along with its background.
To be specific, the entire computational domain consisting of the object and its background is first decomposed into multiple non-overlapping subdomains with each modeled by either an FE or BI equation. The equations in the subdomains are then coupled into a multi-solver system by enforcing the RTC at the subdomain interfaces. Finally, the combined system is solved iteratively with the application of an extended ABC-based preconditioner and the MLFMA. To obtain an accurate solution, both the Rao-Wilton-Glisson and the Buffa-Christiansen functions are employed as the testing functions to discretize the BI equations. This scheme is applied to a variety of benchmark problems and the scattering from an aircraft with a launched missile to demonstrate its accuracy, versatility, and capability. The proposed scheme is compared with the MS-CFIE to illustrate the differences between the two schemes. Fifth, to further improve the modeling capability, an accelerated MS method is developed on distributed computing systems to simulate the scattering from very large and complex objects. The parallelization strategy is to parallelize different subdomains individually, which is different from the parallelized domain decomposition methods, where the subdomains are handled in parallel. The multilevel fast multipole algorithm is parallelized to enable computation on many processors. The modeling strategy using the MS-RTC method is also discussed so that one can easily follow the guideline to model large and complex objects. Numerical examples are given to show the parallel efficiency of the proposed strategy and the modeling capability of the proposed method. Finally, the specific absorption rate (SAR) in a human head at 5G frequencies is simulated by taking advantage of the MS-RTC method. Based on the strong skin effect, the human head model is first simplified to reduce the computation cost. Then the MS-RTC method is applied to model the human head. 
Numerical examples show that the MS method is very efficient in solving for electromagnetic fields in the human head and that the simplified human head model can be used in the SAR simulation with acceptable accuracy.

    Scalable domain decomposition methods for finite element approximations of transient and electromagnetic problems

    The main object of study of this thesis is the development of scalable and robust solvers based on domain decomposition (DD) methods for the linear systems arising from the finite element (FE) discretization of transient and electromagnetic problems. The thesis commences with a theoretical review of the curl-conforming edge (or Nédélec) FEs of the first kind and a comprehensive description of a general implementation strategy for h- and p-adaptive elements of arbitrary order on tetrahedral and hexahedral non-conforming meshes. Then, a novel balancing domain decomposition by constraints (BDDC) preconditioner that is robust for multi-material and/or heterogeneous problems posed in curl-conforming spaces is presented. The new method, in contrast to existing approaches, is based on the definition of the ingredients of the preconditioner according to the physical coefficients of the problem and does not require spectral information. The result is a robust and highly scalable preconditioner that preserves the simplicity of the original BDDC method. When dealing with transient problems, the time direction itself offers an opportunity for further parallelization. Aiming to design scalable space-time solvers, first, parallel-in-time methods for linear and non-linear ordinary differential equations (ODEs) are proposed, based on efficient (non-linear) Schur complement solvers for a multilevel partition of the time interval. Then, these ideas are combined with DD concepts in order to design a two-level preconditioner as an extension to space-time of the BDDC method. The key ingredients for these new methods are defined such that they preserve the time causality, i.e., information only travels from the past to the future. The proposed schemes are weakly scalable in time and space-time, i.e., one can efficiently exploit increasing computational resources to solve more time steps in (approximately) the same time-to-solution.
All the developments presented herein are motivated by the driving application of the thesis, the 3D simulation of the low-frequency electromagnetic response of High Temperature Superconductors (HTS). Throughout the document, an exhaustive set of numerical experiments, which includes the simulation of a realistic 3D HTS problem, is performed in order to validate the suitability and assess the parallel performance of the High Performance Computing (HPC) implementation of the proposed algorithms.