
    Parallel Simulation for VLSI Power Grid

    Due to the increasing complexity of VLSI circuits, power grid simulation has become increasingly time-consuming, so there is a need for a fast and accurate power grid simulator. To perform power grid simulation in a timely manner, parallel algorithms have been developed to accelerate it. In this dissertation, we present parallel algorithms and software for power grid simulation on CPU-GPU platforms. The power grid is divided into disjoint partitions, which are then enlarged using Breadth-First Search (BFS). During enlargement, a portion of the edges is ignored to keep the matrix factorization lightweight. Solving the enlarged partitions with a direct solver serves as the preconditioner for the Preconditioned Conjugate Gradient (PCG) method used to solve the power grid. This work thus combines the advantages of direct and iterative solvers into an efficient hybrid parallel solver. Two-tier parallelism is harnessed: MPI across partitions and CUDA within each partition. Experiments conducted on supercomputing clusters demonstrate significant speedups over a state-of-the-art direct solver in both static and transient analysis.
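The partition, enlarge, and precondition pipeline described above can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions, not the dissertation's solver: the grid, its conductance values (`g`, `g_pad`), and the load currents are hypothetical; the local direct solves are combined as a plain additive Schwarz preconditioner (one standard way to keep the preconditioner symmetric for PCG); and neither the edge dropping nor the MPI/CUDA layers are reproduced. Dense NumPy is used only because the example grid is tiny.

```python
import numpy as np
from collections import deque


def grid_conductance_matrix(nx, ny, g=1.0, g_pad=0.1):
    """Nodal conductance matrix (SPD) of a hypothetical nx-by-ny resistive mesh.

    Every node is also tied to the supply through g_pad, which keeps the
    matrix nonsingular.
    """
    n = nx * ny
    A = np.zeros((n, n))
    for i in range(nx):
        for j in range(ny):
            u = i * ny + j
            A[u, u] += g_pad
            for di, dj in ((1, 0), (0, 1)):
                if i + di < nx and j + dj < ny:
                    v = (i + di) * ny + (j + dj)
                    A[u, u] += g
                    A[v, v] += g
                    A[u, v] -= g
                    A[v, u] -= g
    return A


def bfs_enlarge(A, core, levels):
    """Grow a partition by `levels` BFS layers over the nonzero pattern of A."""
    depth = {v: 0 for v in core}
    queue = deque(core)
    while queue:
        u = queue.popleft()
        if depth[u] == levels:
            continue
        for v in np.nonzero(A[u])[0]:
            v = int(v)
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return sorted(depth)


def additive_schwarz(A, parts):
    """Preconditioner that sums exact local solves on the enlarged partitions."""
    # Factorize each local block once (Cholesky) and reuse it every iteration.
    factors = [(idx, np.linalg.cholesky(A[np.ix_(idx, idx)])) for idx in parts]

    def precondition(r):
        z = np.zeros_like(r)
        for idx, L in factors:
            y = np.linalg.solve(L, r[idx])      # solve with L
            z[idx] += np.linalg.solve(L.T, y)   # solve with L^T
        return z

    return precondition


def pcg(A, b, precond, tol=1e-10, maxiter=1000):
    """Preconditioned conjugate gradient; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


nx = ny = 12
A = grid_conductance_matrix(nx, ny)
b = np.linspace(0.0, 0.02, nx * ny)  # hypothetical load currents
# Four disjoint strips of three grid rows each, enlarged by one BFS layer.
cores = [list(range(s * 3 * ny, (s + 1) * 3 * ny)) for s in range(4)]
parts = [bfs_enlarge(A, core, levels=1) for core in cores]
x, iters = pcg(A, b, additive_schwarz(A, parts))
```

The overlap created by BFS is what lets information cross partition boundaries inside each preconditioner application; in a distributed version each `(idx, L)` pair would live on its own process.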

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculations using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones. Comment: 44 pages; one of the USQCD whitepapers.

    Multilevel Variable-Block Schur-Complement-Based Preconditioning for the Implicit Solution of the Reynolds-Averaged Navier-Stokes Equations Using Unstructured Grids

    Implicit methods based on Newton's root-finding algorithm are receiving increasing attention for the solution of complex Computational Fluid Dynamics (CFD) applications because of their potential to converge in a very small number of iterations. To compete in CPU time with conventional solvers, such as those based on artificial dissipation or upwind schemes, this approach requires fast convergence acceleration techniques. In this chapter, we describe a multilevel variable-block Schur-complement-based preconditioner for the implicit solution of the Reynolds-averaged Navier-Stokes equations using unstructured grids on distributed-memory parallel computers. The proposed solver automatically detects exact or approximate dense structures in the linear system arising from the discretization and exploits this information to enhance the robustness and improve the scalability of the block factorization. A complete study of the numerical and parallel performance of the solver is presented for turbulent Navier-Stokes flows on a suite of three-dimensional test cases.

    Multi-solver schemes for electromagnetic modeling of large and complex objects

    The work in this dissertation focuses primarily on the development of numerical algorithms for electromagnetic modeling of large and complex objects. First, a GPU-accelerated multilevel fast multipole algorithm (MLFMA) is presented that improves the efficiency of the traditional MLFMA by taking advantage of advances in GPU hardware. The proposed hierarchical parallelization strategy ensures high computational throughput for the GPU calculation. The resulting OpenMP-based multi-GPU implementation can solve real-life problems with over one million unknowns with substantial speedup. The radar cross sections (RCS) of a few benchmark objects are calculated to demonstrate the accuracy of the solution, and the results are compared with those from the CPU-based MLFMA and with measurements. The capability and efficiency of the method are analyzed through the examples of a sphere, an aircraft, and a missile-like object. Compared with the 8-threaded CPU-based MLFMA, the OpenMP-CUDA-MLFMA method achieves a total speedup of 5 to 20 times. Second, an efficient and accurate finite element-boundary integral (FE-BI) method is proposed for solving electromagnetic scattering and radiation problems. A mixed testing scheme, in which the Rao-Wilton-Glisson and the Buffa-Christiansen functions are both employed as testing functions, is first presented to improve the accuracy of the FE-BI method. An efficient absorbing boundary condition (ABC)-based preconditioner is then proposed to accelerate the convergence of the iterative solution. To further improve the efficiency of the total computation, a GPU-accelerated MLFMA is applied to the iterative solution. The RCSs of several benchmark objects are calculated to demonstrate the numerical accuracy of the solution and to show that the proposed method is not only free of interior resonance corruption but also converges better than conventional FE-BI methods.
The capability and efficiency of the proposed method are analyzed through several numerical examples, including a large dielectric-coated sphere, a partial human body, and a coated missile-like object. Compared with the 8-threaded CPU-based algorithm, the GPU-accelerated FE-BI-MLFMA algorithm achieves a total speedup of up to 25.5 times. Third, a multi-solver (MS) scheme based on the combined field integral equation (CFIE) is proposed. In this scheme, an object is decomposed into multiple bodies based on its material properties and geometry. Bodies with complicated materials are modeled with the FE-BI method, while bodies with homogeneous or conducting materials are modeled with the method of moments. Specifically, three solvers are integrated in this multi-solver scheme: the FE-BI(CFIE) for inhomogeneous objects, the CFIE for dielectric objects, and the CFIE for conducting objects. A mixed testing scheme that uses both the Rao-Wilton-Glisson and the Buffa-Christiansen functions is adopted to achieve good accuracy with the proposed multi-solver algorithm. In the iterative solution of the combined system, the MLFMA is applied to accelerate computation and reduce memory costs, and an ABC-based preconditioner is employed to speed up convergence. In the numerical examples, the individual solvers are first shown to be well conditioned and highly accurate. The validity of the proposed multi-solver scheme is then demonstrated, and its capability is shown by solving scattering problems of electrically large missile-like objects. Fourth, an MS scheme based on the Robin transmission condition (RTC) is proposed. Unlike the FE-BI method, which applies BI equations to truncate the FE domain, this multi-solver scheme employs both FE and BI equations to model an object along with its background.
Specifically, the entire computational domain, consisting of the object and its background, is first decomposed into multiple non-overlapping subdomains, each modeled by either an FE or a BI equation. The subdomain equations are then coupled into a multi-solver system by enforcing the RTC at the subdomain interfaces. Finally, the combined system is solved iteratively with an extended ABC-based preconditioner and the MLFMA. To obtain an accurate solution, both the Rao-Wilton-Glisson and the Buffa-Christiansen functions are employed as testing functions to discretize the BI equations. The scheme is applied to a variety of benchmark problems and to the scattering from an aircraft with a launched missile to demonstrate its accuracy, versatility, and capability, and it is compared with the MS-CFIE to illustrate the differences between the two schemes. Fifth, to further improve the modeling capability, an accelerated MS method is developed on distributed computing systems to simulate scattering from very large and complex objects. Its parallelization strategy parallelizes each subdomain individually, which differs from parallel domain decomposition methods, where different subdomains are handled in parallel. The multilevel fast multipole algorithm is parallelized to enable computation on many processors. Guidelines for modeling large and complex objects with the MS-RTC method are also discussed. Numerical examples show the parallel efficiency of the proposed strategy and the modeling capability of the proposed method. Finally, the MS-RTC method is used to simulate the specific absorption rate (SAR) in a human head at 5G frequencies. Exploiting the strong skin effect, the human head model is first simplified to reduce the computational cost; the MS-RTC method is then applied to model the head.
Numerical examples show that the MS method is very efficient in solving for the electromagnetic fields in the human head and that the simplified head model can be used in SAR simulation with acceptable accuracy.
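The core idea of the MS-RTC scheme, non-overlapping subdomains coupled only through Robin conditions at their interface, can be seen on a scalar toy problem. This is a hedged illustration, not the dissertation's electromagnetic formulation: it assumes the 1D model problem -u'' = f on (0, 1) with u(0) = u(1) = 0, split at x = a into two subdomains whose local solutions are exact quadratics, and it uses a simple fixed-point (Lions-type) exchange of Robin data instead of a Krylov solve of the combined FE/BI system. The Robin parameter `p` and the function name are hypothetical.

```python
def robin_dd_1d(f=1.0, a=0.5, p=1.0, iters=30):
    """Toy Robin-coupled two-subdomain iteration for -u'' = f on (0, 1),
    u(0) = u(1) = 0, with the interface at x = a.

    Local solutions are exact quadratics:
      u1(x) = -f x^2/2 + c x            on (0, a)
      u2(x) = -f x^2/2 + d x + f/2 - d  on (a, 1)
    Returns (c, d); both tend to f/2, i.e. the global solution f x (1 - x) / 2.
    """
    g1 = g2 = 0.0  # Robin data received by subdomains 1 and 2
    for _ in range(iters):
        # Subdomain 1 enforces  u1'(a) + p u1(a) = g1
        c = (g1 + f * a + p * f * a**2 / 2) / (1 + p * a)
        # Subdomain 2 enforces -u2'(a) + p u2(a) = g2
        d = (g2 - f * a - p * f * (1 - a**2) / 2) / (p * (a - 1) - 1)
        u1_a, du1_a = -f * a**2 / 2 + c * a, -f * a + c
        u2_a, du2_a = -f * a**2 / 2 + d * a + f / 2 - d, -f * a + d
        # Exchange Robin traces across the interface (both use old values)
        g1, g2 = du2_a + p * u2_a, -du1_a + p * u1_a
    return c, d
```

For this toy problem with a = 0.5, the interface error shrinks by |1 - p/2| / (1 + p/2) per sweep, so `p=1` gives a factor of 1/3 and `p=2` converges in two sweeps. The choice of the Robin parameter therefore governs convergence, which is why RTC-based schemes pair it with a good preconditioner.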

    Time-domain and harmonic balance turbulent Navier-Stokes analysis of wind turbine aerodynamics using a fully coupled low-speed preconditioned multigrid solver

    The research work reported in this thesis stems from the development of an accurate and computationally efficient Reynolds-Averaged Navier-Stokes (RANS) research code, with particular emphasis on the steady and unsteady aerodynamic analysis of complex low-speed turbulent flows. Such flows include those around horizontal axis wind turbines (HAWTs) and vertical axis wind turbines (VAWTs) operating at design and off-design conditions. On the algorithmic side, the main contribution of this research is the successful development of a rigorous novel approach to low-speed preconditioning (LSP) for the multigrid fully coupled integration of the steady, time-domain and harmonic balance RANS equations coupled to the two-equation shear stress transport (SST) turbulence model. The LSP implementation is designed so that each part of the code affected by LSP can be validated individually against the baseline solver by suitably setting one numerical input parameter of the LSP-enhanced code. The thesis investigates several important modelling and numerical issues that are seldom thoroughly analysed in computational fluid dynamics problems of the type presented herein. The first and most important modelling issue is the necessity of applying low-speed preconditioning to both the RANS and SST equations and of retaining the turbulent kinetic energy in the definition of the total energy, which, to the best of the author's knowledge, has not been reported in the published literature so far. Based on the results obtained in the analysis of the vertical axis wind turbine application, we demonstrate that preconditioning the SST turbulence equations can significantly improve the convergence rate, and that keeping the turbulent kinetic energy in the total energy has a strong positive effect on solution accuracy.
The other modelling issue analysed is the sensitivity of the flow solution to the farfield boundary conditions, particularly for low-speed problems. The analyses reported in the thesis highlight that, for a small computational domain, preconditioned farfield boundary conditions are crucial to improving solution accuracy. As for the numerical aspects, we analyse the impact of using the relative velocity to build the preconditioning parameter on the flow solutions of an unsteady moving-grid problem. The presented results demonstrate that accounting for the grid motion when building the preconditioning parameter noticeably enhances solution accuracy. The nonlinear frequency-domain harmonic balance approach, in turn, is a fairly new technology for solving the unsteady RANS equations, which significantly reduces the run time required to compute periodic flows compared with the conventional time-domain approach. The implementation of the LSP approach in the turbulent harmonic balance RANS and SST formulations is another main novelty presented herein and is also the first published research work on this aspect. The newly developed low-speed turbulent flow predictive capabilities are comprehensively validated in a wide range of tests, from subsonic flow with slight compressibility to user-defined extremely low-speed incompressible flows. The solutions of our research code with LSP technology are compared with experimental data, theoretical solutions, and numerical solutions from a state-of-the-art CFD research code and a commercial package. The main computational results of this research consist of the analyses of HAWT and VAWT applications. The first is a comparative analysis of the 30% and 93.5% blade sections of a VESTAS multi-megawatt HAWT working in various regimes.
The steady, time-domain and frequency-domain results obtained with the LSP solver are used to analyse in detail the steady and unsteady aerodynamic characteristics in those regimes. The main motivation is threefold: to highlight the predictive capabilities and numerical robustness of the LSP-enhanced turbulent steady, time-domain and frequency-domain flow solvers for realistic, complex and more challenging problems; to quantify the effects of flow compressibility on the steady and yawed-wind-induced unsteady aerodynamics in the tip region of an 82-m HAWT blade in rated operating condition; and to assess the computational benefits of the harmonic balance method over the conventional time-domain method. The second application is a comparative aerodynamic analysis of the NREL 5MW HAWT in inviscid steady flow conditions, whose main motivation is to further demonstrate the ability of the LSP solver to simulate three-dimensional wind turbine flows. The last application is the time-domain turbulent flow analysis of the VAWT, with the aim of demonstrating the accuracy enhancement of the LSP solver for this problem, the necessity of applying the full preconditioning strategy, the important effect of the turbulent kinetic energy on solution accuracy, and the proper implementation of the preconditioning parameter required for an accurate numerical solution of an unsteady moving-grid low-speed problem.
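Why low-speed preconditioning helps, and why the velocity used in the preconditioning parameter matters on moving grids, can be seen from the characteristic speeds of a simple model. This sketch is an assumption-laden textbook analysis, not the thesis's RANS/SST formulation: it uses the inviscid 1D Euler equations in primitive variables with the pressure equation scaled by beta^2 (a Turkel-type preconditioner with alpha = 0), takes beta^2 = M^2 with a hypothetical cutoff `mach_cut`, and compares the resulting wave-speed stiffness when M is built from the absolute flow speed versus the speed relative to the grid. All velocities are hypothetical, and the thesis's accuracy results are not reproduced here.

```python
import math


def wave_speeds(v, c, beta2):
    """Characteristic speeds of the 1D Euler equations in primitive variables
    after the pressure equation is multiplied by beta2 (Turkel-type
    preconditioner with alpha = 0). beta2 = 1 recovers v and v +/- c."""
    root = math.sqrt(v**2 * (1 - beta2) ** 2 + 4 * beta2 * c**2)
    return v, (v * (1 + beta2) + root) / 2, (v * (1 + beta2) - root) / 2


def stiffness(speeds):
    """Ratio of the fastest to the slowest wave speed."""
    mags = [abs(s) for s in speeds]
    return max(mags) / min(mags)


def beta2_from(speed, c, mach_cut=1e-5):
    """Preconditioning parameter beta^2 = M^2, clipped to [mach_cut^2, 1]."""
    return min(1.0, max((speed / c) ** 2, mach_cut**2))


c = 340.0   # speed of sound (m/s)
u = 10.0    # flow speed in the absolute frame (m/s), hypothetical
w = 8.0     # grid speed (m/s), hypothetical
v = u - w   # convective speed relative to the moving grid

s_none = stiffness(wave_speeds(v, c, 1.0))
s_abs = stiffness(wave_speeds(v, c, beta2_from(u, c)))
s_rel = stiffness(wave_speeds(v, c, beta2_from(v, c)))
```

Without preconditioning the stiffness is 171 here; building beta^2 from the absolute speed leaves it above 5, while building it from the relative speed brings it close to (3 + sqrt(5)) / 2, about 2.62, the well-known value for beta^2 = M^2 at low Mach number.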

    Domain decomposition preconditioning for the Helmholtz equation: a coarse space based on local Dirichlet-to-Neumann maps

    In this thesis, we present a two-level domain decomposition method for the iterative solution of the heterogeneous Helmholtz equation. The Helmholtz equation governs wave propagation and scattering phenomena arising in a wide range of engineering applications. Its discretization with piecewise linear finite elements typically results in large, ill-conditioned, indefinite, and non-Hermitian linear systems of equations, for which standard iterative and direct methods run into difficulties; specially designed methods are therefore needed. Inherently parallel domain decomposition methods constitute a promising class of preconditioners, as they subdivide large problems into smaller subproblems and can hence cope with many degrees of freedom. An essential element of these methods is a good coarse space. Here the Helmholtz equation presents a particular challenge, as even slight deviations from the optimal choice can be fatal. We develop a coarse space based on local eigenproblems involving the Dirichlet-to-Neumann operator. Our construction is completely automatic, ensuring good convergence rates without the need for parameter tuning. Moreover, it naturally respects local variations in the wave number and is hence also suited to heterogeneous Helmholtz problems. Apart from the question of how to design the coarse space, we also investigate how to incorporate it into the method. Here too, the fact that the stiffness matrix is non-Hermitian and indefinite constitutes a major challenge. The resulting method is parallel by design, and its efficiency is investigated on two- and three-dimensional homogeneous and heterogeneous numerical examples.
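The local Dirichlet-to-Neumann eigenproblem at the heart of this coarse space can be sketched on a single subdomain. This is a minimal illustration under stated assumptions: it uses a 1D P1 finite-element discretization (so the interface is just the two subdomain endpoints and the eigenproblem is 2x2), takes the discrete DtN map as the Schur complement of the local Neumann matrix K - k^2 M onto the interface dofs, and keeps eigenvectors whose eigenvalue lies below the wavenumber `k`, a threshold we assume for illustration since the abstract does not state the selection rule. The subdomain size and `k` are hypothetical, and gluing the local vectors into a global coarse space (for example with a partition of unity) is not shown.

```python
import numpy as np


def local_helmholtz(n_el, h, k):
    """P1 finite-element matrix K - k^2 M on a 1D chain of n_el elements,
    with natural (Neumann) conditions at both ends of the subdomain."""
    A = np.zeros((n_el + 1, n_el + 1))
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h     # element stiffness
    me = np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6   # element mass
    for e in range(n_el):
        A[e:e + 2, e:e + 2] += ke - k**2 * me
    return A


def dtn_coarse_vectors(A_loc, gamma, k):
    """Eigenpairs of the discrete DtN map on the interface dofs `gamma`,
    keeping those with eigenvalue below k, each extended into the subdomain
    by solving the local homogeneous Helmholtz problem."""
    inner = [i for i in range(A_loc.shape[0]) if i not in gamma]
    A_II = A_loc[np.ix_(inner, inner)]
    A_IG = A_loc[np.ix_(inner, gamma)]
    A_GG = A_loc[np.ix_(gamma, gamma)]
    # Schur complement = discrete Dirichlet-to-Neumann map (symmetric here)
    S = A_GG - A_IG.T @ np.linalg.solve(A_II, A_IG)
    lam, V = np.linalg.eigh(S)
    vectors = []
    for j in np.nonzero(lam < k)[0]:
        u = np.zeros(A_loc.shape[0])
        u[gamma] = V[:, j]
        u[inner] = -np.linalg.solve(A_II, A_IG @ V[:, j])  # Helmholtz-harmonic extension
        vectors.append(u)
    return lam, vectors


# Hypothetical setup: one interior subdomain of length 0.25 (20 elements),
# wavenumber k = 10; its two end nodes are the interface dofs.
k, n_el = 10.0, 20
A_loc = local_helmholtz(n_el, h=0.25 / n_el, k=k)
lam, coarse = dtn_coarse_vectors(A_loc, gamma=[0, n_el], k=k)
```

Here both DtN eigenvalues fall below `k`, so the symmetric and antisymmetric local modes are both kept. In 2D and 3D the interface carries many dofs and the threshold discards the highly oscillatory interface modes, which is where the automatic, wavenumber-aware selection does its real work.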