    A Parallel Scalable PETSc-Based Jacobi-Davidson Polynomial Eigensolver with Application in Quantum Dot Simulation

    Summary. The Jacobi-Davidson (JD) algorithm has recently gained popularity for finding a few selected interior eigenvalues of large sparse polynomial eigenvalue problems, which commonly appear in PDE-based computational science and engineering applications. As in other inner-outer algorithms such as Newton-type methods, the bottleneck of the JD algorithm is the approximate solution of the inner correction equation. In previous work [Hwang, Wei, Huang, and Wang, A Parallel Additive Schwarz Preconditioned Jacobi-Davidson (ASPJD) Algorithm for Polynomial Eigenvalue Problems in Quantum Dot (QD) Simulation, Journal of Computational Physics (2010)], the authors proposed a parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method to accelerate the convergence of the JD algorithm. Building on that experience with tuning the algorithmic parameters of the ASPJD algorithm, we further investigate the parallel performance of a PETSc-based ASPJD eigensolver on the Blue Gene/P. A QD quintic eigenvalue problem is used as an example to demonstrate its scalability, showing excellent strong scaling up to 2,048 cores.
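    For orientation, the inner correction equation mentioned in the summary can be written in its commonly cited textbook form for polynomial eigenproblems. This is a sketch, not necessarily the exact formulation used by the authors; the symbols \tau, A_i, \theta, u, r, p, and t are notation assumed here for illustration.

```latex
% Polynomial eigenproblem of degree \tau with coefficient matrices A_i:
\mathcal{A}(\lambda)\,x \;=\; \sum_{i=0}^{\tau} \lambda^{i} A_i\, x \;=\; 0 .

% Given the current Ritz pair (\theta, u), define the residual r and the
% derivative direction p:
r = \mathcal{A}(\theta)\,u , \qquad
p = \mathcal{A}'(\theta)\,u = \sum_{i=1}^{\tau} i\,\theta^{\,i-1} A_i\, u .

% Correction equation, solved only approximately for t orthogonal to u:
\left( I - \frac{p\,u^{*}}{u^{*}p} \right)
\mathcal{A}(\theta)
\left( I - \frac{u\,u^{*}}{u^{*}u} \right) t \;=\; -\,r ,
\qquad t \perp u .
```

    The search space is then expanded with t. Since t only has to be a useful expansion direction, a few steps of a preconditioned Krylov method are usually enough, which is where the preconditioner enters.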

    A polynomial Jacobi-Davidson solver with support for non-monomial bases and deflation

    Large-scale polynomial eigenvalue problems can be solved by Krylov methods operating on an equivalent linear eigenproblem (linearization) of size d·n, where d is the polynomial degree and n is the problem size, or by projection methods that keep the computation in the n-dimensional space. Jacobi-Davidson belongs to the latter class of methods, and, since it is a preconditioned eigensolver, it may be competitive in cases where explicitly computing a matrix factorization is exceedingly expensive. However, a fully fledged implementation of polynomial Jacobi-Davidson has to consider several issues, including deflation to compute more than one eigenpair, use of non-monomial bases for the case of large degree polynomials, and handling of complex eigenvalues when computing in real arithmetic.

    Parallel Krylov Solvers for the Polynomial Eigenvalue Problem in SLEPc

    Polynomial eigenvalue problems are often found in scientific computing applications. When the coefficient matrices of the polynomial are large and sparse, usually only a few eigenpairs are required and projection methods are the best choice. We focus on Krylov methods that operate on the companion linearization of the polynomial but exploit the block structure with the aim of being memory-efficient in the representation of the Krylov subspace basis. The problem may appear in the form of a low-degree polynomial (quartic or quintic, say) expressed in the monomial basis, or a high-degree polynomial (coming from interpolation of a nonlinear eigenproblem) expressed in a nonmonomial basis. We have implemented a parallel solver in SLEPc covering both cases that is able to compute exterior as well as interior eigenvalues via spectral transformation. We discuss important issues such as scaling and restart and illustrate the robustness and performance of the solver with some numerical experiments.
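    To make the idea of a companion linearization concrete, the following is a minimal dense sketch in Python with NumPy. It is not SLEPc code and does not exploit the block structure the abstract refers to; it only shows how a degree-d problem of size n, in the monomial basis, becomes a linear eigenproblem of size d·n. The function names are hypothetical, and the sketch assumes the leading coefficient A_d is nonsingular.

```python
import numpy as np

def companion_linearization(coeffs):
    """Build (C0, C1) such that C0 z = lambda * C1 z.

    coeffs = [A_0, ..., A_d] are the monomial coefficients of
    P(lambda) = sum_i lambda**i * A_i, and z = [x; lambda x; ...; lambda**(d-1) x].
    """
    d = len(coeffs) - 1
    n = coeffs[0].shape[0]
    C0 = np.zeros((d * n, d * n), dtype=complex)
    C1 = np.eye(d * n, dtype=complex)
    # Identity blocks on the first block superdiagonal.
    for k in range(d - 1):
        C0[k * n:(k + 1) * n, (k + 1) * n:(k + 2) * n] = np.eye(n)
    # Last block row holds -A_0, ..., -A_{d-1}.
    for i in range(d):
        C0[(d - 1) * n:, i * n:(i + 1) * n] = -coeffs[i]
    # Leading coefficient A_d sits in the last diagonal block of C1.
    C1[(d - 1) * n:, (d - 1) * n:] = coeffs[d]
    return C0, C1

def polyeig_dense(coeffs):
    """Return all d*n eigenvalues and the matching n-dimensional eigenvectors.

    Assumption: A_d is nonsingular, so C1 can be inverted.
    """
    C0, C1 = companion_linearization(coeffs)
    n = coeffs[0].shape[0]
    lam, Z = np.linalg.eig(np.linalg.solve(C1, C0))
    return lam, Z[:n, :]  # the first block of z is the eigenvector x

def scaled_residual(coeffs, lam, x):
    """Norm of P(lam) x, scaled by the size of the terms that were summed."""
    P = sum(lam**i * A for i, A in enumerate(coeffs))
    scale = sum(abs(lam)**i * np.linalg.norm(A, 2) for i, A in enumerate(coeffs))
    return np.linalg.norm(P @ x) / (scale * np.linalg.norm(x))
```

    A dense solve like this costs O((d·n)^3) operations and stores the full d·n basis vectors, which is why large sparse problems call for Krylov methods and a memory-efficient representation of the basis.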

    A parallel implementation of Davidson methods for large-scale eigenvalue problems in SLEPc

    In the context of large-scale eigenvalue problems, methods of Davidson type such as Jacobi-Davidson can be competitive with respect to other types of algorithms, especially in some particularly difficult situations such as computing interior eigenvalues or when matrix factorization is prohibitive or highly inefficient. However, these types of methods are not generally available in the form of high-quality parallel implementations, especially for the case of non-Hermitian eigenproblems. We present our implementation of various Davidson-type methods in SLEPc, the Scalable Library for Eigenvalue Problem Computations. The solvers incorporate many algorithmic variants for subspace expansion and extraction, and cover a wide range of eigenproblems including standard and generalized, Hermitian and non-Hermitian, with either real or complex arithmetic.

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    Higher-Order DGFEM Transport Calculations on Polytope Meshes for Massively-Parallel Architectures

    In this dissertation, we develop improvements to the solution of the discrete ordinates (S_N) neutron transport equation using a Discontinuous Galerkin Finite Element Method (DGFEM) spatial discretization on arbitrary polytope (polygonal and polyhedral) grids suitable for massively parallel computer architectures. Polytope meshes are attractive for multiple reasons, including their use in other physics communities and their ease in handling local mesh refinement strategies. In this work, we focus on two topical areas of research. First, we discuss higher-order basis functions suitable for solving the DGFEM S_N transport equation on arbitrary polygonal meshes. Second, we assess Diffusion Synthetic Acceleration (DSA) schemes compatible with polytope grids for massively parallel transport problems. We first utilize basis functions compatible with arbitrary polygonal grids for the DGFEM transport equation. We analyze four different basis functions that have linear completeness on polygons: the Wachspress rational functions, the PWL functions, the mean value coordinates, and the maximum entropy coordinates. We then describe the procedure to extend these polygonal linear basis functions into the quadratic serendipity space of functions. These quadratic basis functions can exactly interpolate monomial functions up to order 2. Both the linear and quadratic sets of basis functions preserve transport solutions in the thick diffusion limit. Maximum convergence rates of 2 and 3 are observed for regular transport solutions for the linear and quadratic basis functions, respectively. For problems that are limited by the regularity of the transport solution, convergence rates of 3/2 (when the solution is continuous) and 1/2 (when the solution is discontinuous) are observed. Spatial Adaptive Mesh Refinement (AMR) achieved higher convergence rates than uniform refinement, even for problems bounded by the solution regularity.
    We demonstrated accuracy in the AMR solutions by allowing them to reach a level where the ray effects of the angular discretization are realized. Next, we analyzed DSA schemes to accelerate both the within-group iterations as well as the thermal upscattering iterations for multigroup transport problems. Accelerating the thermal upscattering iterations is important for materials (e.g., graphite) with significant thermal energy scattering and minimal absorption. All of the acceleration schemes analyzed use a DGFEM discretization of the diffusion equation that is compatible with arbitrary polytope meshes: the Modified Interior Penalty Method (MIP). MIP uses the same DGFEM discretization as the transport equation. The MIP form is Symmetric Positive Definite (SPD) and is efficiently solved with Preconditioned Conjugate Gradient (PCG) with Algebraic MultiGrid (AMG) preconditioning. The analysis from previous work was extended to show MIP's stability and robustness for accelerating 3D transport problems. MIP DSA preconditioning was implemented in the Parallel Deterministic Transport (PDT) code at Texas A&M University and linked with the HYPRE suite of linear solvers. Good scalability was numerically verified out to around 131K processors. The fraction of time spent performing DSA operations was small for problems with sufficient work performed in the transport sweep (O(10^3) angular directions). Finally, we have developed a novel methodology to accelerate transport problems dominated by thermal neutron upscattering. Compared to historical upscatter acceleration methods, our method is parallelizable and amenable to massively parallel transport calculations. Speedup factors of about 3-4 were observed with our new method.
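    As an illustration of why the SPD property matters, the following is a minimal serial sketch of the Preconditioned Conjugate Gradient iteration in Python with NumPy. It is not the PDT or HYPRE implementation. The AMG preconditioner is replaced here by a simple Jacobi (diagonal) preconditioner to keep the example self-contained, and the function names and the test matrix in the usage note are assumptions made for illustration.

```python
import numpy as np

def pcg(A, b, apply_prec, tol=1e-10, max_iter=500):
    """Solve A x = b for SPD A with preconditioned conjugate gradients.

    apply_prec(r) returns an approximation to A^{-1} r; it must itself
    act as an SPD operator. Returns (x, number_of_iterations).
    """
    x = np.zeros(b.shape, dtype=float)
    r = b - A @ x            # initial residual
    z = apply_prec(r)        # preconditioned residual
    p = z.copy()             # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        # Stop when the relative residual is small enough.
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # new A-conjugate direction
        rz = rz_new
    return x, max_iter

def jacobi_preconditioner(A):
    """Diagonal scaling, used here as a stand-in for AMG (an assumption)."""
    inv_diag = 1.0 / np.diag(A)
    return lambda r: inv_diag * r
```

    For example, with a 1D Laplacian-like SPD matrix `A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)`, the call `pcg(A, b, jacobi_preconditioner(A))` returns the solution and the iteration count. A multigrid preconditioner would be supplied through the same `apply_prec` callback.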