4 research outputs found

    Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication

    We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv method, we compute selected elements of the inverse of a sparse matrix A that can be decomposed as A = LU, where L is lower triangular and U is upper triangular. Updating these selected elements of A^{-1} requires restricted collective communications among a subset of processors within each column or row communication group created by a block cyclic distribution of L and U. We describe how this type of restricted collective communication can be implemented by using asynchronous point-to-point MPI communication functions combined with a binary-tree-based data propagation scheme. Because multiple restricted collective communications may take place at the same time in the parallel selected inversion algorithm, we need to use a heuristic to prevent processors participating in multiple collective communications from receiving too many messages. This heuristic allows us to reduce communication load imbalance and improve the overall scalability of the selected inversion algorithm. For instance, when 6,400 processors are used, we observe over 5x speedup for test matrices. It also mitigates the performance variability introduced by an inhomogeneous network topology.
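To make the tree-based propagation idea concrete, here is a minimal sketch in plain Python of how a binary broadcast tree could be laid out over an arbitrary subset of ranks. It is not the PSelInv implementation: the function names `binary_tree_links` and `simulate_broadcast` are hypothetical, the ordering of participants is an arbitrary assumption, and the load-balancing heuristic from the abstract is not reproduced. In an MPI code, each parent-to-child link would presumably map to a nonblocking point-to-point send and receive.

```python
def binary_tree_links(ranks, root):
    """Return {rank: (parent, [children])} for a binary broadcast tree.

    `ranks` is the subset of processes taking part in the restricted
    collective; `root` is the rank that initially owns the data.
    """
    # Assumption: the root goes first, the others keep their given order.
    order = [root] + [r for r in ranks if r != root]
    links = {}
    for pos, rank in enumerate(order):
        # Heap-style indexing: node at position p has children 2p+1 and 2p+2.
        parent = order[(pos - 1) // 2] if pos > 0 else None
        children = [order[c] for c in (2 * pos + 1, 2 * pos + 2)
                    if c < len(order)]
        links[rank] = (parent, children)
    return links


def simulate_broadcast(ranks, root):
    """Count tree levels needed until every participant holds the data."""
    links = binary_tree_links(ranks, root)
    have = {root}
    levels = 0
    while len(have) < len(links):
        # Every current holder forwards the data to its children.
        have |= {child for r in have for child in links[r][1]}
        levels += 1
    return levels


if __name__ == "__main__":
    subset = [3, 7, 11, 15, 19, 23, 27]
    for rank, (parent, children) in binary_tree_links(subset, 11).items():
        print(rank, "parent:", parent, "children:", children)
    print("levels:", simulate_broadcast(subset, 11))
```

With this layout no participant sends to more than two others, whereas a flat scheme would have the root send to every other participant itself.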

    A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems

    Given a sparse matrix $A$, the selected inversion algorithm is an efficient method for computing certain selected elements of $A^{-1}$. These selected elements correspond to all or some nonzero elements of the LU factors of $A$. In many ways, the type of matrix updates performed in the selected inversion algorithm is similar to that performed in the LU factorization, although the sequence of operations is different. In the context of LU factorization, it is known that the left-looking and right-looking algorithms exhibit different memory access and data communication patterns, and hence different behavior on shared memory and distributed memory parallel machines. Corresponding to right-looking and left-looking LU factorization, the selected inversion algorithm can be organized as a left-looking and a right-looking algorithm. The parallel right-looking version of the algorithm has been developed in [1]. The sequence of operations performed in this version of the selected inversion algorithm is similar to those performed in a left-looking LU factorization algorithm. In this paper, we describe the left-looking variant of the selected inversion algorithm and, based on a task-parallel method, present an efficient implementation of the algorithm for shared memory machines. We demonstrate that with the task scheduling features provided by OpenMP 4.0, the left-looking selected inversion algorithm can scale well both on the Intel Haswell multicore architecture and on the Intel Knights Corner (KNC) manycore architecture. Compared to the right-looking selected inversion algorithm, the left-looking formulation facilitates pipelining of work along different branches of the elimination tree, and can be a promising candidate for future development of massively parallel selected inversion algorithms on heterogeneous architectures. Comment: 9 pages, 7 figures, submitted to SuperComputing 201
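As a small illustration of what "selected elements" means, the sketch below handles the simplest case: a symmetric tridiagonal matrix factored as $A = LDL^T$ without pivoting. It is a simplified stand-in, not the left-looking algorithm of the paper; the function name `selected_inverse_tridiagonal` is hypothetical. It uses the recurrence $Z_{ij} = \delta_{ij}/d_i - \sum_{k>i} L_{ki} Z_{kj}$ for $i \le j$ (with $Z = A^{-1}$), which for a bidiagonal $L$ needs only the neighbouring entry.

```python
def selected_inverse_tridiagonal(diag, off):
    """Diagonal and first off-diagonal of A^{-1} for symmetric tridiagonal A.

    diag[i] = A[i][i]; off[i] = A[i+1][i] = A[i][i+1].
    Assumes A = L D L^T exists without pivoting (e.g. A positive definite).
    """
    n = len(diag)
    # Factorization: L is unit lower bidiagonal with subdiagonal l,
    # D is diagonal with entries d.
    d = [0.0] * n
    l = [0.0] * (n - 1)
    d[0] = diag[0]
    for i in range(n - 1):
        l[i] = off[i] / d[i]
        d[i + 1] = diag[i + 1] - l[i] * off[i]

    # Backward sweep: only entries on the sparsity pattern of L + L^T
    # are ever formed; the rest of A^{-1} is never computed.
    inv_diag = [0.0] * n
    inv_off = [0.0] * (n - 1)
    inv_diag[n - 1] = 1.0 / d[n - 1]
    for i in range(n - 2, -1, -1):
        inv_off[i] = -l[i] * inv_diag[i + 1]
        inv_diag[i] = 1.0 / d[i] - l[i] * inv_off[i]
    return inv_diag, inv_off


if __name__ == "__main__":
    # A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    print(selected_inverse_tridiagonal([2.0, 2.0, 2.0], [-1.0, -1.0]))
```

For the 3-by-3 example in the `__main__` block, the full inverse also has a nonzero (0, 2) entry, but the sweep never touches it, since that position is zero in the factors.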

    Robust Determination of the Chemical Potential in the Pole Expansion and Selected Inversion Method for Solving Kohn-Sham density functional theory

    Fermi operator expansion (FOE) methods are powerful alternatives to diagonalization type methods for solving Kohn-Sham density functional theory (KSDFT). One example is the pole expansion and selected inversion (PEXSI) method, which approximates the Fermi operator by rational matrix functions and reduces the computational complexity to at most quadratic scaling for solving KSDFT. Unlike diagonalization type methods, the chemical potential often cannot be directly read off from the result of a single evaluation of the Fermi operator. Hence multiple evaluations must be performed sequentially to compute the chemical potential so that the number of electrons is correct within a given tolerance. This hinders the performance of FOE methods in practice. In this paper we develop an efficient and robust strategy to determine the chemical potential in the context of the PEXSI method. The main idea of the new method is not to find the exact chemical potential at each self-consistent-field (SCF) iteration, but to dynamically and rigorously update the upper and lower bounds for the true chemical potential, so that the chemical potential converges along the SCF iteration. Instead of evaluating the Fermi operator multiple times sequentially, our method uses a two-level strategy that evaluates the Fermi operators in parallel. In the regime of full parallelization, the wall clock time of each SCF iteration is always close to the time for one single evaluation of the Fermi operator, even when the initial guess is far away from the converged solution. We demonstrate the effectiveness of the new method using examples with metallic and insulating characters, as well as results from ab initio molecular dynamics. Comment: Submitted to Journal of Chemical Physic
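The bound-updating idea can be sketched on a toy problem. The code below is only an illustration under strong assumptions: it counts electrons from a known list of energy levels with the Fermi-Dirac function (a stand-in for a Fermi operator evaluation, which PEXSI performs via pole expansion rather than eigenvalues), and it does not reproduce the paper's rigorous bound updates or its two-level parallel strategy. The names `electron_count` and `refine_bounds` are hypothetical. The trial evaluations inside `refine_bounds` are independent of each other, so they could in principle run concurrently.

```python
import math


def electron_count(mu, energies, beta):
    """Sum of Fermi-Dirac occupations 1 / (1 + exp(beta * (e - mu))).

    Written with tanh to avoid overflow for large beta * (e - mu).
    """
    return sum(0.5 * (1.0 - math.tanh(0.5 * beta * (e - mu)))
               for e in energies)


def refine_bounds(lo, hi, n_target, energies, beta, n_points=8):
    """Tighten a bracket [lo, hi] around the chemical potential.

    Assumes the count at lo is <= n_target and the count at hi is
    >= n_target. The electron count is nondecreasing in mu, so each
    trial point tightens the lower or the upper bound.
    """
    step = (hi - lo) / (n_points - 1)
    trial = [lo + k * step for k in range(n_points)]
    # Independent evaluations: candidates for parallel execution.
    counts = [electron_count(mu, energies, beta) for mu in trial]
    new_lo, new_hi = lo, hi
    for mu, n in zip(trial, counts):
        if n <= n_target:
            new_lo = max(new_lo, mu)
        if n >= n_target:
            new_hi = min(new_hi, mu)
    return new_lo, new_hi


if __name__ == "__main__":
    levels = [-1.0, -0.5, 0.5, 1.0]   # toy spectrum with a gap around 0
    print(refine_bounds(-2.0, 2.0, 2.0, levels, beta=50.0))
```

One round with eight trial points shrinks the bracket by a factor of seven, compared with a factor of two per evaluation for sequential bisection.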

    PEXSI-$\Sigma$: A Green's function embedding method for Kohn-Sham density functional theory

    In this paper, we propose a new Green's function embedding method called PEXSI-$\Sigma$ for describing complex systems within the Kohn-Sham density functional theory (KSDFT) framework, after revisiting the physics literature of Green's function embedding methods from a numerical linear algebra perspective. The PEXSI-$\Sigma$ method approximates the density matrix using a set of nearly optimally chosen Green's functions evaluated at complex frequencies. For each Green's function, the complex boundary conditions are described by a self energy matrix $\Sigma$ constructed from a physical reference Green's function, which can be computed relatively easily. In the linear regime, such treatment of the boundary condition can be numerically exact. The support of the $\Sigma$ matrix is restricted to degrees of freedom near the boundary of the computational domain, and can be interpreted as a frequency dependent surface potential. This makes it possible to perform KSDFT calculations with $\mathcal{O}(N^2)$ computational complexity, where $N$ is the number of atoms within the computational domain. Green's function embedding methods are also naturally compatible with atomistic Green's function methods for relaxing the atomic configuration outside the computational domain. As a proof of concept, we demonstrate the accuracy of the PEXSI-$\Sigma$ method for graphene with divacancy and dislocation dipole type of defects using the DFTB+ software package.