
    Two-level Chebyshev filter based complementary subspace method: pushing the envelope of large-scale electronic structure calculations

    We describe a novel iterative strategy for Kohn-Sham density functional theory calculations aimed at large systems (> 1000 electrons), applicable to metals and insulators alike. In lieu of explicit diagonalization of the Kohn-Sham Hamiltonian on every self-consistent field (SCF) iteration, we employ a two-level Chebyshev polynomial filter based complementary subspace strategy to: 1) compute a set of vectors that span the occupied subspace of the Hamiltonian; 2) reduce subspace diagonalization to just partially occupied states; and 3) obtain those states in an efficient, scalable manner via an inner Chebyshev-filter iteration. By reducing the necessary computation to just partially occupied states, and obtaining these through an inner Chebyshev iteration, our approach reduces the cost of large metallic calculations significantly, while eliminating subspace diagonalization for insulating systems altogether. We describe the implementation of the method within the framework of the Discontinuous Galerkin (DG) electronic structure method and show that this results in a computational scheme that can effectively tackle bulk and nano systems containing tens of thousands of electrons, with chemical accuracy, within a few minutes or less of wall clock time per SCF iteration on large-scale computing platforms. We anticipate that our method will be instrumental in pushing the envelope of large-scale ab initio molecular dynamics. As a demonstration of this, we simulate a bulk silicon system containing 8,000 atoms at finite temperature, and obtain an average SCF step wall time of 51 seconds on 34,560 processors, allowing us to carry out 1.0 ps of ab initio molecular dynamics in approximately 28 hours (of wall time). Comment: Resubmitted version (version 2).
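    To make the two-level construction concrete, the following is a minimal dense NumPy sketch of the idea (our simplification, not the paper's DG implementation): an outer Chebyshev filter refreshes a basis for the occupied subspace, and the subspace density matrix is then assembled from only the few partially occupied states near the Fermi level, which the paper obtains via an inner Chebyshev iteration (replaced here by a dense eigensolver for brevity). Function names and parameters (n_top, mu, kT, filter bounds a and b) are illustrative.

```python
import numpy as np

def chebyshev_filter(H, X, m, a, b):
    """Degree-m Chebyshev filter that damps the part of H's spectrum lying in
    the unwanted interval [a, b] while amplifying the occupied part below it."""
    e, c = (b - a) / 2.0, (b + a) / 2.0
    Y = (H @ X - c * X) / e
    for _ in range(2, m + 1):
        Y_new = 2.0 * (H @ Y - c * Y) / e - X
        X, Y = Y, Y_new
    return Y

def complementary_subspace_step(H, V, n_top, mu, kT, m_outer, a, b):
    """One outer step: refresh the occupied-subspace basis V with a Chebyshev
    filter, project H into it, and build the subspace density matrix from only
    the n_top partially occupied states near the chemical potential mu."""
    V = chebyshev_filter(H, V, m_outer, a, b)
    Q, _ = np.linalg.qr(V)        # orthonormalize the filtered basis
    Ht = Q.T @ H @ Q              # projected (subspace) Hamiltonian

    # The paper obtains the top n_top eigenpairs of Ht with an inner Chebyshev
    # iteration; a dense eigensolver stands in for that inner solve here.
    w, U = np.linalg.eigh(Ht)
    top_w, top_U = w[-n_top:], U[:, -n_top:]

    # Fractional (Fermi-Dirac) occupations of the partially occupied states.
    f = 1.0 / (1.0 + np.exp((top_w - mu) / kT))

    # Complementary-subspace density matrix: identity minus the occupation
    # deficit of the top states; fully occupied states need no eigenvectors.
    D = np.eye(Ht.shape[0]) - top_U @ ((1.0 - f)[:, None] * top_U.T)
    return Q, D                   # real-space density matrix: P ~ Q @ D @ Q.T
```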

    Chebyshev polynomial filtered subspace iteration in the Discontinuous Galerkin method for large-scale electronic structure calculations

    The Discontinuous Galerkin (DG) electronic structure method employs an adaptive local basis (ALB) set to solve the Kohn-Sham equations of density functional theory (DFT) in a discontinuous Galerkin framework. The adaptive local basis is generated on-the-fly to capture the local material physics, and can systematically attain chemical accuracy with only a few tens of degrees of freedom per atom. A central issue for large-scale calculations, however, is the computation of the electron density (and subsequently, ground state properties) from the discretized Hamiltonian in an efficient and scalable manner. We show in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) can be used to address this issue and push the envelope in large-scale materials simulations in a discontinuous Galerkin framework. We describe how the subspace filtering steps can be performed in an efficient and scalable manner using a two-dimensional parallelization scheme, thanks to the orthogonality of the DG basis set and block-sparse structure of the DG Hamiltonian matrix. The on-the-fly nature of the ALBs requires additional care in carrying out the subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI approach in calculations of large-scale two-dimensional graphene sheets and bulk three-dimensional lithium-ion electrolyte systems. Employing 55,296 computational cores, the time per self-consistent field iteration for a sample of the bulk 3D electrolyte containing 8,586 atoms is 90 seconds, and the time for a graphene sheet containing 11,520 atoms is 75 seconds. Comment: Submitted to The Journal of Chemical Physics.
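    The filtering steps in DG-CheFSI reduce, at their core, to repeatedly applying the block-sparse DG Hamiltonian to a block of vectors. The sketch below is illustrative only: it assumes one uniform block per DG element and runs serially, whereas the paper's two-dimensional parallelization would distribute, e.g., block rows of H and column groups of X across a process grid.

```python
import numpy as np

def blocksparse_apply(H_blocks, X, block_size):
    """Apply a block-sparse Hamiltonian to a block of filter vectors X.

    H_blocks maps an element pair (I, J) to a dense block, present only where
    elements I and J interact.  This product is the kernel of every Chebyshev
    filtering step; it is written serially here for clarity."""
    Y = np.zeros_like(X)
    for (I, J), H_IJ in H_blocks.items():
        rI = slice(I * block_size, (I + 1) * block_size)
        rJ = slice(J * block_size, (J + 1) * block_size)
        Y[rI, :] += H_IJ @ X[rJ, :]
    return Y

# Toy usage: a block-tridiagonal "Hamiltonian" over 4 elements.
bs, n_elem, n_vec = 3, 4, 2
rng = np.random.default_rng(0)
H_blocks = {(I, J): rng.standard_normal((bs, bs))
            for I in range(n_elem) for J in range(n_elem) if abs(I - J) <= 1}
X = rng.standard_normal((bs * n_elem, n_vec))
Y = blocksparse_apply(H_blocks, X, bs)
```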

    Stability and collapse of localized solutions of the controlled three-dimensional Gross-Pitaevskii equation

    On the basis of recent investigations, a newly developed analytical procedure is used for constructing a wide class of localized solutions of the controlled three-dimensional (3D) Gross-Pitaevskii equation (GPE) that governs the dynamics of Bose-Einstein condensates (BECs). The controlled 3D GPE is decomposed into a two-dimensional (2D) linear Schrödinger equation and a one-dimensional (1D) nonlinear Schrödinger equation, constrained by a variational condition for the controlling potential. Then, this class of localized solutions is constructed as the product of the solutions of the transverse and longitudinal equations. On the basis of these exact 3D analytical solutions, a stability analysis is carried out, focusing our attention on the physical conditions for having collapsing or non-collapsing solutions. Comment: 21 pages, 14 figures.
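    In schematic, dimensionless form (our notation; the paper's controlling potential and coefficients may differ), the decomposition described above can be written as:

```latex
% Controlled 3D GPE, with controlling potential V_c(\mathbf{r},t):
i\,\partial_t \Psi = \Big[-\tfrac{1}{2}\nabla^2 + U_{\mathrm{ext}}(\mathbf{r},t)
   + V_c(\mathbf{r},t) + q\,|\Psi|^2\Big]\Psi .

% Factorized ansatz: transverse (2D) times longitudinal (1D) parts,
\Psi(\mathbf{r},t) = \psi_\perp(x,y,t)\,\psi_z(z,t),

% which, under a variational condition fixing V_c, splits into
i\,\partial_t \psi_\perp = \Big[-\tfrac{1}{2}\nabla_\perp^2
   + U_\perp(x,y,t)\Big]\psi_\perp \quad \text{(2D linear Schr\"odinger)},

i\,\partial_t \psi_z = \Big[-\tfrac{1}{2}\partial_z^2 + U_z(z,t)
   + q\,|\psi_z|^2\Big]\psi_z \quad \text{(1D NLS)}.
```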

    Compact extra dimensions in cosmologies with f(T) structure

    The presence of compact extra dimensions in cosmological scenarios in the context of f(T)-like gravities is discussed. For the case of toroidal compactifications, the analysis is performed in an arbitrary number of extra dimensions. Spherical topologies for the extra dimensions are then carefully studied in six and seven spacetime dimensions, where the proper vielbein fields responsible for the parallelization process are found. Comment: 11 pages, one figure (added). Typos corrected, manuscript improved. Additional material is contained in section IV. Accepted for publication in Physical Review
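    For context, f(T) gravities replace the torsion scalar T of teleparallel gravity by a general function of it in the action; schematically (conventions vary),

```latex
S = \frac{1}{2\kappa}\int d^{D}x \; |e|\, f(T) \;+\; S_{\mathrm{matter}},
\qquad |e| = \det\!\big(e^{a}{}_{\mu}\big).
```

    Here the vielbein e^a_mu, rather than the metric, is the fundamental field, so a globally well-defined ("parallelizing") choice of vielbein on the compact extra dimensions is the nontrivial ingredient the abstract refers to.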

    ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

    Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures. Comment: 55 pages, 14 figures, 2 tables.
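    The following is a hypothetical, single-node sketch of the unified-interface pattern the abstract describes; it is not ELSI's actual API, and the solver names, the size threshold, and the dispatch logic are illustrative stand-ins for items (a)-(d) above.

```python
from scipy.linalg import eigh

def solve_kohn_sham(H, S, n_states, solver="auto"):
    """Hypothetical unified solver front end (illustrative only; not ELSI's
    actual API): one entry point, per-solver defaults, backend dispatch."""
    # Recommend a solver from problem characteristics (item (d) above).
    if solver == "auto":
        solver = "dense_eig" if H.shape[0] < 20000 else "density_matrix"

    if solver == "dense_eig":
        # Dense generalized eigenproblem H c = e S c (an ELPA-like backend);
        # LAPACK via SciPy stands in here for the distributed solver.
        evals, evecs = eigh(H, S)
        return evals[:n_states], evecs[:, :n_states]

    if solver == "density_matrix":
        # Reduced-scaling backends (e.g. PEXSI or libOMM) would return a
        # density matrix rather than explicit eigenpairs; not sketched here.
        raise NotImplementedError("density-matrix backends not sketched")

    raise ValueError(f"unknown solver: {solver}")
```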

    A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices

    We present the submatrix method, a highly parallelizable method for the approximate calculation of inverse p-th roots of large sparse symmetric matrices which are required in different scientific applications. We follow the idea of Approximate Computing, allowing imprecision in the final result in order to be able to utilize the sparsity of the input matrix and to allow massively parallel execution. For an n x n matrix, the proposed algorithm allows the calculations to be distributed over n nodes with little communication overhead. The approximate result matrix exhibits the same sparsity pattern as the input matrix, allowing for efficient reuse of allocated data structures. We evaluate the algorithm with respect to the error that it introduces into calculated results, as well as its performance and scalability. We demonstrate that the error is relatively limited for well-conditioned matrices and that results are still valuable for error-resilient applications like preconditioning even for ill-conditioned matrices. We discuss the execution time and scaling of the algorithm on a theoretical level and present a distributed implementation of the algorithm using MPI and OpenMP. We demonstrate the scalability of this implementation by running it on a high-performance compute cluster comprising 1,024 CPU cores, showing a speedup of 665x compared to single-threaded execution.
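    A serial NumPy/SciPy sketch of the column-wise construction behind the submatrix method follows; the function name and the dense eigendecomposition are our illustrative choices, and the paper's implementation distributes the per-column work across nodes with MPI and OpenMP.

```python
import numpy as np
from scipy.sparse import csc_matrix

def submatrix_inverse_proot(A, p):
    """Serial sketch of the submatrix method for an approximate A^(-1/p) of a
    sparse symmetric positive definite matrix A.  Each column is processed
    independently, which is what makes the method embarrassingly parallel."""
    A = csc_matrix(A)
    n = A.shape[0]
    rows, cols, vals = [], [], []

    for i in range(n):
        # Nonzero rows of column i select the principal submatrix
        # (assumes the diagonal entry A[i, i] is stored, as for SPD matrices).
        idx = A.indices[A.indptr[i]:A.indptr[i + 1]]
        sub = A[idx, :][:, idx].toarray()

        # Dense inverse p-th root of the small block via eigendecomposition.
        w, V = np.linalg.eigh(sub)
        block_root = (V * w ** (-1.0 / p)) @ V.T

        # Keep only the column corresponding to i, so the result inherits
        # the sparsity pattern of the input matrix.
        local_i = int(np.where(idx == i)[0][0])
        rows.extend(idx.tolist())
        cols.extend([i] * len(idx))
        vals.extend(block_root[:, local_i].tolist())

    return csc_matrix((np.array(vals), (rows, cols)), shape=(n, n))
```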