Two-level Chebyshev filter based complementary subspace method: pushing the envelope of large-scale electronic structure calculations
We describe a novel iterative strategy for Kohn-Sham density functional
theory calculations aimed at large systems (> 1000 electrons), applicable to
metals and insulators alike. In lieu of explicit diagonalization of the
Kohn-Sham Hamiltonian on every self-consistent field (SCF) iteration, we employ
a two-level Chebyshev polynomial filter based complementary subspace strategy
to: 1) compute a set of vectors that span the occupied subspace of the
Hamiltonian; 2) reduce subspace diagonalization to just partially occupied
states; and 3) obtain those states in an efficient, scalable manner via an
inner Chebyshev-filter iteration. By reducing the necessary computation to just
partially occupied states, and obtaining these through an inner Chebyshev
iteration, our approach reduces the cost of large metallic calculations
significantly, while eliminating subspace diagonalization for insulating
systems altogether. We describe the implementation of the method within the
framework of the Discontinuous Galerkin (DG) electronic structure method and
show that this results in a computational scheme that can effectively tackle
bulk and nano systems containing tens of thousands of electrons, with chemical
accuracy, within a few minutes or less of wall clock time per SCF iteration on
large-scale computing platforms. We anticipate that our method will be
instrumental in pushing the envelope of large-scale ab initio molecular
dynamics. As a demonstration of this, we simulate a bulk silicon system
containing 8,000 atoms at finite temperature, and obtain an average SCF step
wall time of 51 seconds on 34,560 processors; thus allowing us to carry out 1.0
ps of ab initio molecular dynamics in approximately 28 hours (of wall time).
Comment: Resubmitted version (version 2)
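To make the strategy concrete, here is a minimal NumPy sketch of the complementary subspace step, assuming an orthonormal block Y spanning the (Chebyshev-filtered) occupied subspace and a user-supplied occupation function; scipy's eigsh stands in for the paper's inner Chebyshev iteration, and all names are illustrative rather than the authors' code.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def density_matrix_cs(H, Y, occ, k):
    """Complementary subspace construction of the density matrix.

    H   : (n, n) Hamiltonian (dense here, for simplicity)
    Y   : (n, N) orthonormal basis of the Chebyshev-filtered occupied subspace
    occ : map from eigenvalue to occupation in [0, 1] (e.g. Fermi-Dirac)
    k   : number of top, partially occupied subspace states to resolve

    Writing the occupation matrix as F = I - (I - F), where (I - F) is
    nonzero only for the top k states, gives
        D = Y Y^T - Y Q_k (I - F_k) Q_k^T Y^T,
    so only the k highest eigenpairs of the subspace Hamiltonian are needed.
    """
    Hs = Y.T @ H @ Y                      # small N x N subspace Hamiltonian
    # The paper obtains the top states by an inner Chebyshev iteration;
    # eigsh is only a stand-in for that step in this sketch.
    vals, Qk = eigsh(Hs, k=k, which="LA")
    f = np.array([occ(v) for v in vals])  # fractional occupations of top states
    corr = Y @ (Qk * (1.0 - f))           # columns of Q_k scaled by (1 - f_i)
    return Y @ Y.T - corr @ (Y @ Qk).T
```

In the insulating limit all occupations equal one, the correction term vanishes, and D = Y Y^T requires no subspace diagonalization at all, which is exactly the limit the abstract describes.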
Chebyshev polynomial filtered subspace iteration in the Discontinuous Galerkin method for large-scale electronic structure calculations
The Discontinuous Galerkin (DG) electronic structure method employs an
adaptive local basis (ALB) set to solve the Kohn-Sham equations of density
functional theory (DFT). The adaptive
local basis is generated on-the-fly to capture the local material physics, and
can systematically attain chemical accuracy with only a few tens of degrees of
freedom per atom. A central issue for large-scale calculations, however, is the
computation of the electron density (and subsequently, ground state properties)
from the discretized Hamiltonian in an efficient and scalable manner. We show
in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) can
be used to address this issue and push the envelope in large-scale materials
simulations in a discontinuous Galerkin framework. We describe how the subspace
filtering steps can be performed in an efficient and scalable manner using a
two-dimensional parallelization scheme, thanks to the orthogonality of the DG
basis set and the block-sparse structure of the DG Hamiltonian matrix. The
on-the-fly nature of the ALBs requires additional care in carrying out the
subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI
approach in calculations of large-scale two-dimensional graphene sheets and
bulk three-dimensional lithium-ion electrolyte systems. On 55,296
computational cores, the time per self-consistent field iteration is 90
seconds for a sample of the bulk 3D electrolyte containing 8,586 atoms, and 75
seconds for a graphene sheet containing 11,520 atoms.
Comment: Submitted to The Journal of Chemical Physics
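For orientation, the core filtering step of CheFSI (independent of the DG specifics) can be sketched as below; the interval [a, b] encloses the unwanted upper part of the spectrum, and the degree m, bounds, and function names are assumptions of this sketch, not values from the paper.

```python
import numpy as np

def chebyshev_filter(H, X, m, a, b):
    """Apply a degree-m Chebyshev filter to the block of vectors X.

    The unwanted interval [a, b] (upper end of the spectrum of H) is
    mapped onto [-1, 1], where Chebyshev polynomials stay bounded;
    components of X below a are amplified roughly exponentially in m.
    """
    e = (b - a) / 2.0                    # half-width of the damped interval
    c = (b + a) / 2.0                    # center of the damped interval
    Y = (H @ X - c * X) / e              # degree-1 term of the recurrence
    for _ in range(2, m + 1):
        # Three-term recurrence T_{j+1}(t) = 2 t T_j(t) - T_{j-1}(t)
        Y_new = 2.0 * (H @ Y - c * Y) / e - X
        X, Y = Y, Y_new
    return Y

# Typical use: filter, orthonormalize, then a Rayleigh-Ritz step, e.g.
#   Q, _ = np.linalg.qr(chebyshev_filter(H, X, m=10, a=a, b=b))
#   Hs   = Q.T @ H @ Q   # subspace Hamiltonian for the density construction
```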
Stability and collapse of localized solutions of the controlled three-dimensional Gross-Pitaevskii equation
On the basis of recent investigations, a newly developed analytical procedure
is used for constructing a wide class of localized solutions of the controlled
three-dimensional (3D) Gross-Pitaevskii equation (GPE) that governs the
dynamics of Bose-Einstein condensates (BECs). The controlled 3D GPE is
decomposed into a two-dimensional (2D) linear Schrödinger equation and a
one-dimensional (1D) nonlinear Schrödinger equation, constrained by a
variational condition for the controlling potential. The above class of
localized solutions is then constructed as the product of the solutions of the
transverse and longitudinal equations. On the basis of these exact 3D
analytical solutions, a stability analysis is carried out, focusing on the
physical conditions that lead to collapsing or non-collapsing solutions.
Comment: 21 pages, 14 figures
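Schematically, in dimensionless units of our choosing (the paper's precise notation may differ), the decomposition takes the form:

```latex
% Controlled 3D GPE (dimensionless form; U = trap + controlling potential):
i\,\partial_t \psi = \Big[-\tfrac{1}{2}\nabla^2 + U(\mathbf{r},t) + Q\,|\psi|^2\Big]\psi,
\qquad \psi(\mathbf{r},t) = \psi_\perp(x,y,t)\,\psi_z(z,t).
% The product ansatz splits this into a 2D linear equation ...
i\,\partial_t \psi_\perp = \Big[-\tfrac{1}{2}\nabla_\perp^2 + U_\perp(x,y,t)\Big]\psi_\perp,
% ... and a 1D nonlinear Schrodinger equation with effective coupling \tilde{Q}:
i\,\partial_t \psi_z = \Big[-\tfrac{1}{2}\partial_z^2 + U_z(z,t) + \tilde{Q}\,|\psi_z|^2\Big]\psi_z.
```

Here the split of U into transverse and longitudinal parts is fixed by the variational condition on the controlling potential, so that the product ansatz solves the full 3D equation.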
Compact extra dimensions in cosmologies with f(T) structure
The presence of compact extra dimensions in cosmological scenarios in the
context of f(T)-like gravities is discussed. For the case of toroidal
compactifications, the analysis is performed in an arbitrary number of extra
dimensions. Spherical topologies for the extra dimensions are then carefully
studied in six and seven spacetime dimensions, where the proper vielbein fields
responsible for the parallelization process are found.
Comment: 11 pages, one figure (added). Typos corrected, manuscript improved. Additional material is contained in Section IV. Accepted for publication in Physical Review
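As background (our notation, not taken from the paper), f(T) gravities generalize the teleparallel action built from the torsion scalar T of a vielbein field:

```latex
S = \frac{1}{2\kappa}\int d^{D}x\; e\, f(T) + S_{\text{matter}},
\qquad e = \det\!\big(e^{a}{}_{\mu}\big),
```

Unlike in general relativity, the field equations depend on the choice of vielbein, which is why finding the proper, parallelizing frame for the compact extra-dimensional factor is the central technical step.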
ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers
Solving the electronic structure from a generalized or standard eigenproblem
is often the bottleneck in large-scale calculations based on Kohn-Sham
density-functional theory. Essentially all current electronic structure codes
must address this problem, starting from similar matrix expressions and
relying on high-performance computation. Here we present a unified software
interface,
ELSI, to access different strategies that address the Kohn-Sham eigenvalue
problem. Currently supported algorithms include the dense generalized
eigensolver library ELPA, the orbital minimization method implemented in
libOMM, and the pole expansion and selected inversion (PEXSI) approach with
lower computational complexity for semilocal density functionals. The ELSI
interface aims to simplify the implementation and optimal use of the different
strategies, by offering (a) a unified software framework designed for the
electronic structure solvers in Kohn-Sham density-functional theory; (b)
reasonable default parameters for a chosen solver; (c) automatic conversion
between input and internal working matrix formats, and in the future (d)
recommendation of the optimal solver depending on the specific problem.
Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800
basis functions) on distributed-memory supercomputing architectures.
Comment: 55 pages, 14 figures, 2 tables
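The abstract does not reproduce the interface itself; the following is a purely hypothetical Python sketch of the unified-interface pattern it describes (class and method names are ours, not ELSI's actual API).

```python
import numpy as np
import scipy.linalg as la

class UnifiedSolver:
    """Hypothetical sketch of a unified eigensolver layer in the spirit
    of ELSI; the names here are illustrative, not ELSI's real API."""

    def __init__(self, solver="dense", n_electron=0):
        self.solver = solver          # e.g. "dense" (ELPA-like), "omm", "pexsi"
        self.n_electron = n_electron  # electrons to occupy (closed shell)

    def density_matrix(self, H, S):
        """Density matrix for the generalized problem H C = S C diag(E)."""
        if self.solver == "dense":
            # What a dense backend does: full generalized eigensolve,
            # then occupy the lowest states.
            evals, C = la.eigh(H, S)
            n_occ = self.n_electron // 2
            return 2.0 * C[:, :n_occ] @ C[:, :n_occ].T
        if self.solver in ("omm", "pexsi"):
            # Reduced-scaling backends bypass explicit diagonalization;
            # omitted here -- the point is that the caller never changes.
            raise NotImplementedError(self.solver)
        raise ValueError(f"unknown solver: {self.solver!r}")
```

The value of such a layer is that a code can switch between a dense solver and a reduced-scaling one (or, per item (d), have one recommended automatically) without touching the calling SCF loop.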
A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices
We present the submatrix method, a highly parallelizable approach for the
approximate calculation of inverse p-th roots of large sparse symmetric
matrices, as required in various scientific applications. We follow the
idea of Approximate Computing, allowing imprecision in the final result in
order to be able to utilize the sparsity of the input matrix and to allow
massively parallel execution. For an n x n matrix, the proposed algorithm
allows the calculations to be distributed over n nodes with little
communication overhead. The approximate result matrix exhibits the same
sparsity pattern as the input matrix, allowing for efficient reuse of allocated
data structures.
We evaluate the algorithm with respect to the error that it introduces into
calculated results, as well as its performance and scalability. We demonstrate
that the error remains limited for well-conditioned matrices and that, even
for ill-conditioned matrices, the results are still useful for error-resilient
applications such as preconditioning. We discuss the execution
time and scaling of the algorithm on a theoretical level and present a
distributed implementation of the algorithm using MPI and OpenMP. We
demonstrate the scalability of this implementation by running it on a
high-performance compute cluster comprising 1024 CPU cores, achieving a
speedup of 665x over single-threaded execution.
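Based on the description above, a serial toy version of the submatrix method might look like the following sketch (our reconstruction; the column-wise index selection follows the stated idea that the result inherits the input sparsity pattern, and all names are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def inv_p_root_dense(A, p):
    """Dense inverse p-th root A^(-1/p) via symmetric eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** (-1.0 / p)) @ V.T

def submatrix_inv_p_root(A, p):
    """Approximate A^(-1/p) for a sparse symmetric positive definite A.

    For each column i, the dense principal submatrix over the nonzero
    index set of that column is extracted, its exact inverse p-th root
    is computed, and the column belonging to i is written back. The
    result inherits the sparsity pattern of A.
    """
    A = sp.csc_matrix(A)
    n = A.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        idx = A[:, i].nonzero()[0]             # nonzero rows of column i
        sub = A[idx, :][:, idx].toarray()      # dense principal submatrix
        col = inv_p_root_dense(sub, p)[:, np.where(idx == i)[0][0]]
        rows.extend(idx)
        cols.extend([i] * len(idx))
        vals.extend(col)
    return sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()
```

Since every column is processed independently from a small dense principal submatrix, the loop parallelizes naturally, one column (or block of columns) per node, which is the distribution the abstract describes.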