A Parallel Iterative Method for Computing Molecular Absorption Spectra
We describe a fast parallel iterative method for computing molecular
absorption spectra within TDDFT linear response and using the LCAO method. We
use a local basis of "dominant products" to parametrize the space of orbital
products that occur in the LCAO approach. In this basis, the dynamical
polarizability is computed iteratively within an appropriate Krylov subspace.
The iterative procedure uses a matrix-free GMRES method to determine the
(interacting) density response. The resulting code is about one order of
magnitude faster than our previous full-matrix method. This acceleration makes
the speed of our TDDFT code comparable with codes based on Casida's equation.
The implementation of our method uses hybrid MPI and OpenMP parallelization in
which load balancing and memory access are optimized. To validate our approach
and to establish benchmarks, we compute spectra of large molecules on various
types of parallel machines.
The methods developed here are fairly general and we believe they will find
useful applications in molecular physics/chemistry, even for problems that are
beyond TDDFT, such as organic semiconductors, particularly in photovoltaics.
Comment: 20 pages, 17 figures, 3 tables
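The matrix-free solve at the heart of this method can be illustrated in a few lines. The sketch below uses toy stand-ins for the non-interacting response and the interaction kernel (all names, sizes, and values are our own illustrative assumptions, not the paper's code) to show how a Dyson-like equation for the interacting density response can be solved with SciPy's GMRES without ever assembling the system matrix:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Sketch: solve the Dyson-like equation (1 - chi0 K) dn = chi0 v_ext for the
# interacting density response dn. chi0 (non-interacting response, here a toy
# diagonal) and K (interaction kernel, here a small random matrix) are
# illustrative stand-ins; only their matrix-vector action is ever needed.
rng = np.random.default_rng(0)
n = 200
K = 0.01 * rng.standard_normal((n, n))      # toy weak interaction kernel
chi0 = -1.0 / (1.0 + np.arange(n))          # toy diagonal response function

def apply_dyson(x):
    """Matrix-free action of (1 - chi0 K) on a vector."""
    return x - chi0 * (K @ x)

A = LinearOperator((n, n), matvec=apply_dyson)
v_ext = rng.standard_normal(n)              # external perturbation
rhs = chi0 * v_ext
dn, info = gmres(A, rhs)                    # Krylov solve, no matrix assembled
residual = np.linalg.norm(apply_dyson(dn) - rhs) / np.linalg.norm(rhs)
```

Because GMRES only needs the operator's action on a vector, the cost per iteration is a handful of matrix-vector products, which is what makes the approach attractive when the full response matrix is too large to build.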
Quantum ESPRESSO: a modular and open-source software project for quantum simulations of materials
Quantum ESPRESSO is an integrated suite of computer codes for
electronic-structure calculations and materials modeling, based on
density-functional theory, plane waves, and pseudopotentials (norm-conserving,
ultrasoft, and projector-augmented wave). Quantum ESPRESSO stands for "opEn
Source Package for Research in Electronic Structure, Simulation, and
Optimization". It is freely available to researchers around the world under the
terms of the GNU General Public License. Quantum ESPRESSO builds upon
newly-restructured electronic-structure codes that have been developed and
tested by some of the original authors of novel electronic-structure algorithms
and applied in the last twenty years by some of the leading materials modeling
groups worldwide. Innovation and efficiency are still its main focus, with
special attention paid to massively-parallel architectures, and a great effort
being devoted to user friendliness. Quantum ESPRESSO is evolving towards a
distribution of independent and inter-operable codes in the spirit of an
open-source project, where researchers active in the field of
electronic-structure calculations are encouraged to participate in the project
by contributing their own codes or by implementing their own ideas into
existing codes.
Comment: 36 pages, 5 figures, resubmitted to J. Phys.: Condens. Matter
Compilation techniques for irregular problems on parallel machines
Massively parallel computers have ushered in the era of teraflop computing. Even though large and powerful machines are being built, they are used by only a fraction of the computing community. The fundamental reason for this situation is that parallel machines are difficult to program. Development of compilers that automatically parallelize programs will greatly increase the use of these machines.

A large class of scientific problems can be categorized as irregular computations. In this class of computation, the data access patterns are known only at runtime, creating significant difficulties for a parallelizing compiler to generate efficient parallel codes. Some compilers with very limited abilities to parallelize simple irregular computations exist, but the methods used by these compilers fail for any non-trivial application code.

This research presents the development of compiler transformation techniques that can be used to effectively parallelize an important class of irregular programs. A central aim of these transformation techniques is to generate code that aggressively prefetches data. Program slicing methods are used as part of the code generation process. In this approach, a program written in a data-parallel language, such as HPF, is transformed so that it can be executed on a distributed-memory machine. An efficient compiler runtime support system has been developed that performs data movement and software caching.
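A standard way to handle access patterns known only at run time, and the kind of code a compiler like the one described here must generate, is the inspector/executor pattern: inspect the index array once to build a communication schedule, prefetch the remote data, then run the loop against local data plus a software cache. The sketch below (the function names and the two-"process" setup are our own illustration, not the dissertation's compiler output) shows the idea for an irregular gather:

```python
# Minimal inspector/executor sketch for an irregular gather, the pattern a
# parallelizing compiler must handle when index arrays are known only at
# runtime. All names here are illustrative, not from the dissertation.

def inspector(index_array, my_lo, my_hi):
    """Inspect the runtime access pattern: find the referenced elements that
    lie outside this process's local block [my_lo, my_hi) and must be fetched."""
    remote = sorted({i for i in index_array if not (my_lo <= i < my_hi)})
    return remote  # communication schedule: remote indices to prefetch

def executor(x_local, index_array, my_lo, my_hi, remote_cache):
    """Execute the loop body using local data plus prefetched remote values."""
    out = []
    for i in index_array:
        if my_lo <= i < my_hi:
            out.append(x_local[i - my_lo])    # local access
        else:
            out.append(remote_cache[i])       # software-cached remote access
    return out

# Toy run: a global array of 8 values split across two "processes".
x_global = [10, 11, 12, 13, 14, 15, 16, 17]
idx = [0, 5, 2, 7, 1]                 # irregular indices, known only at runtime
lo, hi = 0, 4                         # this process owns x_global[0:4]
schedule = inspector(idx, lo, hi)     # which elements to fetch from elsewhere
cache = {i: x_global[i] for i in schedule}   # stands in for the data movement
result = executor(x_global[lo:hi], idx, lo, hi, cache)
```

The key property is that the inspector's (potentially expensive) analysis is amortized: as long as the index array does not change, the same schedule and cache can be reused across loop executions.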
First-principle molecular dynamics with ultrasoft pseudopotentials: parallel implementation and application to extended bio-inorganic systems
We present a plane-wave ultrasoft pseudopotential implementation of
first-principle molecular dynamics, which is well suited to model large
molecular systems containing transition metal centers. We describe an efficient
strategy for parallelization that includes special features to deal with the
augmented charge in the context of Vanderbilt's ultrasoft pseudopotentials. We
also discuss a simple approach to model molecular systems with a net charge
and/or large dipole/quadrupole moments. We present test applications to
manganese and iron porphyrins representative of a large class of biologically
relevant metallorganic systems. Our results show that accurate
Density-Functional Theory calculations on systems with several hundred atoms
are feasible with access to moderate computational resources.
Comment: 29 pages, 4 Postscript figures, RevTeX
Run-time optimization of adaptive irregular applications
Compared to traditional compile-time optimization, run-time optimization can offer significant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of the input data, the program's dynamic behavior, and the underlying execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on the program's input data and its dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of a program's dynamic memory access patterns with the appropriate optimization (parallelization) transformations.

First, we present a general adaptive algorithm selection framework that automatically and adaptively selects at run-time the best performing, functionally equivalent algorithm for each of its execution instances. The selection process is based on off-line, automatically generated prediction models and on characteristics (collected and analyzed dynamically) of the algorithm's input data. In this dissertation, we specialize this framework for the automatic selection of reduction algorithms. We have identified a small set of machine-independent, high-level characterization parameters and deployed an off-line, systematic experimental process to generate prediction models. These models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly in terms of applications, platforms, and programs' dynamic behaviors. Specifically, for reduction algorithm selection, the selected performance is within 2% of optimal and on average 60% better than "Replicated Buffer," the default parallel reduction algorithm specified by the OpenMP standard.

To reduce the overhead of speculative run-time parallelization, we have developed an adaptive run-time parallelization technique that dynamically chooses efficient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems.
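The shape of such a selection framework can be sketched in a few lines. The version below is a deliberately simplified stand-in: it picks between two toy reduction algorithms by timing them on a sample of the input rather than by consulting trained prediction models, and every name in it is our own illustrative assumption, not the dissertation's implementation:

```python
# Illustrative sketch of run-time algorithm selection between two
# functionally equivalent reduction algorithms. Selection here is by timing
# a sample of the input; the dissertation instead uses off-line prediction
# models. All names and the candidates themselves are illustrative.
import time
from collections import defaultdict

def reduce_replicated_buffer(size, indices, values, n_chunks=4):
    """Each 'thread' accumulates into a private full-size buffer; the
    buffers are merged at the end (good for dense access patterns)."""
    chunk = max(1, len(indices) // n_chunks)
    buffers = []
    for c in range(0, len(indices), chunk):
        buf = [0.0] * size                      # private replicated buffer
        for i, v in zip(indices[c:c + chunk], values[c:c + chunk]):
            buf[i] += v
        buffers.append(buf)
    return [sum(col) for col in zip(*buffers)]  # merge step

def reduce_hashed(size, indices, values):
    """Accumulate into a hash map; cheaper when accesses are sparse."""
    acc = defaultdict(float)
    for i, v in zip(indices, values):
        acc[i] += v
    return [acc.get(i, 0.0) for i in range(size)]

def select_and_run(size, indices, values):
    """Time each candidate on a small input sample, then run the winner."""
    sample = max(1, len(indices) // 10)
    best, best_t = None, float("inf")
    for algo in (reduce_replicated_buffer, reduce_hashed):
        t0 = time.perf_counter()
        algo(size, indices[:sample], values[:sample])
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = algo, dt
    return best(size, indices, values), best.__name__

idx = [1, 3, 1, 0]
vals = [1.0, 2.0, 3.0, 0.5]
result, chosen = select_and_run(5, idx, vals)
```

Both candidates compute the same reduction; only their cost profiles differ, which is exactly why a run-time (or model-based) choice between them can pay off.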
A Sparse SCF algorithm and its parallel implementation: Application to DFTB
We present an algorithm and its parallel implementation for solving a self
consistent problem as encountered in Hartree Fock or Density Functional Theory.
The algorithm takes advantage of the sparsity of matrices through the use of
local molecular orbitals. The implementation efficiently exploits modern
symmetric multiprocessing (SMP) computer architectures. As a first
application, the algorithm is used within the density functional based tight
binding method, for which most of the computational time is spent in the linear
algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that
with this algorithm (i) single-point calculations on very large systems
(millions of atoms) can be performed on large SMP machines, (ii) calculations
involving intermediate-size systems (1,000–100,000 atoms) are also strongly
accelerated and can run efficiently on standard servers, and (iii) the error
on the total energy due to the use of a cut-off in the molecular orbital
coefficients can be controlled such that it remains smaller than the SCF
convergence criterion.
Comment: 13 pages, 11 figures
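The claim in point (iii), that a coefficient cut-off yields a controllable total-energy error, can be illustrated on a toy localized-orbital model. Everything in the sketch below (the disordered tight-binding chain, the sizes, the threshold) is our own illustrative assumption, not taken from the paper; the point is that when orbitals are localized, most coefficients are negligible and zeroing them barely changes tr(Cᵀ H C):

```python
import numpy as np

# Toy illustration of the coefficient cut-off: a disordered tight-binding
# chain has exponentially localized eigenvectors, so most molecular-orbital
# coefficients are negligibly small. All sizes and thresholds here are
# illustrative assumptions, not taken from the paper.
rng = np.random.default_rng(1)
n, n_occ = 200, 50
onsite = rng.uniform(-5.0, 5.0, n)              # strong diagonal disorder
hop = 0.1 * np.ones(n - 1)                      # weak nearest-neighbor hopping
H = np.diag(onsite) + np.diag(hop, 1) + np.diag(hop, -1)

eps, C = np.linalg.eigh(H)
C_occ = C[:, :n_occ]                            # "occupied" orbital coefficients

def energy(Cmat):
    """Band-structure-like energy tr(C^T H C)."""
    return float(np.trace(Cmat.T @ H @ Cmat))

cutoff = 1e-8
C_sparse = np.where(np.abs(C_occ) > cutoff, C_occ, 0.0)  # apply the cut-off
sparsity = 1.0 - np.count_nonzero(C_sparse) / C_sparse.size
energy_error = abs(energy(C_sparse) - energy(C_occ))
```

In this toy model the vast majority of coefficients can be dropped while the energy error stays many orders of magnitude below any practical SCF convergence threshold, which is the behavior the abstract describes.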
Chebyshev polynomial filtered subspace iteration in the Discontinuous Galerkin method for large-scale electronic structure calculations
The Discontinuous Galerkin (DG) electronic structure method employs an
adaptive local basis (ALB) set to solve the Kohn-Sham equations of density
functional theory (DFT) in a discontinuous Galerkin framework. The adaptive
local basis is generated on-the-fly to capture the local material physics, and
can systematically attain chemical accuracy with only a few tens of degrees of
freedom per atom. A central issue for large-scale calculations, however, is the
computation of the electron density (and subsequently, ground state properties)
from the discretized Hamiltonian in an efficient and scalable manner. We show
in this work how Chebyshev polynomial filtered subspace iteration (CheFSI) can
be used to address this issue and push the envelope in large-scale materials
simulations in a discontinuous Galerkin framework. We describe how the subspace
filtering steps can be performed in an efficient and scalable manner using a
two-dimensional parallelization scheme, thanks to the orthogonality of the DG
basis set and block-sparse structure of the DG Hamiltonian matrix. The
on-the-fly nature of the ALBs requires additional care in carrying out the
subspace iterations. We demonstrate the parallel scalability of the DG-CheFSI
approach in calculations of large-scale two-dimensional graphene sheets and
bulk three-dimensional lithium-ion electrolyte systems. Employing 55,296
computational cores, the time per self-consistent field iteration for a sample
of the bulk 3D electrolyte containing 8,586 atoms is 90 seconds, and the time
for a graphene sheet containing 11,520 atoms is 75 seconds.
Comment: Submitted to The Journal of Chemical Physics
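The core CheFSI iteration, filter a subspace with a Chebyshev polynomial in the Hamiltonian, re-orthonormalize, then do a Rayleigh–Ritz rotation, can be sketched compactly. The version below is a minimal dense-matrix illustration under our own assumptions (a small random symmetric matrix, exact spectral bounds taken from a reference diagonalization; in practice the bounds are estimated, e.g., with a few Lanczos steps), and it omits the DG-specific block-sparse two-dimensional parallelism:

```python
import numpy as np

# Minimal Chebyshev-filtered subspace iteration (CheFSI) sketch for the
# lowest eigenpairs of a symmetric "Hamiltonian". Matrix size, polynomial
# degree, iteration count, and the way spectral bounds are obtained are all
# illustrative choices, not the paper's settings.
rng = np.random.default_rng(2)
n, n_states, degree = 120, 10, 15
A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T)                                    # toy symmetric matrix

def chebyshev_filter(A, X, degree, lb, ub):
    """Apply a degree-m Chebyshev polynomial in A to X, scaled so the
    unwanted spectrum [lb, ub] is damped and everything below lb grows."""
    e = (ub - lb) / 2.0            # half-width of the damped interval
    c = (ub + lb) / 2.0            # its center
    Y_prev, Y = X, (A @ X - c * X) / e
    for _ in range(2, degree + 1):
        Y_prev, Y = Y, 2.0 * (A @ Y - c * Y) / e - Y_prev
    return Y

# Spectral bounds for the filter (from a reference diagonalization here,
# for illustration only; real codes estimate these cheaply).
evals_exact = np.linalg.eigvalsh(A)
lb, ub = evals_exact[n_states], evals_exact[-1] + 0.1

X = rng.standard_normal((n, n_states))
for _ in range(50):                                    # SCF-like outer loop
    X = chebyshev_filter(A, X, degree, lb, ub)
    X, _ = np.linalg.qr(X)                             # re-orthonormalize
    w, V = np.linalg.eigh(X.T @ A @ X)                 # Rayleigh-Ritz step
    X = X @ V                                          # rotate to Ritz vectors
```

The appeal for large-scale codes is that the filter needs only matrix-block products with the Hamiltonian, and the orthonormalization and Rayleigh–Ritz steps act on tall skinny matrices, both of which parallelize well, which is the structure the two-dimensional parallelization scheme described above exploits.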