Distributed Memory, GPU Accelerated Fock Construction for Hybrid, Gaussian Basis Density Functional Theory
With the growing reliance of modern supercomputers on accelerator-based
architectures such as GPUs, the development and optimization of electronic
structure methods to exploit these massively parallel resources has become a
recent priority. While significant strides have been made in the development of
GPU accelerated, distributed memory algorithms for many-body (e.g.
coupled-cluster) and spectral single-body (e.g. planewave, real-space and
finite-element density functional theory [DFT]), the vast majority of
GPU-accelerated Gaussian atomic orbital methods have focused on shared memory
systems with only a handful of examples pursuing massive parallelism on
distributed memory GPU architectures. In this work, we present a set of
distributed memory algorithms for the evaluation of the Coulomb and
exact-exchange matrices for hybrid Kohn-Sham DFT with Gaussian basis sets via
direct density-fitted (DF-J-Engine) and seminumerical (sn-K) methods,
respectively. The absolute performance and strong scalability of the developed
methods are demonstrated on systems ranging from a few hundred to over one
thousand atoms using up to 128 NVIDIA A100 GPUs on the Perlmutter
supercomputer. Comment: 45 pages, 9 figures
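As a rough illustration of the density-fitted Coulomb build mentioned above, the sketch below evaluates J[mu][nu] = sum_P (mu nu|P) d_P, where the fitting coefficients solve sum_Q (P|Q) d_Q = sum_{ls} (P|ls) D_ls. It is a minimal pure-Python sketch on toy data, assuming the three-center integrals (P|mu nu) and the metric (P|Q) are already available as nested lists; the function names are hypothetical. It is not the DF-J-Engine itself, which as a direct method presumably evaluates integrals on the fly on the GPU rather than storing them, and a production code would typically factor the metric by Cholesky decomposition instead of the plain Gaussian elimination used here.

```python
# Toy density-fitted Coulomb build (pure Python; names are hypothetical).

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def df_coulomb(three_center, metric, density):
    """Return J with J[mu][nu] = sum_P (mu nu|P) d_P.

    three_center[P][mu][nu] holds (P|mu nu), metric[P][Q] holds (P|Q),
    and density[l][s] is the AO density matrix D.
    """
    naux, nbf = len(metric), len(density)
    # 1) Project the density onto the auxiliary basis: gamma_P = sum (P|ls) D_ls.
    gamma = [sum(three_center[p][l][s] * density[l][s]
                 for l in range(nbf) for s in range(nbf))
             for p in range(naux)]
    # 2) Fitting coefficients: solve sum_Q (P|Q) d_Q = gamma_P.
    d = solve(metric, gamma)
    # 3) Contract back to the AO basis.
    return [[sum(three_center[p][mu][nu] * d[p] for p in range(naux))
             for nu in range(nbf)]
            for mu in range(nbf)]
```

For example, with a single auxiliary function whose metric is 2.0, an identity three-center block and an identity density, the projected density is 2.0, the fitting coefficient is 1.0, and J comes back as the identity.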
Complexity Reduction in Density Functional Theory: Locality in Space and Energy
We present recent developments of the NTChem program for performing large
scale hybrid Density Functional Theory calculations on the supercomputer
Fugaku. We combine these developments with our recently proposed Complexity
Reduction Framework to assess the impact of basis set and functional choice on
its measures of fragment quality and interaction. We further exploit the
all-electron representation to study system fragmentation in various energy
envelopes. Building on this analysis, we propose two algorithms for computing
the orbital energies of the Kohn-Sham Hamiltonian. We demonstrate that these
algorithms can be applied efficiently to systems composed of thousands of atoms
and can serve as an analysis tool that reveals the origin of spectral properties. Comment: Accepted Manuscript
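The abstract does not say how the two orbital-energy algorithms work, so the sketch below only illustrates a standard, related idea: counting and locating eigenvalues inside an energy window without a full diagonalization, via inertia (Sturm) counting and bisection. It assumes, purely for simplicity, that the Hamiltonian has already been reduced to symmetric tridiagonal form; the function names are hypothetical and this is not presented as either of the NTChem algorithms.

```python
def count_below(diag, off, sigma, pivmin=1e-300):
    """Number of eigenvalues of the symmetric tridiagonal matrix T below sigma.

    diag holds T's diagonal, off its sub-diagonal. The pivots d_i of the
    LDL^T factorization of T - sigma*I follow a simple recurrence, and by
    Sylvester's law of inertia the count of negative pivots equals the
    number of eigenvalues of T smaller than sigma.
    """
    count = 0
    d = 1.0
    for i, a in enumerate(diag):
        coupling = off[i - 1] ** 2 / d if i > 0 else 0.0
        d = (a - sigma) - coupling
        if d == 0.0:
            d = pivmin  # avoid dividing by an exactly zero pivot
        if d < 0.0:
            count += 1
    return count


def eigenvalues_in_window(diag, off, lo, hi):
    """How many eigenvalues lie in the energy window [lo, hi)."""
    return count_below(diag, off, hi) - count_below(diag, off, lo)


def kth_eigenvalue(diag, off, k, lo, hi, tol=1e-12):
    """k-th smallest eigenvalue (0-based) by bisection, assuming lo < lambda_k < hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid  # at least k+1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For the 3x3 matrix with 2 on the diagonal and -1 off the diagonal (eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2)), the window [1.5, 4.0) contains two eigenvalues, and bisection on [0, 4] recovers 2 - sqrt(2). Each count costs only O(n) work, which is why this kind of slicing suits targeting a few orbital energies in a chosen envelope.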
X10 for high-performance scientific computing
High performance computing is a key technology that enables large-scale physical
simulation in modern science. While great advances have been made in methods and
algorithms for scientific computing, the most commonly used programming models
encourage a fragmented view of computation that maps poorly to the underlying
computer architecture.
Scientific applications typically manifest physical locality, which means that interactions
between entities or events that are nearby in space or time are stronger
than more distant interactions. Linear-scaling methods exploit physical locality by approximating
distant interactions, reducing computational complexity so that cost is
proportional to system size. In these methods, the computation required for each
portion of the system is different depending on that portion’s contribution to the
overall result. To support productive development, application programmers need
programming models that cleanly map aspects of the physical system being simulated
to the underlying computer architecture while also supporting the irregular
workloads that arise from the fragmentation of a physical system.
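The locality argument above can be made concrete with the simplest possible "approximation" of distant interactions: dropping every pair beyond a cutoff radius rc and finding near neighbours with a cell list, so the work grows linearly with the number of particles at fixed density. This is a minimal sketch with hypothetical names, assuming open boundaries and a bare 1/r pair energy in arbitrary units; methods such as the fast multipole method named below approximate the far field with multipole expansions instead of discarding it.

```python
import math
from collections import defaultdict


def pair_energy_cutoff(positions, charges, rc):
    """Sum q_i q_j / r_ij over pairs with r_ij < rc, using a cell list (3D, open boundaries)."""
    # Bin particles into cubic cells of edge rc: any partner within rc
    # must then lie in the same cell or one of its 26 neighbours.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        cells[(math.floor(x / rc), math.floor(y / rc), math.floor(z / rc))].append(idx)

    energy = 0.0
    for (cx, cy, cz), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    neighbours = cells.get((cx + dx, cy + dy, cz + dz))
                    if not neighbours:
                        continue
                    for i in members:
                        for j in neighbours:
                            if j <= i:
                                continue  # count each pair exactly once
                            r = math.dist(positions[i], positions[j])
                            if r < rc:
                                energy += charges[i] * charges[j] / r
    return energy
```

Each particle is compared only with particles in adjacent cells, so the result matches a brute-force double loop restricted to r < rc while touching far fewer pairs.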
X10 is a new programming language for high-performance computing that uses
the asynchronous partitioned global address space (APGAS) model, which combines
explicit representation of locality with asynchronous task parallelism. This thesis
argues that the X10 language is well suited to expressing the algorithmic properties
of locality and irregular parallelism that are common to many methods for physical
simulation.
The work reported in this thesis was part of a co-design effort involving researchers
at IBM and ANU in which two significant computational chemistry codes
were developed in X10, with the aim of improving the expressiveness and performance
of the language. The first is a Hartree–Fock electronic structure code, implemented
using the novel Resolution of the Coulomb Operator approach. The second evaluates
electrostatic interactions between point charges, using either the smooth particle
mesh Ewald method or the fast multipole method, with the latter used to simulate
ion interactions in a Fourier Transform Ion Cyclotron Resonance mass spectrometer.
We compare the performance of both X10 applications to state-of-the-art software
packages written in other languages.
This thesis presents improvements to the X10 language and runtime libraries for
managing and visualizing the data locality of parallel tasks, communication using
active messages, and efficient implementation of distributed arrays. We evaluate these improvements in the context of computational chemistry application examples.
This work demonstrates that X10 can achieve performance comparable to established
programming languages when running on a single core. More importantly,
X10 programs can achieve high parallel efficiency on a multithreaded architecture,
given a divide-and-conquer pattern of parallel tasks and appropriate use of worker-local
data. For distributed memory architectures, X10 supports the use of active messages
to construct local, asynchronous communication patterns which outperform global,
synchronous patterns. Although point-to-point active messages may be implemented
efficiently, productive application development also requires collective communications;
more work is required to integrate both forms of communication in the X10
language.
The exploitation of locality is the key insight in both linear-scaling methods and
the APGAS programming model; their combination represents an attractive opportunity
for future co-design efforts.
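The thesis credits X10's multithreaded efficiency to a divide-and-conquer pattern of parallel tasks combined with worker-local data. The sketch below shows that structure in Python rather than X10 (where one would write it with `finish` and `async`): the index range is bisected recursively into tasks, each task accumulates into its own local variable, and a single reduction happens at the end. The names and the grain size are hypothetical, and because of CPython's global interpreter lock this demonstrates the pattern, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(lo, hi, grain):
    """Recursively bisect [lo, hi) until each piece has at most `grain` items."""
    if hi - lo <= grain:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split(lo, mid, grain) + split(mid, hi, grain)


def local_sum(task, f):
    """Accumulate f(i) over one task's range into a task-local variable."""
    lo, hi = task
    acc = 0.0  # worker-local: no shared state is written inside the loop
    for i in range(lo, hi):
        acc += f(i)
    return acc


def parallel_sum(f, n, grain=1000, workers=4):
    """Divide-and-conquer sum of f(0) .. f(n-1) with one final reduction."""
    tasks = split(0, n, grain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda t: local_sum(t, f), tasks))
    return sum(partials)
```

Keeping the accumulator local to each task avoids contended updates to a shared total, which is the property the thesis identifies as important for parallel efficiency.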
On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters
The predominance of Kohn-Sham density functional theory (KS-DFT) for the
theoretical treatment of large experimentally relevant systems in molecular
chemistry and materials science relies primarily on the existence of efficient
software implementations which are capable of leveraging the latest advances in
modern high performance computing (HPC). With recent trends in HPC leading
toward an increasing reliance on heterogeneous accelerator-based architectures
such as graphics processing units (GPUs), existing code bases must embrace these
architectural advances to maintain the high levels of performance which have
come to be expected for these methods. In this work, we propose a three-level
parallelism scheme for the distributed numerical integration of the
exchange-correlation (XC) potential in the Gaussian basis set discretization of
the Kohn-Sham equations on large computing clusters consisting of multiple GPUs
per compute node. In addition, we propose and demonstrate the efficacy of
batched kernels, including batched level-3 BLAS operations, in achieving
high levels of performance on the GPU. We demonstrate the performance and
scalability of the implementation of the proposed method in the NWChemEx
software package by comparing it to the existing scalable CPU XC integration in
NWChem. Comment: 26 pages, 9 figures
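To see why level-3 BLAS appears in XC integration, note that the XC matrix V[mu][nu] = sum_g w_g v_xc(g) phi_mu(g) phi_nu(g) can be assembled one batch of grid points at a time as a matrix product Phi_b^T (W_b Phi_b), where Phi_b holds the basis functions evaluated on the batch and W_b is the diagonal of weights times potential. The pure-Python sketch below assumes phi, the quadrature weights and v_xc are already tabulated; the function names are hypothetical, `matmul` merely stands in for one GEMM of a batched BLAS call, and the three-level distribution over nodes and GPUs is not shown.

```python
def matmul(a, b):
    """Dense product of nested lists; stands in for one GEMM of a batched BLAS call."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def xc_matrix_batched(phi, weights, vxc, batch_size):
    """V[mu][nu] = sum_g w_g * vxc_g * phi_mu(g) * phi_nu(g), one grid batch at a time.

    phi[g][mu] is basis function mu evaluated at grid point g.
    """
    npts, nbf = len(phi), len(phi[0])
    v = [[0.0] * nbf for _ in range(nbf)]
    for start in range(0, npts, batch_size):
        rows = range(start, min(start + batch_size, npts))
        block = [phi[g] for g in rows]                                      # Phi_b
        scaled = [[weights[g] * vxc[g] * p for p in phi[g]] for g in rows]  # W_b Phi_b
        block_t = [list(col) for col in zip(*block)]                        # Phi_b^T
        contrib = matmul(block_t, scaled)                                   # Phi_b^T W_b Phi_b
        for mu in range(nbf):
            for nu in range(nbf):
                v[mu][nu] += contrib[mu][nu]
    return v
```

The result does not depend on the batch size; batching only changes how the work is grouped into many small, independent GEMMs, which is the form a GPU can execute efficiently.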
Coupled cluster theory on modern heterogeneous supercomputers
This study examines the computational challenges of elucidating intricate chemical systems, particularly with ab initio methodologies. It highlights the Divide-Expand-Consolidate (DEC) approach for coupled cluster (CC) theory, a linear-scaling, massively parallel framework, as a viable solution. A detailed examination of the DEC framework shows its broad applicability to large chemical systems but also exposes inherent limitations. To mitigate these constraints, cluster perturbation theory is presented as an effective remedy. Attention is then directed towards the CPS (D-3) model, explicitly derived from a CC singles parent and a doubles auxiliary excitation space, for computing excitation energies. The new algorithms reviewed for the CPS (D-3) method efficiently exploit multiple nodes and graphics processing units to accelerate the heavy tensor contractions. As a result, CPS (D-3) emerges as a scalable, fast, and accurate method for computing molecular properties of large molecular systems, making it an efficient alternative to conventional CC models.
Towards an Accurate Description of Strongly Correlated Chemical Systems with Phaseless Auxiliary-Field Quantum Monte Carlo - Methodological Advances and Applications
The exact and phaseless variants of auxiliary-field quantum Monte Carlo (AFQMC) have been shown to be capable of producing accurate ground-state energies for a wide variety of systems, including those which exhibit substantial electron correlation effects. The first chapter of this thesis provides an overview of the relevant electronic structure problem and the phaseless AFQMC (ph-AFQMC) methodology.
The computational cost of performing these calculations has to date been relatively high, impeding many important applications of these approaches. In Chapter 2 we present a correlated sampling methodology for AFQMC which relies on error cancellation to dramatically accelerate the calculation of energy differences of relevance to chemical transformations. In particular, we show that our correlated sampling-based ph-AFQMC approach is capable of calculating redox properties, deprotonation free energies, and hydrogen abstraction energies in an efficient manner without sacrificing accuracy. We validate the computational protocol by calculating the ionization potentials and electron affinities of the atoms contained in the G2 test set and then proceed to utilize a composite method, which treats fixed-geometry processes with correlated sampling-based AFQMC and relaxation energies via MP2, to compute the ionization potential, deprotonation free energy, and the O-H bond dissociation energy of methanol, all to within chemical accuracy. We show that the efficiency of correlated sampling relative to uncorrelated calculations increases with system and basis set size and that correlated sampling greatly reduces the required number of random walkers to achieve a target statistical error. This translates to reductions in wall-times by factors of 55, 25, and 24 for the ionization potential of the K atom, the deprotonation of methanol, and hydrogen abstraction from the O-H bond of methanol, respectively.
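The error cancellation behind correlated sampling can be illustrated with a toy Monte Carlo problem: estimating a small difference between two averages is far cheaper when both estimates reuse the same random numbers. In the sketch below, cos(a*x) with x drawn from a standard normal distribution merely stands in for a walker's local energy, and the two parameter values stand in for two closely related systems; nothing here is the AFQMC propagation itself, and all names are hypothetical.

```python
import math
import random
import statistics


def f(x, a):
    """Toy 'local energy'; its exact mean over x ~ N(0, 1) is exp(-a**2 / 2)."""
    return math.cos(a * x)


def energy_difference(a, b, n, correlated, seed=0):
    """Monte Carlo estimate of E[f(x, a)] - E[f(x, b)]; returns (mean, standard error)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Correlated sampling reuses the same random number for both systems;
        # the uncorrelated estimate draws an independent one.
        y = x if correlated else rng.gauss(0.0, 1.0)
        diffs.append(f(x, a) - f(y, b))
    return statistics.fmean(diffs), statistics.stdev(diffs) / math.sqrt(n)
```

With a = 1.0 and b = 1.05, the shared samples make the fluctuations of the two terms largely cancel, so the standard error of the difference drops by more than an order of magnitude at equal sample count; equivalently, far fewer samples (walkers) are needed for a target error, which is the effect the chapter exploits.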
In Chapter 3 we present an implementation of ph-AFQMC utilizing graphics processing units (GPUs). The AFQMC method is recast in terms of matrix operations which are spread across thousands of processing cores and are executed in batches using custom Compute Unified Device Architecture (CUDA) kernels and the hardware-optimized cuBLAS matrix library. Algorithmic advances include a batched Sherman-Morrison-Woodbury algorithm to quickly update matrix determinants and inverses, density-fitting of the two-electron integrals, an energy algorithm involving a high-dimensional precomputed tensor, and the use of single-precision floating point arithmetic. These strategies result in dramatic reductions in wall-times for both single- and multi-determinant trial wavefunctions. For typical calculations we find speed-ups of roughly two orders of magnitude using just a single GPU card. Furthermore, we achieve near-unity parallel efficiency using 8 GPU cards on a single node, and can reach moderate system sizes via a local memory-slicing approach. We illustrate the robustness of our implementation on hydrogen chains of increasing length, and through the calculation of all-electron ionization potentials of the first-row transition metal atoms. We compare long imaginary-time calculations utilizing a population control algorithm with our previously published correlated sampling approach, and show that the latter improves not only the efficiency but also the accuracy of the computed ionization potentials. Taken together, the GPU implementation combined with correlated sampling provides a compelling computational method that will broaden the application of ph-AFQMC to the description of realistic correlated electronic systems.
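The Sherman-Morrison-Woodbury update mentioned above avoids recomputing an inverse and determinant from scratch after a low-rank change. The sketch below shows only the rank-1 special case (Sherman-Morrison plus the matrix determinant lemma), unbatched and in pure Python, with hypothetical names; it is not the chapter's batched GPU kernel. Replacing row k of a matrix A by a new row r is one such rank-1 change, with u = e_k and v = r minus the old row k.

```python
def sherman_morrison_update(a_inv, det_a, u, v):
    """Return (inverse, determinant) of A + u v^T, given A^{-1} and det(A).

    Uses det(A + u v^T) = det(A) * (1 + v^T A^{-1} u) and
    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u).
    """
    n = len(u)
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    vt_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1.0 + sum(v[i] * a_inv_u[i] for i in range(n))
    if abs(denom) < 1e-14:
        raise ValueError("updated matrix is (numerically) singular")
    new_det = det_a * denom
    new_inv = [[a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom for j in range(n)]
               for i in range(n)]
    return new_inv, new_det
```

The update costs O(n^2) instead of the O(n^3) of a fresh inversion, which is what makes it attractive when many walkers each need many such updates.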
In Chapter 4 the bond dissociation energies of a set of 44 3d transition metal-containing diatomics are computed with ph-AFQMC utilizing the correlated sampling technique. We investigate molecules with H, N, O, F, Cl, and S ligands, including those in the 3dMLBE20 database first compiled by Truhlar and co-workers with calculated and experimental values that have since been revised by various groups. In order to make a direct comparison of the accuracy of our ph-AFQMC calculations with previously published results from 10 DFT functionals, CCSD(T), and icMR-CCSD(T), we establish an objective selection protocol which utilizes the most recent experimental results except for a few cases with well-specified discrepancies. With the remaining set of 41 molecules, we find that ph-AFQMC gives robust agreement with experiment superior to that of all other methods, with a mean absolute error (MAE) of 1.4(4) kcal/mol and maximum error of 3(3) kcal/mol (the values in parentheses account for reported experimental uncertainties and the statistical errors of our ph-AFQMC calculations). In comparison, CCSD(T) and B97, the best performing DFT functional considered here, have MAEs of 2.8 and 3.7 kcal/mol, respectively, and maximum errors in excess of 17 kcal/mol (for the CoS diatomic). While a larger and more diverse data set would be required to demonstrate that ph-AFQMC is truly a benchmark method for transition metal systems, our results indicate that the method has tremendous potential, exhibiting unprecedented consistency and accuracy compared to other approximate quantum chemical approaches.
The energy gap between the lowest-lying singlet and triplet states is an important quantity in chemical photocatalysis, with relevant applications ranging from triplet fusion in optical upconversion to the design of organic light-emitting devices. The ab initio prediction of singlet-triplet (ST) gaps is challenging due to the potentially biradical nature of the involved states, combined with the potentially large size of relevant molecules. In Chapter 5, we show that ph-AFQMC can accurately predict ST gaps for chemical systems with singlet states of highly biradical nature, including a set of 13 small molecules and the ortho-, meta-, and para- isomers of benzyne. With respect to gas-phase experiments, ph-AFQMC using CASSCF trial wavefunctions achieves a mean averaged error of ~1 kcal/mol. Furthermore, we find that in the context of a spin-projection technique, ph-AFQMC using unrestricted single-determinant trial wavefunctions, which can be readily obtained even for very large systems, produces equivalently high accuracy. We proceed to show that this scalable methodology is capable of yielding accurate ST gaps for all linear polyacenes for which experimental measurements exist, i.e. naphthalene, anthracene, tetracene, and pentacene. Our results suggest a protocol for selecting either unrestricted Hartree-Fock or Kohn-Sham orbitals for the single-determinant trial wavefunction, based on the extent of spin contamination. These findings provide a reliable computational tool with which to investigate specific photochemical processes involving large molecules that may have substantial biradical character. We compute the ST gaps for a set of anthracene derivatives which are potential triplet-triplet annihilators for optical upconversion, and compare our ph-AFQMC predictions with those from DFT and CCSD(T) methods.
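The abstract does not specify which spin-projection technique is used. One common choice for broken-symmetry (BS) singlets is Yamaguchi-style approximate spin projection, sketched below under that assumption: the BS determinant is treated as a two-state mixture of the pure singlet (<S^2> = 0) and the triplet, so E_BS = E_S + (<S^2>_BS / <S^2>_T)(E_T - E_S), which can be solved for the gap. The function name is hypothetical and the energies are in arbitrary units.

```python
def projected_st_gap(e_bs, e_t, s2_bs, s2_t=2.0):
    """Approximate spin-projected singlet-triplet gap E_S - E_T.

    Assumes E_BS = E_S + (s2_bs / s2_t) * (E_T - E_S), i.e. the broken-symmetry
    state is a singlet/triplet mixture whose triplet weight is s2_bs / s2_t.
    Solving for E_S - E_T gives (E_BS - E_T) * s2_t / (s2_t - s2_bs).
    """
    if s2_t <= s2_bs:
        raise ValueError("triplet <S^2> must exceed the broken-symmetry <S^2>")
    return (e_bs - e_t) * s2_t / (s2_t - s2_bs)
```

For a fully contaminated BS state (<S^2>_BS = 1, <S^2>_T = 2) the raw BS-triplet splitting is doubled, while for an uncontaminated singlet (<S^2>_BS = 0) it is returned unchanged; this dependence on the degree of spin contamination is why the extent of contamination enters the orbital-selection protocol described above.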
We conclude with a discussion of ongoing projects, further methodological improvements on the horizon, and future applications of ph-AFQMC to chemical systems of interest in the fields of biology, drug discovery, catalysis, and condensed matter physics.
T-cell epitope prediction and immune complex simulation using molecular dynamics: state of the art and persisting challenges
Atomistic Molecular Dynamics provides powerful and flexible tools for the prediction and analysis of molecular and macromolecular systems. Specifically, it provides a means by which we can measure theoretically that which cannot be measured experimentally: the dynamic time-evolution of complex systems comprising atoms and molecules. It is particularly suitable for the simulation and analysis of the otherwise inaccessible details of MHC-peptide interaction and, on a larger scale, the simulation of the immune synapse. Progress has been relatively tentative, yet the emergence of truly high-performance computing and the development of coarse-grained simulation now offer the hope of accurately predicting thermodynamic parameters and of simulating not merely a handful of proteins but larger, longer simulations comprising thousands of protein molecules and the cellular-scale structures they form. We exemplify this within the context of immunoinformatics.
Roadmap on electronic structure codes in the exascale era
Electronic structure calculations have been instrumental in providing many important insights into a range of physical and chemical properties of various molecular and solid-state systems. Their importance to various fields, including materials science, chemical sciences, computational chemistry, and device physics, is underscored by the large fraction of available public supercomputing resources devoted to these calculations. As we enter the exascale era, exciting new opportunities to increase simulation numbers, sizes, and accuracies present themselves. In order to realize these promises, the community of electronic structure software developers will however first have to tackle a number of challenges pertaining to the efficient use of new architectures that will rely heavily on massive parallelism and hardware accelerators. This roadmap provides a broad overview of the state-of-the-art in electronic structure calculations and of the various new directions being pursued by the community. It covers 14 electronic structure codes, presenting their current status, their development priorities over the next five years, and their plans towards tackling the challenges and leveraging the opportunities presented by the advent of exascale computing.
Quantum Chemistry in Nanoscale Environments: Insights on Surface-Enhanced Raman Scattering and Organic Photovoltaics
Understanding molecular effects in nanoscale environments is becoming increasingly relevant to various emerging fields, including spectroscopy for molecular identification and the search for molecules for energy harvesting. Theoretical quantum chemistry has become increasingly useful for understanding these phenomena. In the first part of this dissertation, we study the chemical effect in surface-enhanced Raman scattering (SERS). We use quantum chemistry simulations to study the metal-molecule interactions present in these systems. We find that the excitations that provide a chemical enhancement contain mixed contributions from the metal and the molecule. Moreover, using atomistic studies, we propose that a transition-metal-doped surface could provide an additional source of enhancement. We also develop methods to study the electrostatic effects of molecules in metallic environments, examining the importance of image-charge effects and of field bias for molecules interacting with perfect conductors. The atomistic modeling and the electrostatic approximation let us study the metal-molecule interaction in complementary ways, which provides a better understanding of the complex effects present in SERS. In the second part of this dissertation, we present the Harvard Clean Energy Project, a high-throughput approach to the large-scale computational screening and design of organic photovoltaic materials. We create molecular libraries to search for candidate structures and use quantum chemistry, machine learning, and cheminformatics methods to characterize these systems and find structure-property relations. The scale of this study requires an equally large computational resource, which we obtain through distributed volunteer computing.
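The dissertation summary does not give its electrostatic model in detail. As an illustration of the image-charge effects it mentions, the sketch below uses the textbook method of images for a perfect, grounded planar conductor at z = 0: each point charge q at (x, y, z > 0) induces an image -q at (x, y, -z), and the interaction energy with the conductor is half the charge-image Coulomb energy. Atomic units, the planar geometry and the function name are assumptions; field bias is not included.

```python
import math


def image_charge_energy(charges, positions):
    """Interaction energy (atomic units) of point charges with a grounded conducting plane z = 0.

    Every charge q_j at (x, y, z) with z > 0 has an image -q_j at (x, y, -z).
    The energy is 1/2 * sum_i sum_j q_i * (-q_j) / |r_i - r_j_image|,
    including each charge's interaction with its own image.
    """
    energy = 0.0
    for qi, ri in zip(charges, positions):
        for qj, (x, y, z) in zip(charges, positions):
            energy += qi * (-qj) / math.dist(ri, (x, y, -z))
    return 0.5 * energy
```

A single unit charge 2 bohr above the surface sits 4 bohr from its image, giving -1/4 hartree for the charge-image pair and therefore -q^2 / (4z) = -0.125 hartree of interaction energy after the factor of one half.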
In the third part of this dissertation, we present our work on accelerating electronic structure methods using graphics processing units. This hardware represents a paradigm shift with respect to typical CPU architectures. We accelerate the resolution-of-the-identity Møller-Plesset second-order perturbation theory (RI-MP2) algorithm using graphics cards. We also provide detailed tools to address the memory and single-precision issues that these cards often present.
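For reference, the closed-shell RI-MP2 energy mentioned above is E = sum_{ijab} (ia|jb) [2 (ia|jb) - (ib|ja)] / (e_i + e_j - e_a - e_b), with each integral approximated as (ia|jb) = sum_P B[P][i][a] B[P][j][b]. The pure-Python sketch below assumes the fitted factors B and the orbital energies are already available; the names are hypothetical, and on a GPU the integral blocks would be formed by matrix multiplications over the auxiliary index rather than the scalar loops used here.

```python
def ri_mp2_energy(b, eps_occ, eps_vir):
    """Closed-shell RI-MP2 correlation energy from fitted factors b[P][i][a].

    (ia|jb) is approximated as sum_P b[P][i][a] * b[P][j][b].
    """
    naux, nocc, nvir = len(b), len(eps_occ), len(eps_vir)

    def eri(i, a, j, c):
        """RI approximation to the two-electron integral (ia|jc)."""
        return sum(b[p][i][a] * b[p][j][c] for p in range(naux))

    energy = 0.0
    for i in range(nocc):
        for j in range(nocc):
            for a in range(nvir):
                for c in range(nvir):
                    iajc = eri(i, a, j, c)
                    icja = eri(i, c, j, a)  # exchange-type integral (ic|ja)
                    denom = eps_occ[i] + eps_occ[j] - eps_vir[a] - eps_vir[c]
                    energy += iajc * (2.0 * iajc - icja) / denom
    return energy
```

With one occupied orbital (energy -0.5), one virtual orbital (energy 0.5) and a single fitted factor of 0.5, the integral is 0.25 and the energy is 0.25 * 0.25 / (-2) = -0.03125. Accumulating many such terms is where single-precision round-off can matter, which is the kind of issue the dissertation's tools address.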