1,045 research outputs found

Distributed Memory, GPU Accelerated Fock Construction for Hybrid, Gaussian Basis Density Functional Theory

Author: Asadchev Andrey
Clark David
de Jong Wibe A.
Popovici Doru Thom
Valeev Edward F.
Waldrop Johnathan
Williams-Young David B.
Windus Theresa
Publication venue: AIP Publishing
Publication date: 24/03/2023

With the growing reliance of modern supercomputers on accelerator-based architectures such as GPUs, the development and optimization of electronic structure methods to exploit these massively parallel resources have become a recent priority. While significant strides have been made in the development of GPU-accelerated, distributed memory algorithms for many-body (e.g. coupled-cluster) and spectral single-body (e.g. planewave, real-space and finite-element density functional theory [DFT]) methods, the vast majority of GPU-accelerated Gaussian atomic orbital methods have focused on shared memory systems, with only a handful of examples pursuing massive parallelism on distributed memory GPU architectures. In this work, we present a set of distributed memory algorithms for the evaluation of the Coulomb and exact-exchange matrices for hybrid Kohn-Sham DFT with Gaussian basis sets via direct density-fitted (DF-J-Engine) and seminumerical (sn-K) methods, respectively. The absolute performance and strong scalability of the developed methods are demonstrated on systems ranging from a few hundred to over one thousand atoms using up to 128 NVIDIA A100 GPUs on the Perlmutter supercomputer. Comment: 45 pages, 9 figures

arXiv.org e-Print Archive

eScholarship - University of California

Complexity Reduction in Density Functional Theory: Locality in Space and Energy

Author: Dawson William
Genovese Luigi
Kamiya Muneaki
Kawashima Eisuke
Nakajima Takahito
Ratcliff Laura E.
Publication venue: AIP Publishing
Publication date: 28/04/2023

We present recent developments of the NTChem program for performing large-scale hybrid Density Functional Theory calculations on the supercomputer Fugaku. We combine these developments with our recently proposed Complexity Reduction Framework to assess the impact of basis set and functional choice on its measures of fragment quality and interaction. We further exploit the all-electron representation to study system fragmentation in various energy envelopes. Building on this analysis, we propose two algorithms for computing the orbital energies of the Kohn-Sham Hamiltonian. We demonstrate that these algorithms can be applied efficiently to systems composed of thousands of atoms and can serve as an analysis tool that reveals the origin of spectral properties. Comment: Accepted Manuscript

arXiv.org e-Print Archive

Explore Bristol Research

X10 for high-performance scientific computing

Author: Milthorpe Joshua John
Publication date: 01/01/2015

High performance computing is a key technology that enables large-scale physical simulation in modern science. While great advances have been made in methods and algorithms for scientific computing, the most commonly used programming models encourage a fragmented view of computation that maps poorly to the underlying computer architecture. Scientific applications typically manifest physical locality, which means that interactions between entities or events that are nearby in space or time are stronger than more distant interactions. Linear-scaling methods exploit physical locality by approximating distant interactions, to reduce computational complexity so that cost is proportional to system size. In these methods, the computation required for each portion of the system is different depending on that portion’s contribution to the overall result. To support productive development, application programmers need programming models that cleanly map aspects of the physical system being simulated to the underlying computer architecture while also supporting the irregular workloads that arise from the fragmentation of a physical system. X10 is a new programming language for high-performance computing that uses the asynchronous partitioned global address space (APGAS) model, which combines explicit representation of locality with asynchronous task parallelism. This thesis argues that the X10 language is well suited to expressing the algorithmic properties of locality and irregular parallelism that are common to many methods for physical simulation. The work reported in this thesis was part of a co-design effort involving researchers at IBM and ANU in which two significant computational chemistry codes were developed in X10, with an aim to improve the expressiveness and performance of the language. The first is a Hartree–Fock electronic structure code, implemented using the novel Resolution of the Coulomb Operator approach. 
The second evaluates electrostatic interactions between point charges, using either the smooth particle mesh Ewald method or the fast multipole method, with the latter used to simulate ion interactions in a Fourier Transform Ion Cyclotron Resonance mass spectrometer. We compare the performance of both X10 applications to state-of-the-art software packages written in other languages. This thesis presents improvements to the X10 language and runtime libraries for managing and visualizing the data locality of parallel tasks, communication using active messages, and efficient implementation of distributed arrays. We evaluate these improvements in the context of computational chemistry application examples. This work demonstrates that X10 can achieve performance comparable to established programming languages when running on a single core. More importantly, X10 programs can achieve high parallel efficiency on a multithreaded architecture, given a divide-and-conquer pattern of parallel tasks and appropriate use of worker-local data. For distributed memory architectures, X10 supports the use of active messages to construct local, asynchronous communication patterns which outperform global, synchronous patterns. Although point-to-point active messages may be implemented efficiently, productive application development also requires collective communications; more work is required to integrate both forms of communication in the X10 language. The exploitation of locality is the key insight in both linear-scaling methods and the APGAS programming model; their combination represents an attractive opportunity for future co-design efforts.

The Australian National University

On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

Author: de Jong Wibe A.
van Dam Hubertus J. J.
Williams-Young David B.
Yang Chao
Publication date: 01/01/2020

The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations which are capable of leveraging the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards an increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance which have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing to the existing scalable CPU XC integration in NWChem. Comment: 26 pages, 9 figures

arXiv.org e-Print Archive

eScholarship - University of California

Coupled cluster theory on modern heterogeneous supercomputers

Author: Abdulrahman Y. Zamani
Andreas Erbs Hillers-Bendtsen
Ashleigh Barnes
Dmytro Bykov
Filip Pawłowski
Hector H. Corzo
Jeppe Olsen
Kurt V. Mikkelsen
Poul Jørgensen
Publication venue: Frontiers Media S.A.
Publication date: 01/01/2023

This study examines the computational challenges in elucidating intricate chemical systems, particularly through ab initio methodologies. This work highlights the Divide-Expand-Consolidate (DEC) approach for coupled cluster (CC) theory, a linear-scaling and massively parallel framework, as a viable solution. Detailed scrutiny of the DEC framework reveals its extensive applicability to large chemical systems, yet it also has inherent limitations. To mitigate these constraints, cluster perturbation theory is presented as an effective remedy. Attention is then directed towards the CPS (D-3) model, explicitly derived from a CC singles parent and a doubles auxiliary excitation space, for computing excitation energies. The reviewed new algorithms for the CPS (D-3) method efficiently capitalize on multiple nodes and graphics processing units, expediting heavy tensor contractions. As a result, CPS (D-3) emerges as a scalable, rapid, and precise solution for computing molecular properties in large molecular systems, making it an efficient alternative to conventional CC models.

Directory of Open Access Journals

eScholarship - University of California

Towards an Accurate Description of Strongly Correlated Chemical Systems with Phaseless Auxiliary-Field Quantum Monte Carlo - Methodological Advances and Applications

Author: Shee James
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

The exact and phaseless variants of auxiliary-field quantum Monte Carlo (AFQMC) have been shown to be capable of producing accurate ground-state energies for a wide variety of systems including those which exhibit substantial electron correlation effects. The first chapter of this thesis will provide an overview of the relevant electronic structure problem, and the phaseless AFQMC (ph-AFQMC) methodology. The computational cost of performing these calculations has to date been relatively high, impeding many important applications of these approaches. In Chapter 2 we present a correlated sampling methodology for AFQMC which relies on error cancellation to dramatically accelerate the calculation of energy differences of relevance to chemical transformations. In particular, we show that our correlated sampling-based ph-AFQMC approach is capable of calculating redox properties, deprotonation free energies, and hydrogen abstraction energies in an efficient manner without sacrificing accuracy. We validate the computational protocol by calculating the ionization potentials and electron affinities of the atoms contained in the G2 test set and then proceed to utilize a composite method, which treats fixed-geometry processes with correlated sampling-based AFQMC and relaxation energies via MP2, to compute the ionization potential, deprotonation free energy, and the O-H bond dissociation energy of methanol, all to within chemical accuracy. We show that the efficiency of correlated sampling relative to uncorrelated calculations increases with system and basis set size and that correlated sampling greatly reduces the required number of random walkers to achieve a target statistical error. This translates to reductions in wall-times by factors of 55, 25, and 24 for the ionization potential of the K atom, the deprotonation of methanol, and hydrogen abstraction from the O-H bond of methanol, respectively. 
In Chapter 3 we present an implementation of ph-AFQMC utilizing graphics processing units (GPUs). The AFQMC method is recast in terms of matrix operations which are spread across thousands of processing cores and are executed in batches using custom Compute Unified Device Architecture kernels and the hardware-optimized cuBLAS matrix library. Algorithmic advances include a batched Sherman-Morrison-Woodbury algorithm to quickly update matrix determinants and inverses, density-fitting of the two-electron integrals, an energy algorithm involving a high-dimensional precomputed tensor, and the use of single-precision floating point arithmetic. These strategies result in dramatic reductions in wall-times for both single- and multi-determinant trial wavefunctions. For typical calculations we find speed-ups of roughly two orders of magnitude using just a single GPU card. Furthermore, we achieve near-unity parallel efficiency using 8 GPU cards on a single node, and can reach moderate system sizes via a local memory-slicing approach. We illustrate the robustness of our implementation on hydrogen chains of increasing length, and through the calculation of all-electron ionization potentials of the first-row transition metal atoms. We compare long imaginary-time calculations utilizing a population control algorithm with our previously published correlated sampling approach, and show that the latter improves not only the efficiency but also the accuracy of the computed ionization potentials. Taken together, the GPU implementation combined with correlated sampling provides a compelling computational method that will broaden the application of ph-AFQMC to the description of realistic correlated electronic systems. In Chapter 4 the bond dissociation energies of a set of 44 3d transition metal-containing diatomics are computed with ph-AFQMC utilizing the correlated sampling technique.
We investigate molecules with H, N, O, F, Cl, and S ligands, including those in the 3dMLBE20 database first compiled by Truhlar and co-workers with calculated and experimental values that have since been revised by various groups. In order to make a direct comparison of the accuracy of our ph-AFQMC calculations with previously published results from 10 DFT functionals, CCSD(T), and icMR-CCSD(T), we establish an objective selection protocol which utilizes the most recent experimental results except for a few cases with well-specified discrepancies. With the remaining set of 41 molecules, we find that ph-AFQMC gives robust agreement with experiment superior to that of all other methods, with a mean absolute error (MAE) of 1.4(4) kcal/mol and maximum error of 3(3) kcal/mol (parentheses account for reported experimental uncertainties and the statistical errors of our ph-AFQMC calculations). In comparison, CCSD(T) and B97, the best performing DFT functional considered here, have MAEs of 2.8 and 3.7 kcal/mol, respectively, and maximum errors in excess of 17 kcal/mol (for the CoS diatomic). While a larger and more diverse data set would be required to demonstrate that ph-AFQMC is truly a benchmark method for transition metal systems, our results indicate that the method has tremendous potential, exhibiting unprecedented consistency and accuracy compared to other approximate quantum chemical approaches. The energy gap between the lowest-lying singlet and triplet states is an important quantity in chemical photocatalysis, with relevant applications ranging from triplet fusion in optical upconversion to the design of organic light-emitting devices. The ab initio prediction of singlet-triplet (ST) gaps is challenging due to the potentially biradical nature of the involved states, combined with the potentially large size of relevant molecules.
In Chapter 5, we show that ph-AFQMC can accurately predict ST gaps for chemical systems with singlet states of highly biradical nature, including a set of 13 small molecules and the ortho-, meta-, and para- isomers of benzyne. With respect to gas-phase experiments, ph-AFQMC using CASSCF trial wavefunctions achieves a mean averaged error of ~1 kcal/mol. Furthermore, we find that in the context of a spin-projection technique, ph-AFQMC using unrestricted single-determinant trial wavefunctions, which can be readily obtained for even very large systems, produces equivalently high accuracy. We proceed to show that this scalable methodology is capable of yielding accurate ST gaps for all linear polyacenes for which experimental measurements exist, i.e. naphthalene, anthracene, tetracene, and pentacene. Our results suggest a protocol for selecting either unrestricted Hartree-Fock or Kohn-Sham orbitals for the single-determinant trial wavefunction, based on the extent of spin-contamination. These findings provide a reliable computational tool with which to investigate specific photochemical processes involving large molecules that may have substantial biradical character. We compute the ST gaps for a set of anthracene derivatives which are potential triplet-triplet annihilators for optical upconversion, and compare our ph-AFQMC predictions with those from DFT and CCSD(T) methods. We conclude with a discussion of ongoing projects, further methodological improvements on the horizon, and future applications of ph-AFQMC to chemical systems of interest in the fields of biology, drug-discovery, catalysis, and condensed matter physics.
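
The Chapter 3 summary above mentions a batched Sherman-Morrison-Woodbury algorithm for updating matrix determinants and inverses. As a rough illustration only, the sketch below shows the rank-one special case (the Sherman-Morrison formula together with the matrix determinant lemma) in plain Python. It is not the thesis's batched GPU implementation: the function name `sherman_morrison_update` and the small 2x2 example are hypothetical, chosen purely for illustration.

```python
def sherman_morrison_update(a_inv, det_a, u, v):
    """Return (inverse, determinant) of A + u v^T, given A^{-1} and det(A).

    Rank-one Sherman-Morrison formula and matrix determinant lemma:
        (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / (1 + v^T A^{-1} u)
        det(A + u v^T)   = det(A) * (1 + v^T A^{-1} u)
    """
    n = len(u)
    # Column vector A^{-1} u
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    # Row vector v^T A^{-1}
    vt_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    # Scalar 1 + v^T A^{-1} u; zero means the updated matrix is singular
    denom = 1.0 + sum(v[i] * a_inv_u[i] for i in range(n))
    if abs(denom) < 1e-14:
        raise ValueError("updated matrix is singular or nearly singular")
    new_inv = [
        [a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]
    return new_inv, det_a * denom


# Hypothetical example: A = diag(2, 4), so A^{-1} = diag(0.5, 0.25), det(A) = 8.
a_inv = [[0.5, 0.0], [0.0, 0.25]]
u = [1.0, 1.0]
v = [1.0, 0.0]
# A + u v^T = [[3, 0], [1, 4]]
new_inv, new_det = sherman_morrison_update(a_inv, 8.0, u, v)
```

The update costs O(n^2) operations, whereas recomputing the inverse and determinant from scratch costs O(n^3); this is the general reason such updates are attractive when a matrix changes by a low-rank term many times.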

Columbia University Academic Commons

T-cell epitope prediction and immune complex simulation using molecular dynamics: state of the art and persisting challenges

Author: Coveney Peter V
Davies Matthew N
Flower Darren R
Macdonald Isabel K
Phadwal Kanchan
Wan Shunzhou
Publication venue: BioMed Central
Publication date: 01/01/2010

Atomistic Molecular Dynamics provides powerful and flexible tools for the prediction and analysis of molecular and macromolecular systems. Specifically, it provides a means by which we can measure theoretically that which cannot be measured experimentally: the dynamic time-evolution of complex systems comprising atoms and molecules. It is particularly suitable for the simulation and analysis of the otherwise inaccessible details of MHC-peptide interaction and, on a larger scale, the simulation of the immune synapse. Progress has been relatively tentative, yet the emergence of truly high-performance computing and the development of coarse-grained simulation now offer us the hope of accurately predicting thermodynamic parameters and of simulating not merely a handful of proteins but larger, longer simulations comprising thousands of protein molecules and the cellular-scale structures they form. We exemplify this within the context of immunoinformatics.

Crossref

Springer - Publisher Connector

PubMed Central

Aston Publications Explorer

Roadmap on electronic structure codes in the exascale era

Author: Baroni S.
Blum V.
Bowler D. R.
Buccheri A.
Chelikowsky J. R.
Das S.
Dawson W.
Delugas P.
Dogan M.
Draxl C.
Galli G.
Gavini V.
Genovese L.
Giannozzi P.
Giantomassi M.
Gonze X.
Govoni M.
Gulans A.
Gygi F.
Herbert J. M.
Kokott S.
Kuhne T. D.
Liou K. -H.
Miyazaki T.
Motamarri P.
Nakata A.
Pask J. E.
Perez D.
Plessl C.
Ratcliff L. E.
Richard R. M.
Rossi M.
Schade R.
Scheffler M.
Schutt O.
Suryanarayana P.
Torrent M.
Truflandier L.
Windus T. L.
Xu Q.
Yu V. W.
Publication date: 01/01/2023

Electronic structure calculations have been instrumental in providing many important insights into a range of physical and chemical properties of various molecular and solid-state systems. Their importance to various fields, including materials science, chemical sciences, computational chemistry, and device physics, is underscored by the large fraction of available public supercomputing resources devoted to these calculations. As we enter the exascale era, exciting new opportunities to increase simulation numbers, sizes, and accuracies present themselves. To realize these promises, however, the community of electronic structure software developers will first have to tackle a number of challenges pertaining to the efficient use of new architectures that will rely heavily on massive parallelism and hardware accelerators. This roadmap provides a broad overview of the state of the art in electronic structure calculations and of the various new directions being pursued by the community. It covers 14 electronic structure codes, presenting their current status, their development priorities over the next five years, and their plans for tackling the challenges and leveraging the opportunities presented by the advent of exascale computing.

Archivio istituzionale della ricerca - Università degli Studi di Udine

Quantum Chemistry in Nanoscale Environments: Insights on Surface-Enhanced Raman Scattering and Organic Photovoltaics

Author: Olivares-Amaya Roberto
Publication venue: Harvard University Botany Libraries
Publication date: 18/12/2012

The understanding of molecular effects in nanoscale environments is becoming increasingly relevant for various emerging fields. These include spectroscopy for molecular identification as well as the search for molecules for energy harvesting. Theoretical quantum chemistry has been increasingly useful in addressing these phenomena and in building an understanding of these effects. In the first part of this dissertation, we study the chemical effect of surface-enhanced Raman scattering (SERS). We use quantum chemistry simulations to study the metal-molecule interactions present in these systems. We find that the excitations that provide a chemical enhancement contain a mixed contribution from the metal and the molecule. Moreover, using atomistic studies, we propose an additional source of enhancement, arising from a transition metal dopant surface. We also develop methods to study the electrostatic effects of molecules in metallic environments. We study the importance of image-charge effects, as well as field-bias, for molecules interacting with perfect conductors. The atomistic modeling and the electrostatic approximation enable us to study the effects of the metal interacting with the molecule in a complementary fashion, which provides a better understanding of the complex effects present in SERS. In the second part of this dissertation, we present the Harvard Clean Energy project, a high-throughput approach for large-scale computational screening and design of organic photovoltaic materials. We create molecular libraries to search for candidate structures and use quantum chemistry, machine learning and cheminformatics methods to characterize these systems and find structure-property relations. The scale of this study requires an equally large computational resource. We rely on distributed volunteer computing to obtain these properties.
In the third part of this dissertation we present our work related to the acceleration of electronic structure methods using graphics processing units. This hardware represents a change of paradigm with respect to the typical CPU device architectures. We accelerate the resolution-of-the-identity Møller-Plesset second-order perturbation theory algorithm using graphics cards. We also provide detailed tools to address memory and single-precision issues that these cards often present.

Harvard University - DASH