Accurate and Transferable Molecular-Orbital-Based Machine Learning for Molecular Modeling
Quantum simulation is a powerful tool that lets chemists understand chemical processes, either accurately with expensive wavefunction theory or approximately with cheap density functional theory (DFT). However, the cost-accuracy trade-offs in electronic structure methods limit the application of quantum simulation to large chemical and biological systems. In this thesis, an accurate, transferable, and physics-driven molecular modeling framework, molecular-orbital-based machine learning (MOB-ML), is introduced to provide wavefunction-quality molecular descriptions at no more than mean-field computational cost. Instead of directly predicting total molecular energies, MOB-ML describes the post-Hartree-Fock correlation energy from molecular orbital information at the cost of a Hartree-Fock computation.
Because they preserve the relevant physical constraints, molecular-orbital-based (MOB) features represent chemical space faithfully in both supervised clustering and unsupervised learning for chemical space exploration. The development of local regressions with scalable exact Gaussian processes within clusters further allows MOB-ML to provide the most accurate approach in both the low-data and big-data regimes. As an exciting and general new tool for tackling a variety of problems in chemistry, MOB-ML predicts total energies with high accuracy and serves as a universal density functional for organic molecules and non-covalent interactions in diverse chemical systems. With the availability of analytical nuclear gradients, MOB-ML can also generate accurate potential energy surfaces (PESs) from few high-level electronic structure reference computations, enabling accurate and efficient diffusion Monte Carlo calculations for computational spectroscopy.
Towards chemical accuracy with shallow quantum circuits: A Clifford-based Hamiltonian engineering approach
Achieving chemical accuracy with shallow quantum circuits is a significant
challenge in quantum computational chemistry, particularly for near-term
quantum devices. In this work, we present a Clifford-based Hamiltonian
engineering algorithm, namely CHEM, that addresses the trade-off between
circuit depth and accuracy. Based on the variational quantum eigensolver and a
hardware-efficient ansatz, our method designs a Clifford-based Hamiltonian
transformation that (1) ensures a set of initial circuit parameters
corresponding to the Hartree--Fock energy can be generated, (2) effectively
maximizes the initial energy gradient with respect to circuit parameters, and
(3) imposes negligible overhead for classical processing and does not require
additional quantum resources. We demonstrate the efficacy of our approach using
a quantum hardware emulator, achieving chemical accuracy for systems as large
as 12 qubits with fewer than 30 two-qubit gates. Our Clifford-based Hamiltonian
engineering approach offers a promising avenue for practical quantum
computational chemistry on near-term quantum devices. Comment: 12 pages, 5 figures. SI is included.
Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space
We introduce an unsupervised clustering algorithm to improve training
efficiency and accuracy in predicting energies using molecular-orbital-based
machine learning (MOB-ML). This work determines clusters via the Gaussian
mixture model (GMM) in an entirely automatic manner and simplifies an earlier
supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by
eliminating both the necessity for user-specified parameters and the training
of an additional classifier. Unsupervised clustering results from GMM have the
advantage of accurately reproducing chemically intuitive groupings of frontier
molecular orbitals and having improved performance with an increasing number of
training examples. The resulting clusters from supervised or unsupervised
clustering are further combined with scalable Gaussian process regression (GPR)
or linear regression (LR) to learn molecular energies accurately by generating
a local regression model in each cluster. Among all four combinations of
regressors and clustering methods, GMM combined with scalable exact Gaussian
process regression (GMM/GPR) is the most efficient training protocol for
MOB-ML. Numerical tests of molecular energy learning on thermalized
datasets of drug-like molecules demonstrate the improved accuracy,
transferability, and learning efficiency of GMM/GPR over the other
training protocols for MOB-ML, i.e., supervised regression clustering combined
with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best
molecular energy predictions among literature results on the same
benchmark datasets. With a lower scaling, GMM/GPR achieves a 10.4-fold speedup in
wall-clock training time compared with scalable exact GPR at a training size
of 6500 QM7b-T molecules. Comment: 28 pages, 7 figures.
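The GMM/GPR training protocol described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the arrays `X` and `y` are random placeholders standing in for MOB feature vectors and orbital-pair correlation energies, and the feature dimension and cluster count are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))        # stand-in MOB feature vectors
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]  # stand-in pair correlation energies

# Step 1: unsupervised clustering of chemical space via a Gaussian mixture.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)

# Step 2: one local GPR model per cluster.
models = {}
for k in range(gmm.n_components):
    mask = labels == k
    if mask.any():  # guard: skip a cluster that claimed no points
        gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                       normalize_y=True)
        models[k] = gpr.fit(X[mask], y[mask])

# Prediction: route each test point through the GMM, then its local GPR.
def predict(X_new):
    lab = gmm.predict(X_new)
    out = np.zeros(len(X_new))
    for k, model in models.items():
        m = lab == k
        if m.any():
            out[m] = model.predict(X_new[m])
    return out

preds = predict(X[:20])  # predict energies for some held-in examples
```

In the real protocol the per-cluster GPRs are scalable exact Gaussian processes; the off-the-shelf `GaussianProcessRegressor` here is a simplification that only conveys the routing structure.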
Transferability in Machine Learning for Electronic Structure via the Molecular Orbital Basis
We present a machine learning (ML) method for predicting electronic structure
correlation energies using Hartree-Fock input. The total correlation energy is
expressed in terms of individual and pair contributions from occupied molecular
orbitals, and Gaussian process regression is used to predict these
contributions from a feature set that is based on molecular orbital properties,
such as Fock, Coulomb, and exchange matrix elements. With the aim of maximizing
transferability across chemical systems and compactness of the feature set, we
avoid the usual specification of ML features in terms of atom- or
geometry-specific information, such as atom/element-types, bond-types, or local
molecular structure. ML predictions of MP2 and CCSD energies are presented for
a range of systems, demonstrating that the method maintains accuracy while
providing transferability both within and across chemical families; this
includes predictions for molecules with atom-types and elements that are not
included in the training set. The method holds promise both in its current form
and as a proof-of-principle for the use of ML in the design of generalized
density-matrix functionals. Comment: 8 pages, 5 figures.
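The decomposition described above can be illustrated schematically: the total correlation energy is a sum of per-orbital-pair contributions, each predicted by a Gaussian process from MO-derived features. All numbers below are synthetic placeholders (the three feature columns merely mimic Fock, Coulomb, and exchange matrix elements), not real electronic-structure data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)
# Synthetic "pair features": columns mimic F_ij, J_ij, K_ij matrix elements.
X_train = rng.normal(size=(200, 3))
# Synthetic pair correlation-energy contributions (always negative here).
eps_train = -0.01 * np.abs(X_train[:, 1]) - 0.005 * np.abs(X_train[:, 2])

# GPR maps pair features to pair contributions.
gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gpr.fit(X_train, eps_train)

def correlation_energy(pair_features):
    """Sum predicted pair contributions to get the total correlation energy."""
    return float(np.sum(gpr.predict(pair_features)))

# Ten occupied-orbital pairs of a hypothetical test molecule.
X_mol = rng.normal(size=(10, 3))
E_corr = correlation_energy(X_mol)
```

Because the features are orbital-based rather than atom-based, the same trained model can in principle be applied to molecules containing elements absent from the training set, which is the transferability claim of the abstract.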
Error-mitigated Quantum Approximate Optimization via Learning-based Adaptive Optimization
Combinatorial optimization problems are ubiquitous and computationally hard
to solve in general. Quantum computing is envisioned as a powerful tool
offering potential computational advantages for solving some of these problems.
The quantum approximate optimization algorithm (QAOA), one of the most
representative quantum-classical hybrid algorithms, is designed to solve
certain combinatorial optimization problems by transforming a discrete
optimization problem into a classical optimization problem over a continuous
circuit parameter domain. The QAOA objective landscape over the parameter
variables is notorious for pervasive local minima and barren plateaus, and
the viability of training relies heavily on the efficacy of the classical optimization
algorithm. To enhance the performance of QAOA, we design double adaptive-region
Bayesian optimization (DARBO), an adaptive classical optimizer for QAOA. Our
experimental results demonstrate that the algorithm greatly outperforms
conventional gradient-based and gradient-free optimizers in terms of speed,
accuracy, and stability. We also address the issues of measurement efficiency
and the suppression of quantum noise by successfully conducting the full
optimization loop on the superconducting quantum processor. This work helps to
unlock the full power of QAOA and paves the way toward achieving quantum
advantage in practical classical tasks. Comment: Main text: 11 pages, 4 figures; SI: 5 pages, 5 figures.
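A trust-region Bayesian-optimization loop in the spirit of the adaptive classical optimizer described above can be sketched as follows. This is an illustrative assumption, not DARBO itself: the real method uses two adaptive regions and a measured quantum-circuit objective, whereas here a cheap analytic function stands in for the QAOA energy and a single adaptive search region is used.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(2)

def qaoa_energy(theta):
    # Stand-in for the measured QAOA objective over circuit parameters.
    return float(np.sum(np.sin(3 * theta) ** 2) + 0.1 * np.sum(theta ** 2))

dim, n_init, n_iter = 4, 8, 25
X = rng.uniform(-np.pi, np.pi, size=(n_init, dim))   # initial design points
y = np.array([qaoa_energy(x) for x in X])
radius = np.pi / 2                                   # adaptive region radius

for _ in range(n_iter):
    # Surrogate model of the objective over circuit parameters.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True).fit(X, y)
    center = X[np.argmin(y)]
    # Sample candidates inside the current region; pick the point with the
    # lowest GP lower-confidence bound (mean minus one standard deviation).
    cand = center + rng.uniform(-radius, radius, size=(256, dim))
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - sigma)]
    y_next = qaoa_energy(x_next)
    # Expand the region after an improvement, shrink it otherwise.
    radius = min(radius * 1.5, np.pi) if y_next < y.min() else max(radius * 0.7, 1e-3)
    X, y = np.vstack([X, x_next]), np.append(y, y_next)

best = y.min()
```

The adaptive radius is what distinguishes this family of optimizers from plain Bayesian optimization: it concentrates the surrogate's queries near the incumbent, which is useful on landscapes riddled with local minima and flat regions.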
Regression-clustering for Improved Accuracy and Training Cost with Molecular-Orbital-Based Machine Learning
Machine learning (ML) in the representation of molecular-orbital-based (MOB)
features has been shown to be an accurate and transferable approach to the
prediction of post-Hartree-Fock correlation energies. Previous applications of
MOB-ML employed Gaussian Process Regression (GPR), which provides good
prediction accuracy with small training sets; however, the cost of GPR training
scales cubically with the amount of data and becomes a computational bottleneck
for large training sets. In the current work, we address this problem by
introducing a clustering/regression/classification implementation of MOB-ML. In
a first step, regression clustering (RC) is used to partition the training data
to best fit an ensemble of linear regression (LR) models; in a second step,
each cluster is regressed independently, using either LR or GPR; and in a third
step, a random forest classifier (RFC) is trained for the prediction of cluster
assignments based on MOB feature values. Upon inspection, RC is found to
recapitulate chemically intuitive groupings of the frontier molecular orbitals,
and the combined RC/LR/RFC and RC/GPR/RFC implementations of MOB-ML are found
to provide good prediction accuracy with greatly reduced wall-clock training
times. For a dataset of thermalized geometries of 7211 organic molecules of up
to seven heavy atoms, both implementations reach chemical accuracy (1 kcal/mol
error) with only 300 training molecules, while providing 35000-fold and
4500-fold reductions in the wall-clock training time, respectively, compared to
MOB-ML without clustering. The resulting models are also demonstrated to retain
transferability for the prediction of large-molecule energies with only
small-molecule training data. Finally, it is shown that capping the number of
training datapoints per cluster leads to further improvements in prediction
accuracy with negligible increases in wall-clock training time. Comment: 31 pages, 10 figures, with an SI.
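The three-step clustering/regression/classification pipeline described above can be sketched under simplifying assumptions: two clusters and a synthetic one-dimensional feature standing in for MOB feature vectors. Regression clustering alternates between assigning each point to the linear model with the smallest residual and refitting the models; a random forest then learns to predict cluster labels from the features alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 1))
# Piecewise-linear ground truth: two regimes, as RC is meant to discover.
y = np.where(X[:, 0] < 0, -1.5 * X[:, 0], 0.5 * X[:, 0]) \
    + 0.01 * rng.normal(size=300)

n_clusters = 2
labels = rng.integers(n_clusters, size=len(X))       # random initialization
for _ in range(20):
    # Step 1-2: refit one linear model per cluster, then reassign greedily.
    models = []
    for k in range(n_clusters):
        mask = labels == k
        if mask.sum() < 2:           # guard against a (near-)empty cluster
            mask = np.ones(len(X), dtype=bool)
        models.append(LinearRegression().fit(X[mask], y[mask]))
    residuals = np.stack([np.abs(m.predict(X) - y) for m in models], axis=1)
    new_labels = residuals.argmin(axis=1)
    if np.array_equal(new_labels, labels):           # converged
        break
    labels = new_labels

# Step 3: train a classifier so unseen geometries can be routed to a cluster.
rfc = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
```

In the full method each cluster would then be regressed with LR or GPR as in the abstract; the point of the classifier is that cluster assignment at prediction time needs only feature values, not energies.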
A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules
We address the degree to which machine learning (ML) can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, at the second-order Møller-Plesset perturbation theory (MP2), coupled cluster with singles and doubles (CCSD), and CCSD with perturbative triples levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 mhartree using a model trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported Δ-ML method, MOB-ML is shown to reach chemical accuracy with threefold fewer training geometries. Finally, a transferability test in which models trained on seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than Δ-ML (140 vs. 5000 training calculations).