Accurate and Transferable Molecular-Orbital-Based Machine Learning for Molecular Modeling
Quantum simulation is a powerful tool that lets chemists understand chemical processes, either accurately with expensive wavefunction theory or approximately with cheap density functional theory (DFT). However, the cost-accuracy trade-offs in electronic structure methods limit the application of quantum simulation to large chemical and biological systems. In this thesis, an accurate, transferable, and physics-driven molecular modeling framework, molecular-orbital-based machine learning (MOB-ML), is introduced to provide wavefunction-quality molecular descriptions at no more than mean-field computational cost. Instead of directly predicting total molecular energies, MOB-ML describes the post-Hartree-Fock correlation energy from molecular orbital information at the cost of a Hartree-Fock computation.
Because they preserve the relevant physical constraints, molecular-orbital-based (MOB) features represent chemical space faithfully in both supervised clustering and unsupervised learning for chemical space exploration. The development of local regressions with scalable exact Gaussian processes within clusters further allows MOB-ML to provide the most accurate approach in both the low-data and big-data regimes. As an exciting and general new tool for tackling a variety of problems in chemistry, MOB-ML predicts total energies with high accuracy and serves as a universal density functional for organic molecules and non-covalent interactions in diverse chemical systems. With the availability of analytical nuclear gradients, MOB-ML can also generate accurate potential energy surfaces (PESs) from few high-level electronic structure reference computations, enabling accurate and efficient diffusion Monte Carlo calculations for computational spectroscopy.
Towards chemical accuracy with shallow quantum circuits: A Clifford-based Hamiltonian engineering approach
Achieving chemical accuracy with shallow quantum circuits is a significant
challenge in quantum computational chemistry, particularly for near-term
quantum devices. In this work, we present a Clifford-based Hamiltonian
engineering algorithm, namely CHEM, that addresses the trade-off between
circuit depth and accuracy. Based on the variational quantum eigensolver and a
hardware-efficient ansatz, our method designs a Clifford-based Hamiltonian
transformation that (1) ensures a set of initial circuit parameters
corresponding to the Hartree--Fock energy can be generated, (2) effectively
maximizes the initial energy gradient with respect to circuit parameters, and
(3) imposes negligible overhead for classical processing and does not require
additional quantum resources. We demonstrate the efficacy of our approach using
a quantum hardware emulator, achieving chemical accuracy for systems as large
as 12 qubits with fewer than 30 two-qubit gates. Our Clifford-based Hamiltonian
engineering approach offers a promising avenue for practical quantum
computational chemistry on near-term quantum devices. Comment: 12 pages, 5 figures. SI is included.
Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space
We introduce an unsupervised clustering algorithm to improve training
efficiency and accuracy in predicting energies using molecular-orbital-based
machine learning (MOB-ML). This work determines clusters via the Gaussian
mixture model (GMM) in an entirely automatic manner and simplifies an earlier
supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by
eliminating both the necessity for user-specified parameters and the training
of an additional classifier. Unsupervised clustering results from GMM have the
advantage of accurately reproducing chemically intuitive groupings of frontier
molecular orbitals and having improved performance with an increasing number of
training examples. The resulting clusters from supervised or unsupervised
clustering are further combined with scalable Gaussian process regression (GPR)
or linear regression (LR) to learn molecular energies accurately by generating
a local regression model in each cluster. Among all four combinations of
regressors and clustering methods, GMM combined with scalable exact Gaussian
process regression (GMM/GPR) is the most efficient training protocol for
MOB-ML. Numerical tests of molecular energy learning on thermalized
datasets of drug-like molecules demonstrate the improved accuracy,
transferability, and learning efficiency of GMM/GPR over the other
training protocols for MOB-ML, i.e., supervised regression clustering combined
with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best
molecular energy predictions among literature results on the same
benchmark datasets. With a lower scaling, GMM/GPR achieves a 10.4-fold speedup in
wall-clock training time compared with scalable exact GPR at a training size
of 6500 QM7b-T molecules. Comment: 28 pages, 7 figures.
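The GMM/GPR training protocol described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the arrays `X` and `y` are random placeholders standing in for MOB feature vectors and orbital-pair correlation energies, and the feature dimension and cluster count are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))        # stand-in MOB feature vectors
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]  # stand-in pair correlation energies

# Step 1: unsupervised clustering of chemical space via a Gaussian mixture.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)

# Step 2: one local GPR model per cluster.
models = {}
for k in range(gmm.n_components):
    mask = labels == k
    if mask.any():  # guard: skip a cluster that claimed no points
        gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                       normalize_y=True)
        models[k] = gpr.fit(X[mask], y[mask])

# Prediction: route each test point through the GMM, then its local GPR.
def predict(X_new):
    lab = gmm.predict(X_new)
    out = np.zeros(len(X_new))
    for k, model in models.items():
        m = lab == k
        if m.any():
            out[m] = model.predict(X_new[m])
    return out

preds = predict(X[:20])  # predict energies for some held-in examples
```

In the real protocol the per-cluster GPRs are scalable exact Gaussian processes; the off-the-shelf `GaussianProcessRegressor` here is a simplification that only conveys the routing structure.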
Transferability in Machine Learning for Electronic Structure via the Molecular Orbital Basis
We present a machine learning (ML) method for predicting electronic structure
correlation energies using Hartree-Fock input. The total correlation energy is
expressed in terms of individual and pair contributions from occupied molecular
orbitals, and Gaussian process regression is used to predict these
contributions from a feature set that is based on molecular orbital properties,
such as Fock, Coulomb, and exchange matrix elements. With the aim of maximizing
transferability across chemical systems and compactness of the feature set, we
avoid the usual specification of ML features in terms of atom- or
geometry-specific information, such as atom/element-types, bond-types, or local
molecular structure. ML predictions of MP2 and CCSD energies are presented for
a range of systems, demonstrating that the method maintains accuracy while
providing transferability both within and across chemical families; this
includes predictions for molecules with atom-types and elements that are not
included in the training set. The method holds promise both in its current form
and as a proof-of-principle for the use of ML in the design of generalized
density-matrix functionals. Comment: 8 pages, 5 figures.
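The decomposition described above can be illustrated schematically: the total correlation energy is a sum of per-orbital-pair contributions, each predicted by a Gaussian process from MO-derived features. All numbers below are synthetic placeholders (the three feature columns merely mimic Fock, Coulomb, and exchange matrix elements), not real electronic-structure data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)
# Synthetic "pair features": columns mimic F_ij, J_ij, K_ij matrix elements.
X_train = rng.normal(size=(200, 3))
# Synthetic pair correlation-energy contributions (always negative here).
eps_train = -0.01 * np.abs(X_train[:, 1]) - 0.005 * np.abs(X_train[:, 2])

# GPR maps pair features to pair contributions.
gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gpr.fit(X_train, eps_train)

def correlation_energy(pair_features):
    """Sum predicted pair contributions to get the total correlation energy."""
    return float(np.sum(gpr.predict(pair_features)))

# Ten occupied-orbital pairs of a hypothetical test molecule.
X_mol = rng.normal(size=(10, 3))
E_corr = correlation_energy(X_mol)
```

Because the features are orbital-based rather than atom-based, the same trained model can in principle be applied to molecules containing elements absent from the training set, which is the transferability claim of the abstract.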
Error-mitigated Quantum Approximate Optimization via Learning-based Adaptive Optimization
Combinatorial optimization problems are ubiquitous and computationally hard
to solve in general. Quantum computing is envisioned as a powerful tool
offering potential computational advantages for solving some of these problems.
The quantum approximate optimization algorithm (QAOA), one of the most
representative quantum-classical hybrid algorithms, is designed to solve
certain combinatorial optimization problems by transforming a discrete
optimization problem into a classical optimization problem over a continuous
circuit parameter domain. The QAOA objective landscape over the parameter
variables is notorious for pervasive local minima and barren plateaus, and
the viability of training relies heavily on the efficacy of the classical optimization
algorithm. To enhance the performance of QAOA, we design double adaptive-region
Bayesian optimization (DARBO), an adaptive classical optimizer for QAOA. Our
experimental results demonstrate that the algorithm greatly outperforms
conventional gradient-based and gradient-free optimizers in terms of speed,
accuracy, and stability. We also address the issues of measurement efficiency
and the suppression of quantum noise by successfully conducting the full
optimization loop on the superconducting quantum processor. This work helps to
unlock the full power of QAOA and paves the way toward achieving quantum
advantage in practical classical tasks. Comment: Main text: 11 pages, 4 figures; SI: 5 pages, 5 figures.
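A trust-region Bayesian-optimization loop in the spirit of the adaptive classical optimizer described above can be sketched as follows. This is an illustrative assumption, not DARBO itself: the real method uses two adaptive regions and a measured quantum-circuit objective, whereas here a cheap analytic function stands in for the QAOA energy and a single adaptive search region is used.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(2)

def qaoa_energy(theta):
    # Stand-in for the measured QAOA objective over circuit parameters.
    return float(np.sum(np.sin(3 * theta) ** 2) + 0.1 * np.sum(theta ** 2))

dim, n_init, n_iter = 4, 8, 25
X = rng.uniform(-np.pi, np.pi, size=(n_init, dim))   # initial design points
y = np.array([qaoa_energy(x) for x in X])
radius = np.pi / 2                                   # adaptive region radius

for _ in range(n_iter):
    # Surrogate model of the objective over circuit parameters.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True).fit(X, y)
    center = X[np.argmin(y)]
    # Sample candidates inside the current region; pick the point with the
    # lowest GP lower-confidence bound (mean minus one standard deviation).
    cand = center + rng.uniform(-radius, radius, size=(256, dim))
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmin(mu - sigma)]
    y_next = qaoa_energy(x_next)
    # Expand the region after an improvement, shrink it otherwise.
    radius = min(radius * 1.5, np.pi) if y_next < y.min() else max(radius * 0.7, 1e-3)
    X, y = np.vstack([X, x_next]), np.append(y, y_next)

best = y.min()
```

The adaptive radius is what distinguishes this family of optimizers from plain Bayesian optimization: it concentrates the surrogate's queries near the incumbent, which is useful on landscapes riddled with local minima and flat regions.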
Regression-clustering for Improved Accuracy and Training Cost with Molecular-Orbital-Based Machine Learning
Machine learning (ML) in the representation of molecular-orbital-based (MOB)
features has been shown to be an accurate and transferable approach to the
prediction of post-Hartree-Fock correlation energies. Previous applications of
MOB-ML employed Gaussian Process Regression (GPR), which provides good
prediction accuracy with small training sets; however, the cost of GPR training
scales cubically with the amount of data and becomes a computational bottleneck
for large training sets. In the current work, we address this problem by
introducing a clustering/regression/classification implementation of MOB-ML. In
a first step, regression clustering (RC) is used to partition the training data
to best fit an ensemble of linear regression (LR) models; in a second step,
each cluster is regressed independently, using either LR or GPR; and in a third
step, a random forest classifier (RFC) is trained for the prediction of cluster
assignments based on MOB feature values. Upon inspection, RC is found to
recapitulate chemically intuitive groupings of the frontier molecular orbitals,
and the combined RC/LR/RFC and RC/GPR/RFC implementations of MOB-ML are found
to provide good prediction accuracy with greatly reduced wall-clock training
times. For a dataset of thermalized geometries of 7211 organic molecules of up
to seven heavy atoms, both implementations reach chemical accuracy (1 kcal/mol
error) with only 300 training molecules, while providing 35000-fold and
4500-fold reductions in the wall-clock training time, respectively, compared to
MOB-ML without clustering. The resulting models are also demonstrated to retain
transferability for the prediction of large-molecule energies with only
small-molecule training data. Finally, it is shown that capping the number of
training datapoints per cluster leads to further improvements in prediction
accuracy with negligible increases in wall-clock training time. Comment: 31 pages, 10 figures, with an SI.
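The three-step clustering/regression/classification pipeline described above can be sketched under simplifying assumptions: two clusters and a synthetic one-dimensional feature standing in for MOB feature vectors. Regression clustering alternates between assigning each point to the linear model with the smallest residual and refitting the models; a random forest then learns to predict cluster labels from the features alone.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 1))
# Piecewise-linear ground truth: two regimes, as RC is meant to discover.
y = np.where(X[:, 0] < 0, -1.5 * X[:, 0], 0.5 * X[:, 0]) \
    + 0.01 * rng.normal(size=300)

n_clusters = 2
labels = rng.integers(n_clusters, size=len(X))       # random initialization
for _ in range(20):
    # Step 1-2: refit one linear model per cluster, then reassign greedily.
    models = []
    for k in range(n_clusters):
        mask = labels == k
        if mask.sum() < 2:           # guard against a (near-)empty cluster
            mask = np.ones(len(X), dtype=bool)
        models.append(LinearRegression().fit(X[mask], y[mask]))
    residuals = np.stack([np.abs(m.predict(X) - y) for m in models], axis=1)
    new_labels = residuals.argmin(axis=1)
    if np.array_equal(new_labels, labels):           # converged
        break
    labels = new_labels

# Step 3: train a classifier so unseen geometries can be routed to a cluster.
rfc = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
```

In the full method each cluster would then be regressed with LR or GPR as in the abstract; the point of the classifier is that cluster assignment at prediction time needs only feature values, not energies.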
A universal density matrix functional from molecular orbital-based machine learning: Transferability across organic molecules
We address the degree to which machine learning (ML) can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, at the second-order Møller-Plesset perturbation theory (MP2), coupled cluster with singles and doubles (CCSD), and CCSD with perturbative triples levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 mhartree using a model trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported Δ-ML method, MOB-ML is shown to reach chemical accuracy with threefold fewer training geometries. Finally, a transferability test in which models trained on seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than Δ-ML (140 vs. 5000 training calculations).