22 research outputs found

    ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis

    We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic framework (MOF) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information -- an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different tradeoffs between labor, speed, and accuracy. We deploy this system to extract 26,257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the dataset built by text mining, we constructed a machine-learning model with over 86% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions on chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format, while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry sub-disciplines. Comment: Published in the Journal of the American Chemical Society (2023); 102 pages (18-page manuscript, 84 pages of supporting information)

    Learning to Evolve Structural Ensembles of Unfolded and Disordered Proteins Using Experimental Solution Data

    We have developed a generative recurrent neural network (GRNN) that learns the probability of the next residue torsions $X_{i+1} = [\phi_{i+1}, \psi_{i+1}, \omega_{i+1}, \chi_{i+1}]$ from the previous residue in the sequence $X_i$ to generate new IDP conformations. In addition, we couple the GRNN with a Bayesian model, X-EISD, in a reinforcement learning step that biases the probability distributions of torsions to take advantage of experimental data types such as J-couplings, NOEs and PREs. We show that updating the generative model parameters according to the reward feedback on the basis of the agreement between structures and data improves upon existing approaches that simply reweight static structural pools for disordered proteins. Instead, the GRNN "DynamICE" model learns to physically change the conformations of the underlying pool to those that better agree with experiment.

    Supplemental Archive for LDRS

    No full text
    Supplemental archive for the manuscript: Local Disordered Region Sampling (LDRS) for Ensemble Modeling of Proteins with Experimentally Undetermined or Low Confidence Prediction Segments.

    A benchmark dataset for Hydrogen Combustion.

    No full text
    The generation of reference data for deep learning models is challenging for reactive systems, and more so for combustion reactions due to the extreme conditions that create radical species and alternative spin states during the combustion process. Here, we extend intrinsic reaction coordinate (IRC) calculations with ab initio MD simulations and normal mode displacement calculations to more extensively cover the potential energy surface for 19 reaction channels for hydrogen combustion. A total of ∼290,000 potential energies and ∼1,270,000 nuclear force vectors are evaluated with a high quality range-separated hybrid density functional, ωB97X-V, to construct the reference data set, including transition state ensembles, for the deep learning models to study hydrogen combustion reaction