
    Conformational Sampling and Calculation of Molecular Free Energy Using Superposition Approximations

    The superposition approximations (SAs), first proposed in the distribution function theories of liquids, are a family of approximations to a multivariate probability distribution function (pdf) in terms of its lower-order marginal pdfs. In this talk, we first present the relationship between various forms of SA, the measurement of correlation via mutual information, and approximations to the entropy of the full pdf via truncations of the Mutual Information Expansion (MIE). Next, based on the SAs, we present a novel framework for constructing computationally tractable approximations to the N-dimensional Boltzmann conformational distribution of a molecule in terms of its low-order marginal pdfs. The marginal pdfs are obtained as normalized histograms of internal coordinates of a set of Boltzmann-distributed conformations generated by molecular dynamics (MD) simulation. We evaluate the accuracy of these approximate distributions, constructed from marginal pdfs of order L ≤ 3, for small molecules (≤ 52 atoms). To do so, we use a novel conformational sampling algorithm to sample from them and compare the samples with the original MD conformations used to populate the pdfs. We find that the triplet (L = 3) level approximation has high conformational overlap with the physical Boltzmann distribution, significantly higher than that of the singlet (L = 1) or doublet (L = 2) level approximations. The results shed light on the relative importance of correlations of different orders. The singlet and doublet level approximate distributions are then used to define reference systems with known free energies. These reference systems are then used to compute the physical free energy of molecules via the reference system approach. Free energies are computed for small peptides as test molecules. The free energy estimate converges dramatically faster with the doublet reference than with the singlet reference, consistent with the greater overlap of the doublet reference system with the physical system. Potential further developments and practical applications are discussed.
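The following is a minimal sketch of a second-order truncation of the Mutual Information Expansion. It assumes internal coordinates have already been extracted from MD frames into a NumPy array. The full entropy is approximated as the sum of singlet entropies minus the sum of pairwise mutual informations, each estimated from normalized histograms as described in the abstract. The function names and the default of 36 bins are illustrative assumptions, not the talk's method. The entropies are those of the binned (discrete) distributions. Converting them to continuous-coordinate entropies would add a bin-width term per coordinate, and that term cancels in the mutual information.

```python
import numpy as np
from itertools import combinations


def entropy_from_counts(counts):
    """Shannon entropy (in nats) of a histogram given as an array of counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mie2_entropy(samples, bins=36):
    """Singlet and second-order MIE entropy estimates from binned samples.

    samples: (n_frames, n_coords) array of internal coordinates.
    Returns (sum_i S_i, sum_i S_i - sum_{i<j} I_ij).
    """
    samples = np.asarray(samples, dtype=float)
    n_coords = samples.shape[1]
    # One set of bin edges per coordinate, reused for the 1D and 2D histograms
    edges = [np.histogram_bin_edges(samples[:, k], bins=bins) for k in range(n_coords)]
    s1 = [entropy_from_counts(np.histogram(samples[:, k], bins=edges[k])[0])
          for k in range(n_coords)]
    total_mi = 0.0
    for i, j in combinations(range(n_coords), 2):
        joint, _, _ = np.histogram2d(samples[:, i], samples[:, j],
                                     bins=[edges[i], edges[j]])
        # Pairwise mutual information: I_ij = S_i + S_j - S_ij
        total_mi += s1[i] + s1[j] - entropy_from_counts(joint)
    singlet = float(sum(s1))
    return singlet, singlet - total_mi
```

Consider a coordinate that is simply duplicated, as in `mie2_entropy(np.column_stack([x, x]))`. The second-order estimate equals the entropy of `x` alone, while the singlet estimate counts it twice. This is a toy illustration of why including pairwise correlations tightens the approximation.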

    Assembling a toolkit for computational dissection of dense protein systems

    The cellular interior is a dense environment. Understanding how such an environment affects the properties of proteins and other macromolecules is a major challenge in biological physics. So is understanding how weak, non-specific interactions drive processes such as protein droplet formation through liquid-liquid phase separation. The complexity of this environment often makes experimental studies extremely difficult, leaving an important niche to be filled by simulation studies. Simulations, however, have their own set of challenges, and using them to their full potential requires a suitable set of computational tools. Such a toolset must include accurate yet computationally affordable force fields, computationally efficient simulation algorithms, and analysis tools that allow meaningful information to be extracted from the simulation results. In this thesis, a number of tools in all three areas are developed and/or evaluated. We present an atom-level, implicit-solvent force field, as well as a coarse-grained continuous HP model that we use for droplet formation studies. We investigate sampling issues in field theory simulations with the complex Langevin equation. We use finite-size scaling analysis to analyse simulations of liquid-liquid phase separation, and Markov state modeling to analyse crowding simulations.
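Markov state modeling is only named in the abstract, so what follows is a generic sketch rather than the thesis's workflow. It assumes a trajectory already discretized into `n_states` conformational states; the clustering step is not shown. The sketch counts transitions at a lag time and symmetrizes the count matrix, which is one simple way to impose detailed balance; reversible maximum-likelihood estimators are more common in practice. It then row-normalizes the counts into a transition matrix and recovers the stationary distribution as the left eigenvector with eigenvalue 1. The function name `estimate_msm` is hypothetical, and every state is assumed to be visited at least once.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1):
    """Transition matrix and stationary distribution from a discrete trajectory.

    dtraj: 1D integer array of state indices (0..n_states-1), one per saved frame.
    lag:   lag time in frames (must be >= 1).
    """
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Count transitions i -> j separated by `lag` frames
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    # Symmetrize counts: a simple (not maximum-likelihood) way to enforce detailed balance
    counts = counts + counts.T
    transition = counts / counts.sum(axis=1, keepdims=True)
    # Stationary distribution: left eigenvector of the transition matrix with eigenvalue 1
    evals, evecs = np.linalg.eig(transition.T)
    k = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, k])
    return transition, pi / pi.sum()
```

Because the counts are symmetrized, the resulting model satisfies detailed balance exactly (pi_i T_ij = pi_j T_ji). It is a convenient starting point before moving to a dedicated MSM package.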

    Multiscale Modeling of RNA Structures Using NMR Chemical Shifts

    Structure determination is an important step in understanding the mechanisms of functional non-coding ribonucleic acids (ncRNAs). Experimental observables in solution-state nuclear magnetic resonance (NMR) spectroscopy provide valuable information about the structural and dynamic properties of RNAs. In particular, NMR-derived chemical shifts are considered structural "fingerprints" of RNA conformational state(s). In my thesis, I have developed computational tools to model RNA structures (mainly secondary structures) using structural information extracted from NMR chemical shifts. Inspired by methods that incorporate chemical-mapping data into RNA secondary structure prediction, I have developed a framework, CS-Fold, that uses assigned chemical shift data to conditionally guide secondary structure folding algorithms. First, I developed neural network classifiers, CS2BPS (Chemical Shift to Base Pairing Status), that take assigned chemical shifts as input and output the predicted base-pairing status of individual residues in an RNA. I then used these base-pairing status predictions as folding restraints to guide RNA secondary structure prediction. Extensive testing indicates that, from assigned NMR chemical shifts, we could accurately predict the secondary structures of RNAs and map distinct conformational states of a single RNA. Another way to use experimental data such as NMR chemical shifts in structure modeling is probabilistic modeling. Here, experimental data are used to recover a native-like structure from a structural ensemble that contains a set of low-energy structure models. I first developed a model, SS2CS (Secondary Structure to Chemical Shift), that takes a secondary structure as input and predicts chemical shifts with high accuracy. Using Bayesian/maximum entropy (BME) reweighting, I was able to reweight secondary structure models based on the agreement between the measured chemical shifts and the reweighted ensemble-averaged chemical shifts. The results indicate that BME could identify the native or near-native structure from a set of low-energy structure models, as well as recover some of the non-canonical interactions in tertiary structures. We could also probe the conformational landscape by studying the pattern of weights assigned by BME. Finally, I explored RNA structural annotation using assigned NMR chemical shifts. Using multitask learning, eleven structural properties were annotated by classifying individual residues with respect to each structural property. The results indicate that our method, CS-Annotate, could predict the structural properties with reasonable accuracy. We believe that CS-Annotate could be used to assess the quality of a structure model. This would be done by comparing the structural properties derived from the model with those predicted by CS-Annotate. One major limitation of these tools is that they require assigned chemical shifts, and assigning chemical shifts typically assumes a secondary structure model. However, with recent advances in singly labeled RNA synthesis, chemical shifts could be assigned without assuming a secondary structure. We envision that, using chemical shifts derived from singly labeled NMR experiments, CS-Fold could be used to model the secondary structure of RNA. We also believe that unassigned chemical shifts could be used to select structure models: native-like structures could be recovered by comparing optimally assigned chemical shifts with computed chemical shifts (generated by SS2CS). Overall, the results presented in this thesis indicate that we could extract crucial structural information about the residues in an RNA from its NMR chemical shifts. Moreover, with tools such as CS-Fold, SS2CS, and CS-Annotate, we could accurately predict the secondary structure, model the conformational landscape, and study the structural properties of an RNA.
    PhD, Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163247/1/kexin_1.pd
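The reweighting step can be illustrated with the standard maximum-entropy dual formulation. The thesis abstract does not give its exact objective, so this is an assumed form, not the author's implementation. Each model i with predicted observables F_i (for example, chemical shifts from a forward model such as SS2CS) receives a weight w_i proportional to w0_i exp(-lambda · F_i). The Lagrange multipliers lambda minimize log Z(lambda) + lambda · F_exp + (theta/2) Σ_k lambda_k² sigma_k². Here `theta` trades off fidelity to the prior ensemble against agreement with the data. The function name `bme_reweight` and its parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def bme_reweight(calc, obs_exp, sigma, theta=1.0, w0=None):
    """Maximum-entropy reweighting of an ensemble of structure models.

    calc:    (n_models, n_obs) observables predicted for each model.
    obs_exp: (n_obs,) measured values; sigma: (n_obs,) their uncertainties.
    theta:   regularization strength balancing the prior against the data.
    w0:      optional prior weights (uniform if None; assumed positive).
    """
    calc = np.asarray(calc, dtype=float)
    obs_exp = np.asarray(obs_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_models, n_obs = calc.shape
    w0 = np.full(n_models, 1.0 / n_models) if w0 is None else np.asarray(w0) / np.sum(w0)
    log_w0 = np.log(w0)

    def weights(lam):
        # w_i proportional to w0_i * exp(-lam . F_i); shift by the max for stability
        a = log_w0 - calc @ lam
        w = np.exp(a - a.max())
        return w / w.sum()

    def gamma(lam):
        # Dual objective: log Z(lam) + lam . F_exp + (theta / 2) * sum(lam^2 * sigma^2)
        a = log_w0 - calc @ lam
        log_z = a.max() + np.log(np.exp(a - a.max()).sum())
        value = log_z + lam @ obs_exp + 0.5 * theta * np.sum(lam**2 * sigma**2)
        # Gradient: F_exp - <F>_w + theta * lam * sigma^2
        grad = obs_exp - weights(lam) @ calc + theta * lam * sigma**2
        return value, grad

    result = minimize(gamma, np.zeros(n_obs), jac=True, method="L-BFGS-B")
    return weights(result.x)
```

At the optimum, the reweighted average satisfies <F>_w = F_exp + theta * lambda * sigma², so a small `theta` pulls the ensemble average close to the measurements. A large `theta` keeps the weights near the prior.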

    Graph Neural Networks for Molecules

    Graph neural networks (GNNs), which are capable of learning representations from graph-structured data, are naturally suited to modeling molecular systems. This review introduces GNNs and their various applications to small organic molecules. GNNs rely on message-passing operations, a generic yet powerful framework, to update node features iteratively. Many studies design GNN architectures to effectively learn the topological information of 2D molecular graphs as well as the geometric information of 3D molecular systems. GNNs have been applied to a wide variety of molecular tasks, including molecular property prediction, molecular scoring and docking, molecular optimization and de novo generation, and molecular dynamics simulation. The review also summarizes recent developments in self-supervised learning for molecules with GNNs.
    Comment: A chapter for the book "Machine Learning in Molecular Sciences". 31 pages, 4 figures
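A single message-passing step can be sketched in a few lines of NumPy. This assumes a simple sum-aggregation update, h_v ← ReLU(h_v W_self + Σ_{u∈N(v)} h_u W_msg), followed by sum pooling over atoms. It is one common variant chosen for illustration; the architectures covered in the review differ in their message, update, and readout functions. The function names, the random weights, and the ethanol toy graph are all hypothetical.

```python
import numpy as np


def message_passing_layer(h, adj, w_self, w_msg):
    """One sum-aggregation message-passing step.

    h:   (n_atoms, d_in) node (atom) features.
    adj: (n_atoms, n_atoms) symmetric 0/1 bond adjacency matrix.
    w_self, w_msg: (d_in, d_out) weight matrices (learned in practice).
    """
    messages = adj @ h  # row v = sum of the features of v's bonded neighbors
    return np.maximum(0.0, h @ w_self + messages @ w_msg)  # ReLU update


def readout(h):
    """Sum pooling: a permutation-invariant molecule-level embedding."""
    return h.sum(axis=0)


# Usage: heavy atoms of ethanol (C-C-O) with one-hot [C, O] atom features
h = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
w_self, w_msg = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))
embedding = readout(message_passing_layer(h, adj, w_self, w_msg))
```

Because the update acts row-wise and the readout is a sum, renumbering the atoms leaves `embedding` unchanged. Stacking several such layers lets information propagate beyond directly bonded neighbors.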