10 research outputs found

    Estimating Absolute Configurational Entropies of Macromolecules: The Minimally Coupled Subspace Approach

    We develop a general minimally coupled subspace approach (MCSA) to compute absolute entropies of macromolecules, such as proteins, from computer-generated canonical ensembles. Our approach overcomes limitations of current estimates such as the quasi-harmonic approximation, which neglects non-linear and higher-order correlations as well as multi-minima characteristics of protein energy landscapes. Here, Full Correlation Analysis, adaptive kernel density estimation, and mutual information expansions are combined and high accuracy is demonstrated for a number of test systems ranging from alkanes to a 14-residue peptide. We further computed the configurational entropy for the full 67-residue cofactor of the TATA box binding protein, illustrating that MCSA yields improved results also for large macromolecular systems.

    The Dissolution of Cellulose in Ionic Liquids - A Molecular Dynamics Study

    The use of ionic liquids for the dissolution of cellulose promises an alternative method for the thermochemical pretreatment of biomass that may be more efficient and environmentally acceptable than conventional techniques in aqueous solution. Understanding how ionic liquids act on cellulose is essential for improving pretreatment conditions, and thus detailed knowledge of the interactions between solute and solvent molecules is necessary. Here, results from the first all-atom molecular dynamics simulation of an entire cellulose microfibril in 1-butyl-3-methylimidazolium chloride (BmimCl) are presented and the interactions and orientations of solvent ions with respect to glucose units on the hydrophobic and hydrophilic surfaces of the fiber are analyzed in detail, shedding light on the initiation stages of cellulose dissolution. Moreover, replica-exchange simulations of a single cellulose chain fully solvated in BmimCl and in water are performed for a total of around 13 μs in order to study the dynamics and thermodynamics of the end state of the dissolution. The results indicate that chloride anions predominantly interact with cellulose hydroxyl groups and disrupt the intrachain O3H’···O5 hydrogen bonds, which are essential for the integrity of cellulose fibers. The cations stack preferentially on the hydrophobic cellulose surface, governed by non-polar interactions with cellulose, which can stabilize detached cellulose chains by compensating for the interaction between stacked layers. Moreover, a frequently occurring intercalation of cations on the hydrophilic surface is observed, which by separating cellulose layers can also potentially facilitate the initiation of fiber disintegration. The single-chain simulations indicate that cellulose solvation mechanisms differ between the two solvents. Although global size-related properties of the cellulose chain are comparable in the two solvents, local conformational properties of cellulose differ significantly between the BmimCl and water solutions. In general, the results indicate that solute-solvent interaction energies are more favorable and that the cellulose chain is more flexible in BmimCl than in water. Taken together, the simulations explain how ionic liquids can facilitate cellulose dissolution: the synergistic action of anions and cations helps to initiate fiber deconstruction through specific interactions on the fiber surface and to solvate single cellulose chains through favorable solvent interactions and conformational flexibility.

    Analysis of biological and chemical systems using information theoretic approximations

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biological Engineering, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 115-123). The identification and quantification of high-dimensional relationships is a major challenge in the analysis of both biological and chemical systems. To address this challenge, a variety of experimental and computational tools have been developed to generate multivariate samples from these systems. Information theory provides a general framework for the analysis of such data, but for many applications, the large sample sizes needed to reliably compute high-dimensional information theoretic statistics are not available. In this thesis we develop, validate, and apply a novel framework for approximating high-dimensional information theoretic statistics using associated terms of arbitrarily low order. For a variety of synthetic, biological, and chemical systems, we find that these low-order approximations provide good estimates of higher-order multivariate relationships, while dramatically reducing the number of samples needed to reach convergence. We apply the framework to the analysis of multiple biological systems, including a phospho-proteomic data set in which we identify a subset of phospho-peptides that is maximally informative of cellular response (migration and proliferation) across multiple conditions (varying EGF or heregulin stimulation, and HER2 expression). This subset is shown to produce statistical models with superior performance to those built with subsets of similar size. We also employ the framework to extract configurational entropies from molecular dynamics simulations of a series of small molecules, demonstrating improved convergence relative to existing methods. As these disparate applications highlight, our framework enables the use of general information theoretic phrasings even in systems where data quantities preclude direct estimation of the high-order statistics. Furthermore, because the framework provides a hierarchy of approximations of increasing order, as data collection and analysis techniques improve, the method extends to generate more accurate results, while maintaining the same underlying theory. By Bracken Matheny King. Ph.D.
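
    To make the idea of a low-order approximation concrete, the following is a minimal sketch of a generic second-order mutual information expansion for discrete samples: the joint entropy is approximated by the sum of marginal entropies minus all pairwise mutual informations. This is an illustration under stated assumptions, not the framework developed in the thesis; the function names `entropy` and `mie2_entropy` are hypothetical, and a simple plug-in (histogram) estimator stands in for whatever estimator a real analysis would use.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(samples):
    """Plug-in Shannon entropy (in nats) of a sequence of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log(c / n) for c in counts.values())


def mie2_entropy(data):
    """Second-order expansion of the joint entropy (hypothetical helper).

    data: list of equal-length tuples, one tuple per sample.
    Returns sum_i H(X_i) - sum_{i<j} I(X_i; X_j),
    with I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j).
    """
    columns = list(zip(*data))
    marginals = [entropy(col) for col in columns]
    total = sum(marginals)
    for i, j in combinations(range(len(columns)), 2):
        joint_ij = entropy(list(zip(columns[i], columns[j])))
        total -= marginals[i] + marginals[j] - joint_ij
    return total


if __name__ == "__main__":
    # Three binary variables where the third is the XOR of the first two:
    # every pair is independent, so the pairwise expansion misses the
    # three-body dependence and overestimates the joint entropy.
    xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print("second-order estimate:", mie2_entropy(xor))
    print("direct joint entropy: ", entropy(xor))
```

    For two variables the expansion is exact; the XOR data above is a deliberately adversarial case in which a purely three-body dependence is invisible at second order, which is why such expansions are organized as a hierarchy of increasing order.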

    Thermodynamic driving forces in protein regulation studied by molecular dynamics simulations.


    Machine Learning and Solvation Theory for Drug Discovery

    Drug discovery is a notoriously expensive and time-consuming process; hence, developing computational methods to facilitate the discovery process and lower the associated costs is a long-sought goal of computational chemists. Protein-ligand binding, which provides the physical and chemical basis for the mechanism of action of most drugs, occurs in an aqueous environment, and binding affinity is determined not only by atomic interactions between the protein and ligand but also by changes in their interactions with surrounding water molecules that occur upon binding. Thus, a quantitative understanding of the roles water molecules play in the protein-ligand binding process is an essential foundation for developing computational methods and tools to aid the drug discovery process. Grid inhomogeneous solvation theory (GIST) is a tool that measures the thermodynamic and structural properties of water molecules on protein surfaces. Since its implementation, GIST has been used to study water behavior upon protein-ligand binding and to account for solvent effects in scoring functions used in virtual screening. This thesis comprises two research projects that extend the applications and functionality of GIST. In the first project, we investigated whether the water properties measured by GIST could improve the performance of machine learning models, specifically, convolutional neural networks (CNN) applied to virtual screening (GIST-CNN project). In the second project, we implemented the particle mesh Ewald (PME) algorithm for energy calculation in GIST, enabling GIST to become a more accurate and more efficient tool for end-state free energy calculation (PME-GIST project). The GIST-CNN project arose in response to reports indicating that convolutional neural network (CNN) models were able to outperform classical scoring functions in virtual screening. We noticed that all the reported machine learning models had been trained only on protein-ligand structures, while water molecules were completely neglected. Given that water molecules play essential roles in protein-ligand binding, we hypothesized that we could further improve the performance of CNN models in terms of enrichment efficiency by adding water features, measured by GIST, to the data used to train the model. Contrary to our hypothesis, we found that adding water features could not further improve the performance of a CNN model trained on protein-ligand structures, which was already very high. However, further investigation revealed that the high performance and reported enrichment efficiency of a CNN model trained on protein-ligand information was solely attributable to biases in the Database of Useful Decoys-Enhanced (DUD-E), which was used to train and test the model. In this project, we also established a suite of methods to investigate what a model learns from the input during training and argued that machine learning models should be thoroughly validated before being applied in real drug discovery projects. The motivations for the PME-GIST project were twofold. First, although GIST provides the statistical thermodynamic framework for thermodynamic end-state free energy calculation, inconsistencies in energy calculations between the previous GIST implementation (GIST-2016) and modern molecular dynamics engines prevent precise comparison of the GIST end-state method to other reference free energy calculation methods such as thermodynamic integration (TI). Second, the O(N²) nonbonded energy calculation is the most expensive step in the entire GIST calculation process. By implementing the PME algorithm in GIST, we aimed to achieve GIST energy calculations consistent with those of modern molecular dynamics engines and to accelerate the energy calculation to O(N log N), which is highly desirable when applying GIST to the measurement of water properties across an entire protein surface. In addition to implementing PME, we derived a simple empirical estimator for high-order entropies, which are truncated in GIST. After incorporating PME-based energy calculation and the high-order entropy estimator, we used PME-GIST to calculate end-state solvation free energy for a wide range of small molecules and achieved results highly consistent with TI (= 0.99, mean unsigned difference = 0.44 kcal/mol). The PME-GIST code we developed in this project was integrated into the open-source molecular dynamics analysis software CPPTRAJ for easy access by others in the drug discovery community. In summary, in this thesis, we explored the potential of adding solvation thermodynamics to machine learning-based virtual screening and found that the high performance reported for machine learning models in this application reflected biases in the dataset used to construct and test them rather than successful generalization of the physical principles that govern molecular interactions. We also addressed the inconsistent energy calculation between GIST and modern molecular simulation engines by developing PME-GIST. We hope the research work presented in this thesis will further expand and accelerate the application of GIST to drug discovery.

    Bibliography of communication and research products

    "This publication is a compendium of NIOSH publications and reports produced during calendar year 2005. Citations are listed by category including: I. Journal Articles; II. Book Chapters; III. NIOSH Numbered Publications; IV. Abstracts/Proceedings; V. Control Technology Reports; VI. Fatality Assessment and Control Evaluation Reports; VII. Fire Fighter Fatality Investigation and Prevention Reports; and, VIII. Health Hazard Evaluation Reports. Author, keyword and National Occupational Research Agenda (NORA) priority area indexes are also included." - NIOSHTIC-2. I. Journal articles -- II. Book chapters -- III. NIOSH numbered publications -- IV. Abstracts/Proceedings -- V. Control technology reports -- VI. Fatality assessment and control evaluation reports -- VII. Fire fighter fatality investigation and prevention reports -- VIII. Health hazard evaluation reports -- IX. Author index -- X. Keyword index -- XI. National Occupational Research Agenda (NORA) index. "May 2008." Also available via the World Wide Web.