1,417 research outputs found

    Exploration of Reaction Pathways and Chemical Transformation Networks

    Investigating chemical reaction networks requires identifying all relevant intermediates and elementary reactions. Many algorithmic approaches exist that perform such explorations efficiently and in an automated fashion. These approaches differ in their application range, the completeness of the exploration, and the amount of heuristics and human intervention required. Here, we describe and compare the different approaches based on these criteria. Future directions leveraging the strengths of chemical heuristics, human interaction, and physical rigor are discussed. Comment: 48 pages, 4 figures

    Knowledge-based approaches for understanding structure-dynamics-function relationship in proteins

    Proteins accomplish their functions through conformational changes, often brought about by changes in environmental conditions or ligand binding. Predicting the functional mechanisms of proteins is impossible without a deeper understanding of conformational transitions. Dynamics is the key link between the structure and function of proteins. The Protein Data Bank (PDB) contains multiple structures of the same protein, which have been solved under different conditions, using different experimental methods, or in complexes with different ligands. These alternate conformations of the same protein (or similar proteins) can provide important information about what conformational changes take place and how they are brought about. Though multiple computational approaches have been developed to predict dynamics from structure information, little work has been done to exploit this apparent, but potentially informative, redundancy in the PDB. In this work I bridge this gap by exploring various knowledge-based approaches to understand the structure-dynamics relationship and how it translates into protein function. First, a novel method for constructing free energy landscapes for conformational changes in proteins is proposed by combining principal motions with knowledge-based potential energies and entropies from coarse-grained models of protein dynamics. Second, an innovative method for computing knowledge-based entropies for proteins using an inverse Boltzmann approach is introduced, similar to the manner in which statistical potentials were previously extracted. We hypothesize that amino acid contact changes observed in the course of conformational changes within a large set of proteins can provide information about local pairwise flexibilities or entropies.
By combining this new entropy measure with knowledge-based potential functions, we formulate a knowledge-based free energy (KBF) function that we demonstrate outperforms other statistical potentials in its ability to identify native protein structures embedded within sets of decoys. Third, I apply the methods developed above in collaboration with experimentalists to understand the molecular mechanisms of conformational changes in several protein systems, including cadherins and membrane transporters. This work introduces several ways in which the large volume of data in the PDB can be used to understand the underlying principles behind the structure-dynamics-function relationships of proteins. Results from this work have several important applications in structural bioinformatics, such as structure prediction, molecular docking, and protein engineering and design. In particular, the new KBFs developed in this dissertation have immediate applications in emerging topics such as prediction of 3D structure from coevolving residues in sequence alignments, as well as in identifying the phenotypic effects of mutants.
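The inverse Boltzmann idea referred to in this abstract can be illustrated with a minimal sketch that turns contact counts into pseudo-energies through E = -kT ln(f_obs / f_ref). The toy counts, the random-mixing reference state, the pseudocount, and the name `inverse_boltzmann` are all assumptions made for illustration; the abstract does not specify the dissertation's actual entropy or free energy functions.

```python
import math
from collections import Counter


def inverse_boltzmann(observed, kT=1.0, pseudocount=1.0):
    """Convert contact counts into pseudo-energies, E = -kT * ln(f_obs / f_ref).

    observed: dict mapping an unordered residue-type pair (a, b) to a count.
    The reference state (an assumption) is random mixing of residue types
    according to how often each type appears as a contact partner.
    """
    total = sum(observed.values())

    # How often each residue type takes part in any contact.
    type_counts = Counter()
    for (a, b), n in observed.items():
        type_counts[a] += n
        type_counts[b] += n
    type_total = sum(type_counts.values())

    energies = {}
    for (a, b), n in observed.items():
        fa = type_counts[a] / type_total
        fb = type_counts[b] / type_total
        # Unordered pairs: a mixed pair can occur in two orders.
        f_ref = fa * fb if a == b else 2.0 * fa * fb
        # The pseudocount keeps the logarithm finite for unobserved pairs.
        f_obs = (n + pseudocount) / (total + pseudocount * len(observed))
        energies[(a, b)] = -kT * math.log(f_obs / f_ref)
    return energies


# Toy contact counts (hypothetical, not taken from the PDB).
toy_counts = {("L", "L"): 60, ("L", "K"): 20, ("K", "K"): 20}
toy_energies = inverse_boltzmann(toy_counts)
```

Pairs seen more often than the reference state predicts receive negative (favorable) values, and under-represented pairs receive positive ones. The same counting logic could in principle be applied to contact changes rather than contacts, which is the direction the abstract describes.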

    Using evolutionary covariance to infer protein sequence-structure relationships

    During the last half century, a deep knowledge of the actions of proteins has emerged from a broad range of experimental and computational methods. This means that there are now many opportunities for understanding how the varieties of proteins affect larger-scale behaviors of organisms, in terms of phenotypes and diseases. It is broadly acknowledged that sequence, structure, and dynamics are the three essential components for understanding proteins. Learning about the relationships among protein sequence, structure, and dynamics is one of the most important steps toward understanding the mechanisms of proteins. Together with the rapid growth in the efficiency of computers, there has been a commensurate growth in the sizes of the public databases for proteins. The field of computational biology has undergone a paradigm shift from investigating single proteins to looking collectively at sets of related proteins and broadly across all proteins. We develop a novel approach that combines structural knowledge from the PDB and the CATH database with sequence information from the Pfam database, using co-evolution in sequences to achieve the following goals: (a) collection of co-evolution information on a large scale by using protein domain family data; (b) development of novel amino acid substitution matrices that incorporate structural information; (c) detection of higher-order co-evolution correlations. The results presented here show that important gains can come from improvements to the sequence matching. What has been done here is simple: the pair correlations in sequence have been decomposed into singlet terms, which amounts to discarding much of the correlation information itself.
The gains shown here are encouraging, and we would like to develop a sequence matching method that retains the pair (or higher-order) correlation information directly; this should be possible by developing the sequence matching separately for different domain structures. The many-body correlations in particular have the potential to shift the common perceptions in biology from pairs, which are not actually so very informative, to higher-order interactions. Fully understanding cellular processes will require a large body of higher-order correlation information such as has been initiated here for single proteins.

    Allosteric Regulation at the Crossroads of New Technologies: Multiscale Modeling, Networks, and Machine Learning

    Allosteric regulation is a common mechanism employed by complex biomolecular systems for regulation of activity and adaptability in the cellular environment, serving as an effective molecular tool for cellular communication. As an intrinsic but elusive property, allostery is a ubiquitous phenomenon in which binding at, or perturbation of, a distal site in a protein can functionally control its activity; it is considered the “second secret of life.” The fundamental biological importance and complexity of these processes require a multi-faceted platform of synergistically integrated approaches for prediction and characterization of allosteric functional states, atomistic reconstruction of allosteric regulatory mechanisms, and discovery of allosteric modulators. The unifying theme and overarching goal of allosteric regulation studies in recent years has been the integration of emerging experimental and computational approaches and technologies to advance quantitative characterization of allosteric mechanisms in proteins. Despite significant advances, the quantitative characterization and reliable prediction of functional allosteric states, interactions, and mechanisms continue to present highly challenging problems in the field. In this review, we discuss simulation-based multiscale approaches, experiment-informed Markovian models, network modeling of allostery, and information-theoretical approaches that can describe the thermodynamics and hierarchy of allosteric states and the molecular basis of allosteric mechanisms. The wealth of structural and functional information, along with the diversity and complexity of allosteric mechanisms in therapeutically important protein families, has provided a well-suited platform for development of data-driven research strategies. Data-centric integration of chemistry, biology, and computer science using artificial intelligence technologies has gained significant momentum and is at the forefront of many cross-disciplinary efforts.
We discuss new developments in the machine learning field and the emergence of deep learning and deep reinforcement learning applications in modeling of molecular mechanisms and allosteric proteins. The experiment-guided integrated approaches empowered by recent advances in multiscale modeling, network science, and machine learning can lead to more reliable prediction of allosteric regulatory mechanisms and discovery of allosteric modulators for therapeutically important protein targets.

    Application and Optimization of Contact-Guided Replica Exchange Molecular Dynamics

    Proteins are complex macromolecules that fulfill a great variety of important tasks in living organisms. Proteins can, for example, regulate genes, stabilize structures, transmit cell signals, transport substances, and much more. Typically, comprehensive knowledge of a protein's structure and dynamics is required to fully understand its physiological function and interaction mechanisms. The insights gained are essential for the life sciences and can be applied in many areas, such as drug design or disease treatment. Despite the tremendous progress of experimental techniques, determining a protein structure remains a challenging task. Moreover, experiments can provide only partial information, and measurement data can be ambiguous and difficult to interpret. For this reason, computer simulations are frequently performed to provide further insight and to close the gap between theory and experiment. Today, many in silico methods are able to generate accurate protein structure models, whether with a de novo approach or by refining an initial model while taking experimental data into account. In this dissertation, I explore the capabilities of replica exchange molecular dynamics (REX MD) as a physics-based approach for generating physically meaningful protein structures. I focus on obtaining structures that are as native-like as possible and examine the strengths and weaknesses of the applied method. I extend the standard application by integrating a contact-based bias potential to improve the performance and the final outcome of REX.
Including native contact pairs, which can be derived from both theoretical and experimental sources, drives the simulation toward desired conformations and accordingly reduces the required computational effort. During my work, I conducted several studies with the aim of maximizing the enrichment of native-like structures, thereby optimizing the end-to-end process of guided REX MD. Each study aims to investigate and improve important aspects of the method used: 1) I study the effects of different selections of bias contacts, in particular the range dependence and the negative influence of erroneous contacts. This allows me to determine which kind of bias leads to a significant enrichment of native-like conformations compared with regular REX. 2) I perform a parameter optimization of the bias potential used. Comparing results from REX simulations that use different sigmoid-shaped potentials indicates sensible parameter ranges, from which I can derive an ideal bias potential for the general use case. 3) I present a de novo folding method that can generate many unique starting structures for REX as quickly as possible. I examine the performance of this method in detail and compare two different approaches for selecting the starting structure. The outcome of REX is greatly improved if the structures already cover a wide range of conformational space at the start and at the same time lie close to the target state. 4) I investigate four complex algorithm chains that are able to extract representative structures from large biomolecular ensembles generated by REX. I study their robustness and reliability, compare them with one another, and evaluate their performance numerically.
5) Based on my experience with guided REX MD, I developed a Python package to automate and simplify REX projects. It enables a user to design, run, analyze, and visualize a REX project in an interactive and user-friendly environment.
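Two ingredients named in this abstract, a sigmoid-shaped contact bias and the exchange step of replica exchange, can be sketched as follows. The swap rule is the standard temperature replica exchange criterion, p = min(1, exp[(β_i − β_j)(E_i − E_j)]). The functional form of the bias, its parameter values, its units, and both function names are assumptions for illustration; the abstract states only that the potentials are sigmoid-shaped and does not describe the thesis's Python package.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def sigmoid_bias(r, strength=10.0, r0=0.8, steepness=20.0):
    """Hypothetical sigmoid contact bias in kJ/mol for a pair distance r (nm).

    The penalty is close to 0 when the pair is nearer than r0 and approaches
    `strength` when the pair is far apart, so formed contacts are favored.
    """
    return strength / (1.0 + math.exp(-steepness * (r - r0)))


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Acceptance probability for exchanging two neighboring replicas.

    energy_i is the potential energy (kJ/mol) of the configuration currently
    simulated at temperature temp_i (K), and likewise for j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    # Capping at 1 also avoids overflow for large positive exponents.
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


# The colder replica holds the higher energy, so the swap is always accepted.
p_favorable = swap_probability(-100.0, 300.0, -200.0, 310.0)
# The colder replica already holds the lower energy, so acceptance is partial.
p_unfavorable = swap_probability(-200.0, 300.0, -100.0, 310.0)
```

In a guided run, the bias energy summed over the selected contact pairs would be added to the force-field energy before the swap test, which is one plausible way erroneous contacts could degrade sampling, as the abstract's first study examines.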

    Computational methods to design biophysical experiments for the study of protein dynamics

    In recent years, new software and automated instruments have enabled us to imagine autonomous or "self-driving" laboratories of the future. However, ways to design new scientific studies remain unexplored due to challenges such as minimizing the associated time, labor, and expense of sample preparation and data acquisition. In the field of protein biophysics, computational simulations such as molecular dynamics and spectroscopy-based experiments such as double electron-electron resonance and fluorescence resonance energy transfer have emerged as critical tools for capturing protein dynamic behavior, that is, the change in protein structure as a function of time, which is important for cellular function. These techniques can lead to the characterization of key protein conformations and can capture protein motions over a diverse range of timescales. This work addresses the problem of choosing probe positions in a protein: which residue pairs should experimentalists choose for spectroscopy experiments? For this purpose, molecular dynamics simulations and Markov state models of protein conformational dynamics are used to rank sets of labeled residue pairs in terms of their ability to capture the conformational dynamics of the protein. The applications of our experimental study design methodology, called OptimalProbes, to different types of proteins and experimental techniques are examined. In order to apply this method to a previously uncharacterized protein, atomistic molecular dynamics simulations are performed to study a bacterial di/tri-peptide transporter, a typical representative of the Major Facilitator Superfamily of membrane proteins. This was followed by predictions of ideal double electron-electron resonance experimental choices based on the simulation data.
The predicted choices are superior to the residue-pair choices made by experimentalists, which failed to capture the slowest dynamical processes in the conformational ensemble obtained from our long-timescale simulations. For simulation-based design of experimental studies to succeed, the simulated and experimental ensembles need to be comparable. Since this has not been the case for double electron-electron resonance distance distributions and molecular simulations, we explore possible reasons for mismatches between experiments and simulations in order to reconcile simulated ensembles with experimentally obtained distance traces. This work is one of the first studies toward systematically integrating spectroscopy experiment design into a computational method based on molecular simulations.

    Computational Study on the Protein Conformational Transitions and Their Pathways

    A large number of protein structures have been determined. However, there is still little information about their dynamics and functions. It is widely accepted that protein functionality is usually accompanied by conformational changes, and that there exists an ensemble of many structures of a given protein with different conformational states, yet the underlying mechanisms for the transitions between these states are still unclear. Here we take a novel approach and investigate such transitions by applying forces to the structures, originating from exothermic reactions such as ATP hydrolysis. We use the directed force application approach as well as Metropolis Monte Carlo force application simulations within the framework of elastic network models (ENMs). The directed force application reveals the existence of strongly preferred directions of forces that can drive a protein structure toward its known end state. When forces are applied more randomly with the Metropolis Monte Carlo method, the initial form is usually able to pass over energy barriers toward the target form and can finally achieve RMSDs of around 4 Å, matching the resolution level of the coarse-grained models themselves. Our free energy landscape agrees with the concept that native structures mostly fall in low-energy regions. These landscapes are generated by computing free energies from sampled conformations interpolated between known experimental structures, and in this way they can suggest possible conformational transition pathways. The transitions produced by the application of forces are then projected onto the landscapes, and it is observed that they follow relatively low-energy pathways that are overall energetically favorable. Comparison of these conformational transitions with the ENM normal modes for GroEL demonstrates that the transitions correspond well to following the low-frequency modes.
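As a rough illustration of the two building blocks named in this abstract, the sketch below combines a harmonic elastic network energy with the Metropolis acceptance rule. It uses random single-node displacements instead of the force application scheme the abstract describes, and the spring constant, cutoff, step size, and all function names are assumptions chosen for the example.

```python
import math
import random


def build_springs(reference, cutoff):
    """Connect every pair of nodes closer than `cutoff` in the reference structure."""
    springs = []
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            d0 = math.dist(reference[i], reference[j])
            if d0 <= cutoff:
                springs.append((i, j, d0))
    return springs


def enm_energy(coords, springs, gamma=1.0):
    """Harmonic energy: 0.5 * gamma * sum over springs of (d - d0)^2."""
    return 0.5 * gamma * sum(
        (math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in springs
    )


def metropolis_accept(delta_e, kT, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT)


def mc_run(start, springs, steps=2000, kT=0.05, step_size=0.2, seed=0):
    """Randomly displace one node per step, keeping moves that pass the Metropolis test."""
    rng = random.Random(seed)
    coords = [list(p) for p in start]
    energy = enm_energy(coords, springs)
    for _ in range(steps):
        i = rng.randrange(len(coords))
        old = coords[i]
        coords[i] = [x + rng.uniform(-step_size, step_size) for x in old]
        new_energy = enm_energy(coords, springs)
        if metropolis_accept(new_energy - energy, kT, rng):
            energy = new_energy
        else:
            coords[i] = old  # reject the move and restore the previous position
    return coords, energy


# Toy 4-node "target" structure and a distorted starting structure (hypothetical).
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
start = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
springs = build_springs(target, cutoff=1.5)
final_coords, final_energy = mc_run(start, springs)
```

Because the springs are defined from the target structure, low energy here means internal distances close to those of the target; the finite `kT` is what allows occasional uphill moves, which is the mechanism that lets a Metropolis walk cross barriers.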

    Allostery of the flavivirus NS3 helicase and bacterial IGPS studied with molecular dynamics simulations

    2020 Spring. Includes bibliographical references. Allostery is a biochemical phenomenon in which the binding of a molecule at one site in a biological macromolecule (e.g. a protein) results in a perturbation of activity or function at another, distinct active site in the macromolecule's structure. Allosteric mechanisms are seen throughout biology and play important roles in cell signaling, enzyme activation, and metabolism regulation, as well as in genome transcription and replication processes. Biochemical studies have identified allosteric effects for numerous proteins, yet our understanding of the molecular mechanisms underlying allostery is still lacking. Molecular-level insights obtained from all-atom molecular dynamics simulations can drive our understanding of, and further experimentation on, the allosteric mechanisms at play in a protein. This dissertation reports three such studies of allostery using molecular dynamics simulations in conjunction with other methods. Specifically, the first chapter introduces allostery and how computational simulation of proteins can provide insight into the mechanisms of allosteric enzymes. The second and third chapters are foundational studies of the flavivirus non-structural 3 (NS3) helicase. This enzyme hydrolyzes nucleoside triphosphate molecules to power the translocation of the enzyme along single-stranded RNA as well as the unwinding of double-stranded RNA; both the hydrolysis and helicase functions (translocation and unwinding) have allosteric mechanisms in which the hydrolysis active site's ligand affects the protein-RNA interactions and bound RNA enhances the hydrolysis activity. Specifically, a bound RNA oligomer is seen to affect the behavior and positioning of waters within the hydrolysis active site, which is hypothesized to originate, in part, from the RNA-dependent conformational states of the RNA-binding loop.
Additionally, the substrate states of the NTP hydrolysis reaction cycle are seen to affect protein-RNA interactions, which is hypothesized to drive unidirectional translocation of the enzyme along the RNA polymer. Finally, chapter four introduces a novel method to study the biophysical coupling between two active sites in a protein. The short-ranged residue-residue interactions within the protein's three-dimensional structure are used to identify paths that connect the two active sites. This method is used to highlight the paths and residue-residue interactions that are important to the allosteric enhancement observed for the Thermotoga maritima imidazole glycerol phosphate synthase (IGPS) protein. Results from this new quantitative analysis have provided novel insights into the allosteric paths of IGPS. For both the NS3 and IGPS proteins, results presented in this dissertation have highlighted structural regions that may be targeted for small-molecule inhibition or mutagenesis studies. Toward this end, future studies of both allosteric proteins, as well as broader impacts of the presented research, are discussed in the final chapter.
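The idea of identifying paths between two active sites through short-ranged residue-residue interactions can be sketched as a shortest-path search on a weighted residue graph. The sketch uses Dijkstra's algorithm as a stand-in technique; the abstract does not say which path-finding method the dissertation uses, and the toy graph, its edge costs, and the residue labels are hypothetical.

```python
import heapq
import math


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a graph given as {node: {neighbor: cost}}.

    Costs must be non-negative. Returns (path, total_cost), or (None, inf)
    when the target cannot be reached from the source.
    """
    best = {source: 0.0}
    previous = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return None, math.inf
    # Walk back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Toy residue interaction graph; a lower cost stands for stronger coupling.
contacts = {
    "siteA": {"res12": 1.0, "res40": 4.0},
    "res12": {"siteA": 1.0, "res40": 1.0, "siteB": 5.0},
    "res40": {"siteA": 4.0, "res12": 1.0, "siteB": 1.0},
    "siteB": {"res12": 5.0, "res40": 1.0},
}
path, cost = shortest_path(contacts, "siteA", "siteB")
```

In practice the edge costs would be derived from simulation data, for example from contact persistence or interaction energies between residue pairs, and several near-optimal paths are usually of interest rather than only the single cheapest one.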

    Neural upscaling from residue-level protein structure networks to atomistic structures

    Coarse-graining is a powerful tool for extending the reach of dynamic models of proteins and other biological macromolecules. Topological coarse-graining, in which biomolecules or sets thereof are represented via graph structures, is a particularly useful way of obtaining highly compressed representations of molecular structures, and simulations operating via such representations can achieve substantial computational savings. A drawback of coarse-graining, however, is the loss of atomistic detail, an effect that is especially acute for topological representations such as protein structure networks (PSNs). Here, we introduce an approach based on a combination of machine learning and physically-guided refinement for inferring atomic coordinates from PSNs. This “neural upscaling” procedure exploits the constraints implied by PSNs on possible configurations, as well as differences in the likelihood of observing different configurations with the same PSN. Using a 1 µs atomistic molecular dynamics trajectory of Aβ1–40, we show that neural upscaling is able to effectively recapitulate detailed structural information for intrinsically disordered proteins, being particularly successful in recovering features such as transient secondary structure. These results suggest that scalable network-based models for protein structure and dynamics may be used in settings where atomistic detail is desired, with upscaling employed to impute atomic coordinates from PSNs.