145 research outputs found

    A virtual pebble game to ensemble average graph rigidity

    Previous work has demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the integer Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations, leading to fluctuations within the network. We have developed a mean field Virtual Pebble Game (VPG) that provides a probabilistic description of the interaction network, meaning that sampling is not required. We extensively test the VPG algorithm over a variety of body-bar networks created on disordered lattices. From these calculations we fully characterize the network conditions under which the VPG performs best. The VPG provides a satisfactory description of the ensemble averaged PG properties, especially in regions removed from the rigidity transition, where ensemble fluctuations are greatest. In further experiments, we characterized the VPG across a structurally nonredundant dataset of 272 proteins. Using quantitative and visual assessments of the rigidity characterizations, the VPG results are shown to accurately reflect the ensemble averaged PG properties. That is, the fluctuating interaction network is well represented by a single calculation that replaces density functions with average values, thus speeding up the desired calculation by several orders of magnitude. Finally, we propose a new algorithm that combines PG and VPG to balance the amount of sampling and mean field treatment. While offering interesting results, this approach needs to be further optimized to fully leverage its utility. All these results position the VPG as an efficient alternative for understanding the mechanical role that chemical interactions play in maintaining protein stability.
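
    The contrast between ensemble sampling and a mean field treatment can be illustrated with a much simpler stand-in for the pebble game. The sketch below is not the PG or VPG algorithm: it uses Maxwell constraint counting for a 3D body-bar network (6 degrees of freedom per body, minus 6 rigid-body motions, minus one per bar), which ignores redundant constraints. The function names and the per-bar probabilities are hypothetical and chosen only for illustration.

```python
import random


def maxwell_dof(n_bodies, n_bars):
    """Maxwell estimate of internal degrees of freedom (an approximation, not the pebble game)."""
    return max(0.0, 6 * n_bodies - 6 - n_bars)


def sampled_average_dof(n_bodies, probs, n_samples, seed=0):
    """Ensemble average: draw each fluctuating bar as present/absent, count, then average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        n_bars = sum(1 for p in probs if rng.random() < p)
        total += maxwell_dof(n_bodies, n_bars)
    return total / n_samples


def mean_field_dof(n_bodies, probs):
    """Mean field: replace each fluctuating bar by its average occupancy, one calculation."""
    return maxwell_dof(n_bodies, sum(probs))


if __name__ == "__main__":
    n_bodies = 10  # 6 * 10 - 6 = 54 internal degrees of freedom without bars
    for p in (0.2, 0.54):  # far from, and at, the Maxwell transition for 100 candidate bars
        probs = [p] * 100
        print(p, sampled_average_dof(n_bodies, probs, 2000), mean_field_dof(n_bodies, probs))
```

    In this toy model the single mean field calculation matches the sampled average when the network is far from the counting transition, and the two differ near the transition, where fluctuations in the number of bars matter.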

    Sublinear Computation Paradigm

    This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020, in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I, which consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire the researchers who work in the field of modern algorithms
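
    As a small illustration of what a sublinear (here, constant-time in the data size) procedure can look like, the sketch below estimates the fraction of items satisfying a predicate by uniform random sampling. It is a textbook example, not taken from the book: the sample size follows from Hoeffding's inequality, and the function names and parameters (`eps`, `delta`) are assumptions made for this illustration.

```python
import math
import random


def sample_size(eps, delta):
    """Samples needed so the estimate is within eps of the truth with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate_fraction(data, predicate, eps, delta, seed=0):
    """Estimate the fraction of items in `data` satisfying `predicate`.

    Only sample_size(eps, delta) items are inspected, independent of len(data).
    `data` must support len() and indexing.
    """
    rng = random.Random(seed)
    m = sample_size(eps, delta)
    hits = sum(1 for _ in range(m) if predicate(data[rng.randrange(len(data))]))
    return hits / m


if __name__ == "__main__":
    data = range(1_000_000)  # stands in for a large data set
    print(estimate_fraction(data, lambda x: x % 4 == 0, eps=0.05, delta=0.01))
```

    The number of inspected items depends only on the requested accuracy and confidence, so the cost stays the same whether the data set holds a million items or a trillion.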

    Computational Modeling and Design of Protein–Protein Interactions

    Protein–protein interactions dictate biological functions, including ones essential to living organisms such as immune response or transcriptional regulation. To fundamentally understand these biological processes, we must understand the underlying interactions at the atomic scale. However, interactions are so abundant that traditional structure determination methods cannot support a comprehensive study. Alternatively, computational methods can provide structural models with high throughput, overcoming the challenge posed by the sheer breadth of interactions, albeit at the cost of accuracy. Thus, modeling techniques must be improved if these approaches are to be used to rigorously study protein–protein interactions. In this dissertation, I describe my advances to protein–protein interaction modeling (docking) methods in Rosetta. My advances are based on challenges encountered in a blind docking competition, including modeling camelid antibodies, modeling flexible protein regions, and modeling solvated interfaces. First, I detail improvements to RosettaAntibody and Rosetta SnugDock, including making the underlying code more robust and easier to use, enabling new loop modeling methods, developing an automatically updating database, and implementing scientific benchmarks. These improvements permitted me to conduct the largest-to-date study of antibody CDR-H3 loop flexibility, which showed that traditional, small-scale studies missed emergent properties. Then, I pivot from antibodies to the modeling of disordered protein regions. I contributed advances to the FloppyTail protocol, including enabling the modeling of multiple disordered regions within a single protein and pioneering an ensemble-based analysis of the resulting models. I modeled Hfq proteins across six species of bacteria and demonstrated experimentally validated prediction of interactions between disordered and ordered protein regions. My simulations provided a hypothetical mechanism for Hfq function. Finally, I designed crystallographic protein–protein interactions, with the goal of improving protein crystal resolution. To approach this exceptional challenge, I first demonstrated that, under homogeneous conditions, Rosetta scores can correlate with crystal resolution. Next, I computationally designed and experimentally characterized sixteen variants of a model protein. Only five crystallized, with one providing an improvement in resolution, showing that improvement through computational design is challenging but possible. In sum, my work advanced our understanding of, and our ability to model and design, several challenging protein–protein interactions.

    Development of a normal mode-based geometric simulation approach for investigating the intrinsic mobility of proteins

    Specific functions of biological systems often require conformational transitions of macromolecules. Thus, being able to describe and predict conformational changes of biological macromolecules is not only important for understanding their impact on biological function, but will also have implications for the modelling of (macro)molecular complex formation and for structure-based drug design approaches. The “conformational selection model” provides the foundation for computational investigations of conformational fluctuations of the unbound protein state. These fluctuations may reveal conformational states adopted by the bound proteins. The aim of this work is to incorporate directional information in a geometry-based approach, in order to sample biologically relevant conformational space extensively. Coarse-grained normal mode (CGNM) approaches, e.g., the elastic network model (ENM) and rigid cluster normal mode analysis (RCNMA), have emerged recently and provide directions of intrinsic motions in terms of harmonic modes (also called normal modes). In my previous work and in other studies it has been shown that conformational changes upon ligand binding occur along a few low-energy modes of unbound proteins and can be efficiently calculated by CGNM approaches. In order to explore the validity and the applicability of CGNM approaches, a large-scale comparison of essential dynamics (ED) modes from molecular dynamics (MD) simulations and normal modes from CGNM was performed over a dataset of 335 proteins. Despite high coarse-graining, low-frequency normal modes from CGNM correlate very well with ED modes in terms of directions of motions (average maximal overlap is 0.65) and relative amplitudes of motions (average maximal overlap is 0.73). In order to exploit the potential of CGNM approaches, I have developed a three-step approach for efficient exploration of intrinsic motions of proteins. 
The first two steps are based on recent developments in rigidity and elastic network theory. Initially, static properties of the protein are determined by decomposing the protein into rigid clusters using the graph-theoretical approach FIRST at an all-atom representation of the protein. In a second step, dynamic properties of the molecule are revealed by the rotations-translations of blocks approach (RTB) using an elastic network model representation of the coarse-grained protein. In the final step, the recently introduced idea of constrained geometric simulations of diffusive motions in proteins is extended for efficient sampling of conformational space. Here, the low-energy (low-frequency) normal modes provided by the RCNMA approach are used to guide the backbone motions. The NMSim approach was validated on hen egg white lysozyme by comparing it to other simulation methods (MD, FRODA, and CONCOORD) in terms of residue fluctuations, conformational space exploration, essential dynamics, sampling of side-chain rotamers, and structural quality. Residue fluctuations in the NMSim-generated ensemble are found to be in good agreement with MD fluctuations, with a correlation coefficient of around 0.79. A comparison of different geometry-based simulation approaches shows that FRODA is restricted in sampling the backbone conformational space, whereas CONCOORD is restricted in sampling the side-chain conformational space. NMSim sufficiently samples both the backbone and the side-chain conformations, taking experimental structures and conformations from state-of-the-art MD simulations as reference. The NMSim approach is also applied to a dataset of proteins where conformational changes have been observed experimentally, either in domain or functionally important loop regions. 
The NMSim simulations starting from the unbound structures are able to reach conformations similar to ligand-bound conformations (RMSD […] 0.7) between the RMS fluctuations derived from NMSim-generated structures and two experimental structures are observed. Furthermore, intrinsic fluctuations in the NMSim simulation correlate with the region of loop conformational changes observed upon ligand binding in 2 out of 3 cases. The NMSim-generated pathway of conformational change from the unbound structure to the ligand-bound structure of adenylate kinase is validated by a comparison to experimental structures reflecting different states of the pathway as proposed by previous studies. Interestingly, the generated pathway confirms that the LID domain closure precedes the closing of the NMPbind domain, even if no target conformation is provided in NMSim. Hence, the results of this study show that incorporating directional information in the geometry-based approach NMSim improves the sampling of biologically relevant conformational space and provides a computationally efficient alternative to state-of-the-art MD simulations.
    [German abstract, translated] Conformational changes of proteins are often a fundamental prerequisite for their biological function. Accurately characterizing and predicting these conformational changes is necessary for understanding their influence on function. One of the most frequently used and most accurate computational methods for this purpose is molecular dynamics (MD) simulation. However, MD simulations remain very computationally expensive and sample conformational space only to a limited extent. Efforts have therefore been made to develop alternative geometry-based methods (such as CONCOORD or FRODA) that rely on a reduced representation of proteins. The aim of this work is to integrate directional information into a geometry-based approach and thereby sample the biologically relevant conformational space exhaustively. This idea recently led to the development of coarse-grained normal mode (CGNM) methods, such as the elastic network model (ENM) and rigid cluster normal mode analysis (RCNMA), which I developed in previous work. Both methods provide the desired directional information on the intrinsic motions of a protein in the form of harmonic modes (also called normal modes). To examine the validity, robustness, and broad applicability of such CGNM methods, an extensive comparison between essential dynamics (ED) modes from MD simulations and normal modes from CGNM calculations was carried out in this dissertation. The underlying dataset contained 335 proteins. Although the CGNM methods use a highly simplified representation of proteins, their low-frequency modes correlate very well with ED modes in terms of direction of motion (average maximal overlap: 0.65) and amplitude of motion (average maximal overlap: 0.73). On average, the first quarter of the normal modes describes 85% of the space spanned by the first five ED modes. To determine the capabilities of CGNM methods more precisely, a three-step method for investigating the intrinsic dynamics of proteins was developed in the present study. The first two steps are based on the latest developments in rigidity theory and in the description of elastic networks. These are implemented in the RCNMA approach and allow the normal modes to be determined. In the last step, the motions of the protein backbone are directed along the low-energy normal modes generated by RCNMA. The side-chain conformations are generated by diffusive motions toward energetically favorable rotamers. This is an iterative process consisting of several smaller steps, each of which generates intermediate conformations. To validate the NMSim approach, it was compared with the other previously mentioned simulation methods using lysozyme as an example. The residue fluctuations from the NMSim-generated ensemble agree well with fluctuations computed from the MD simulation (correlation coefficient R = 0.79). A comparison of the different geometry-based simulation approaches shows that FRODA samples the conformational space of the protein backbone insufficiently, whereas CONCOORD samples the conformational space of the side chains insufficiently. NMSim, by contrast, samples both the backbone and the side-chain conformational space adequately when conformations obtained experimentally and from MD simulations are used as reference. The NMSim approach was also applied to a dataset of proteins for which conformational changes in domains or in functionally important loop regions have been observed experimentally. In agreement with the conformational selection model, for four of the five proteins that exhibit domain motion, the NMSim approach is able, starting from the unbound structure, to generate new conformations that correspond to the ligand-bound conformation (RMSD […] 0.7) between the RMS fluctuations of the NMSim-generated conformations and two experimental structures in each case is achieved. The intrinsic fluctuations of the NMSim simulation, in turn, correlate in two of three cases with the region of ligand-induced conformational change in the loops. The NMSim-generated pathway for the conformational changes from the unbound structure to the ligand-bound structure of adenylate kinase was validated by comparison with experimental structures that reflect different states of the pathway. The different crystal structures that lie along the conformational change from the unbound to the ligand-bound structure are sampled on the NMSim-generated pathway. Interestingly, the generated pathway confirms that closure of the LID domain precedes that of the NMPbind domain, even when no target conformation was used for the NMSim simulation.
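
    The abstract above compares normal modes with essential dynamics (ED) modes through their "maximal overlap". A minimal sketch of such a comparison follows, assuming the definition commonly used in the normal mode literature: the absolute value of the normalized dot product between two displacement vectors. Whether the thesis uses exactly this definition is an assumption, and the function names are hypothetical.

```python
import math


def overlap(mode_a, mode_b):
    """Absolute normalized dot product of two displacement vectors (value in [0, 1])."""
    if len(mode_a) != len(mode_b):
        raise ValueError("modes must have the same number of components")
    dot = sum(a * b for a, b in zip(mode_a, mode_b))
    norm_a = math.sqrt(sum(a * a for a in mode_a))
    norm_b = math.sqrt(sum(b * b for b in mode_b))
    return abs(dot) / (norm_a * norm_b)


def maximal_overlap(normal_modes, ed_mode):
    """Best overlap between one ED mode and any mode in a set of normal modes."""
    return max(overlap(mode, ed_mode) for mode in normal_modes)


if __name__ == "__main__":
    # Toy three-component vectors; real modes have 3 components per atom or bead.
    normal_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    ed_mode = [3.0, 4.0, 0.0]
    print(maximal_overlap(normal_modes, ed_mode))
```

    An overlap of 1 means the two modes describe the same direction of motion (up to sign), and 0 means they are orthogonal, so an average maximal overlap of 0.65 indicates substantial but incomplete directional agreement.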

    Program and Abstracts Celebration of Student Scholarship, 2015

    Program and Abstracts from the Celebration of Student Scholarship on April 22, 2015

    2016 Abstracts Student Research Conference
