    SABBAC: online Structural Alphabet-based protein BackBone reconstruction from Alpha-Carbon trace

    SABBAC is an online service for reconstructing protein backbones from alpha-carbon traces. It assembles fragments taken from a reduced-size library; the fragments are selected by encoding the protein trace in a structural alphabet derived from a hidden Markov model. The fragments are assembled by a greedy algorithm using an energy-based score. Alpha-carbon coordinates remain unchanged: SABBAC only positions the missing backbone atoms, and no further refinement is performed. In our tests, SABBAC performs as well as or better than other similar online approaches and is robust to deviations in the alpha-carbon coordinates. It can be accessed at
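
The greedy assembly step can be pictured with a minimal sketch. It assumes that each position along the trace already has a short list of candidate fragment labels (for instance, those suggested by the structural-alphabet encoding) and that two hypothetical energy callbacks, `local_energy` and `junction_energy`, are available; SABBAC's actual fragment library and energy terms are not described above, and alpha-carbon coordinates are not modelled here at all.

```python
from typing import Callable, List, Sequence


def greedy_assemble(
    candidates: Sequence[Sequence[str]],
    local_energy: Callable[[str, int], float],
    junction_energy: Callable[[str, str], float],
) -> List[str]:
    """Choose one fragment per position, left to right.

    At each position the fragment with the lowest energy is kept, given only
    the fragment already placed before it; earlier choices are never revisited,
    which is what makes the procedure greedy.
    """
    chosen: List[str] = []
    for pos, options in enumerate(candidates):
        if not options:
            raise ValueError(f"no candidate fragment at position {pos}")

        def score(frag: str) -> float:
            energy = local_energy(frag, pos)
            if chosen:  # add the junction term with the previously placed fragment
                energy += junction_energy(chosen[-1], frag)
            return energy

        chosen.append(min(options, key=score))
    return chosen


# Hypothetical toy energies: two positions with two candidate fragments each.
local = {("A", 0): 1.0, ("B", 0): 0.5, ("C", 1): 0.0, ("D", 1): 0.2}
print(greedy_assemble(
    [["A", "B"], ["C", "D"]],
    local_energy=lambda f, i: local[(f, i)],
    junction_energy=lambda prev, f: 1.0 if (prev, f) == ("B", "C") else 0.0,
))  # ['B', 'D']
```

A greedy pass costs one energy evaluation per candidate per position, which keeps reconstruction fast, at the price of never reconsidering an earlier choice.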

    New approaches to protein docking

    In the first part of this work, we propose new methods for protein docking. First, we present two approaches to protein docking with flexible side chains: a fast greedy heuristic and a branch-and-cut algorithm that yields optimal solutions. For a test set of protease-inhibitor complexes, both approaches correctly predict the true complex structure. Another, largely unsolved, problem in protein docking is the prediction of the binding free energy, which is the final step of many protein docking algorithms. We therefore propose a new approach that avoids the expensive and difficult calculation of the binding free energy and instead employs a scoring function based on the similarity between the proton nuclear magnetic resonance (NMR) spectra of the tentative complexes and the experimental spectrum. Using this method, we were even able to predict the structure of a very difficult protein-peptide complex that could not be solved with any energy-based scoring function. The second part of this work presents BALL (Biochemical ALgorithms Library), a framework for rapid application development in the field of molecular modeling. BALL provides an extensive set of data structures as well as classes for molecular mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, NMR shift prediction, and visualization. BALL has been carefully designed to be robust, easy to use, and open to extensions. Its extensibility in particular, which results from an object-oriented and generic programming approach, distinguishes it from other software packages.
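
The idea of ranking docking candidates by spectral similarity instead of by binding free energy can be illustrated with a small sketch. It assumes predicted proton chemical shifts are already available for each candidate complex (the shift-prediction step is not shown), and it uses unit-height Lorentzian line shapes and the Pearson correlation as the similarity measure; both are assumptions for illustration, not the scoring function described in this work.

```python
import math
from typing import Dict, List, Sequence


def simulate_spectrum(shifts_ppm: Sequence[float], grid_ppm: Sequence[float],
                      fwhm_ppm: float = 0.02) -> List[float]:
    """Sum of unit-height Lorentzian lines, one per proton chemical shift."""
    hw = fwhm_ppm / 2.0  # half width at half maximum
    return [sum(hw * hw / ((x - s) ** 2 + hw * hw) for s in shifts_ppm)
            for x in grid_ppm]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two spectra sampled on the same grid."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def rank_poses(predicted: Dict[str, Sequence[float]],
               experimental: Sequence[float],
               grid_ppm: Sequence[float]) -> List[str]:
    """Order candidate complexes by spectral similarity, best first."""
    scores = {name: pearson(simulate_spectrum(shifts, grid_ppm), experimental)
              for name, shifts in predicted.items()}
    return sorted(scores, key=scores.get, reverse=True)


grid = [i * 0.01 for i in range(1001)]              # 0 to 10 ppm
experimental = simulate_spectrum([1.0, 7.2], grid)  # stand-in for measured data
print(rank_poses({"pose_a": [1.0, 7.2], "pose_b": [2.5, 8.0]}, experimental, grid))
# ['pose_a', 'pose_b']
```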

    Exploration of the Disambiguation of Amino Acid Types to Chi-1 Rotamer Types in Protein Structure Prediction and Design

    A protein’s global fold provides insight into its function; however, functional specificity often depends on side-chain orientation. Determining rotamer conformations is therefore often crucial for protein structure/function prediction and design. For all amino acid types other than glycine and alanine, chi-1 rotamers occupy a small number of discrete states. Herein, we explore the possibility of describing evolution from the perspective of side-chain structure rather than the traditional twenty amino acid types. To test our hypothesis that this perspective is more informative for understanding evolutionary relationships, we investigate its use in evolutionary substitution matrices for sequence alignment in fold recognition, and in computational protein design, with a specific focus on designing beta-sheet environments, for which previous studies considered amino acid types alone. Throughout this study, we also propose the concept of the “chi-1 rotamer sequence”, which describes the chi-1 rotamer composition of a protein. We also present attempts to predict these sequences, as well as real-valued torsion angles, from amino acid sequence information. First, we describe our development of log-odds scoring matrices for sequence alignment. Log-odds substitution matrices are widely used in sequence alignment for their ability to determine evolutionary relationships between proteins. Traditionally, these matrices are constructed from databases of sequence information, which gives them their power in discovering distant or weak homologs. Weak homologs, typically those that share low sequence identity (< 30%), are often difficult to identify using basic amino acid sequence alignment alone. While protein threading approaches have addressed this issue, many of them include sequence-based information or profiles guided by amino acid-based substitution matrices, namely BLOSUM62.
Here, we generated a structure-based substitution matrix, derived from TM-align structural alignments, that captures both the sequence mutation rate within folds of the same protein family and the chi-1 rotamer of each amino acid. These rotamer substitution matrices (ROTSUMs) discover new homologs and produce improved alignments in the PDB that traditional substitution matrices, based solely on sequence information, cannot identify. Existing tools and algorithms for estimating rotamer torsion angles typically require knowledge of the backbone coordinates and/or experimental data to guide the prediction. Here, we developed a fragment-based algorithm, Rot1Pred, that determines the chi-1 state at each position of a given amino acid sequence, yielding a chi-1 rotamer sequence. This approach matches fragments of the query sequence to sequence-structure fragment pairs in the PDB to predict the query’s side-chain structure. Real-valued torsion angles were also predicted and compared against SCWRL4. The results show that, overall and for most amino acid types, Rot1Pred predicts chi-1 torsion angles significantly closer to the native angles than SCWRL4 when evaluated on model backbones generated by I-TASSER. Finally, we developed and explored chi-1-rotamer-based statistical potentials and evolutionary profiles for de novo computational protein design. Previous analyses that aimed to describe energetically the preference of amino acid types in beta-sheet environments (parallel vs. antiparallel packing, or N- and C-terminal beta-strand capping) were performed with amino acid types only, with no explicit rotamer representation in their scoring functions. In our study, we construct statistical functions that describe chi-1 rotamer preferences in these environments and show their improvement over previous methods.
These specialized knowledge-based energy functions have generated sequences whose I-TASSER-predicted models are structurally similar to their input structures yet have low sequence identity.
PhD, Chemical Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145951/1/jarrettj_1.pd
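
The log-odds construction behind rotamer-aware substitution matrices can be sketched briefly. The sketch assumes aligned residue pairs have already been extracted from structural alignments (the TM-align step is not shown) and relabels each residue with a chi-1 state. The p/t/m bins at 120° intervals are a common convention assumed here, not necessarily the binning used for ROTSUM, and the score is the standard BLOSUM-style log2(q_ab / e_ab); only observed pairs receive a score, so a real matrix would also need pseudocounts.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def chi1_bin(chi1_deg: float) -> str:
    """Assign a chi-1 angle to one of three 120-degree bins.

    After wrapping to [0, 360): p = [0, 120), t = [120, 240), m = [240, 360).
    """
    a = chi1_deg % 360.0
    if a < 120.0:
        return "p"
    if a < 240.0:
        return "t"
    return "m"


def rotamer_token(residue: str, chi1_deg: Optional[float]) -> str:
    """Residue letter plus chi-1 bin; Gly/Ala (no chi-1) keep the bare letter."""
    return residue if chi1_deg is None else residue + chi1_bin(chi1_deg)


def log_odds_matrix(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """BLOSUM-style scores, in bits, for every observed pair of tokens."""
    counts = Counter(tuple(sorted(p)) for p in pairs)  # order-independent pairs
    total = sum(counts.values())
    q = {pair: c / total for pair, c in counts.items()}  # observed pair frequencies
    # Background frequency of each token: p_a = q_aa + sum over b != a of q_ab / 2.
    p: Counter = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    scores: Dict[Tuple[str, str], float] = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = scores[(b, a)] = math.log2(f / expected)
    return scores


aligned = [(rotamer_token("L", -65.0), rotamer_token("L", -70.0)),
           (rotamer_token("L", -65.0), rotamer_token("I", -60.0)),
           (rotamer_token("V", 175.0), rotamer_token("V", 170.0))]
print(log_odds_matrix(aligned))
```

Positive scores mark rotamer-labelled pairs seen more often than their background frequencies predict, which is what lets the matrix reward conserved side-chain states as well as conserved residue identities.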

    Sequence Determinants of the Individual and Collective Behaviour of Intrinsically Disordered Proteins

    Intrinsically disordered proteins and protein regions (IDPs) make up around thirty percent of the eukaryotic proteome. IDPs do not fold into a fixed three-dimensional structure but instead exist as an ensemble of interconverting states. Despite being disordered, IDPs are decidedly not random: well-defined, albeit transient, local and long-range interactions give rise to an ensemble with distinct statistical biases over many length scales. Among a variety of cellular roles, IDPs drive and modulate the formation of phase-separated intracellular condensates, non-stoichiometric assemblies of protein and nucleic acid that serve many functions. In this work, we have explored how the amino acid sequence of IDPs determines their conformational behaviour, and how sequence and single-chain behaviour influence their collective behaviour in the context of phase separation. In part I, in a series of studies, we used simulation, theory, and statistical analysis coupled with a wide range of experimental approaches to uncover new rules describing how primary sequence and local structure influence the global and local behaviour of disordered proteins, with direct implications for protein function and evolution. We found that amino acid side chains counteract the intrinsic collapse of the peptide backbone, priming the backbone for interaction and providing a fully reconciliatory explanation for the mechanism of action of the denaturants urea and GdmCl. We discovered that proline can engender a conformational buffering effect in IDPs that counteracts standard electrostatic effects, and that the patterning of those proline residues can be a crucial determinant of the conformational ensemble. We developed a series of tools for analysing primary sequences on a proteome-wide scale and used them to discover that different organisms can have substantially different average sequence properties.
Finally, we determined that for the normally folded protein NTL9, the unfolded state under folding conditions is relatively expanded but has well-defined native and non-native structural preferences. In part II, we identified a novel mode of phase separation in biology and explored how it could be tuned through sequence design. We discovered that phase-separated liquids can be many orders of magnitude more dilute than simple mean-field theories would predict, and developed an analytical framework to explain and understand this phenomenon. Finally, we designed, developed, and implemented a novel lattice-based simulation engine (PIMMS) to provide sequence-specific insight into the determinants of conformational behaviour and phase separation. PIMMS allows us to rapidly and accurately generate sequence-specific conformational ensembles and to run simulations of hundreds of polymers, with the goal of systematically elucidating the link between primary sequence and phase separation.
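
Proteome-wide sequence analysis of the kind mentioned above can be illustrated with a few composition descriptors. This sketch assumes three simple, commonly used ones: the fraction of charged residues (FCR), the net charge per residue (NCPR, counting K/R as positive and D/E as negative), and the proline fraction, averaged per organism. They are chosen here only because the abstract discusses electrostatic and proline effects; the thesis's own tools may compute different quantities.

```python
from typing import Dict

POSITIVE = frozenset("KR")  # residues counted as positively charged (assumption)
NEGATIVE = frozenset("DE")  # residues counted as negatively charged (assumption)


def sequence_summary(seq: str) -> Dict[str, float]:
    """Per-sequence composition descriptors."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    f_pos = sum(aa in POSITIVE for aa in seq) / n
    f_neg = sum(aa in NEGATIVE for aa in seq) / n
    return {
        "FCR": f_pos + f_neg,   # fraction of charged residues
        "NCPR": f_pos - f_neg,  # net charge per residue
        "proline_fraction": seq.count("P") / n,
    }


def proteome_means(proteome: Dict[str, str]) -> Dict[str, float]:
    """Average each descriptor over all sequences of one organism."""
    if not proteome:
        raise ValueError("empty proteome")
    summaries = [sequence_summary(s) for s in proteome.values()]
    return {key: sum(s[key] for s in summaries) / len(summaries)
            for key in summaries[0]}


print(sequence_summary("KKDDPPGG"))
# {'FCR': 0.5, 'NCPR': 0.0, 'proline_fraction': 0.25}
```

Comparing the output of `proteome_means` between two organisms is then a direct way to see differences in average sequence properties.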

    Computational Modeling and Design of Protein and Polymeric Nano-Assemblies

    Advances in nanotechnology have the potential to use biological and polymeric systems to address fundamental scientific and societal issues, including molecular electronics and sensors, energy-relevant light harvesting, “green” catalysis, and environmental cleanup. In many cases, synthesis and fabrication are well within reach, but designing such systems requires simultaneous consideration of large numbers of degrees of freedom, including structure, sequence, and functional properties. In protein design, even the space of amino acid identities alone grows exponentially with protein length. This work uses computational techniques to develop a fundamental, molecularly detailed chemical and physical understanding with which to investigate and design such nano-assemblies. Throughout, we leverage a probabilistic computational design approach to guide the identification of protein sequences that fold to predetermined structures with targeted functions. The statistical methodology is encapsulated in a computational design platform, recently rebuilt with improvements in speed and versatility, that estimates site-specific residue probabilities by optimizing an effective sequence free energy. This provides an information-rich view of the space of possible sequences, into which new constraints that fit the design objectives can readily be incorporated. The approach is applied to the design and modeling of protein systems incorporating non-biological cofactors, namely (i) an aggregation-prone peptide assembly that binds uranyl and (ii) a protein construct that encapsulates a zinc porphyrin derivative with unique photophysical properties. Additionally, molecular dynamics simulations are used to investigate purely synthetic assemblies of (iii) highly charged semiconducting polymers that wrap and disperse carbon nanotubes.
Free energy calculations are used to explore the factors that lead to the observed polymer-SWNT superstructures, revealing well-defined helical structures; for chiral derivatives, the simulations corroborate a preference for helical handedness observed in TEM and AFM data. The techniques detailed herein demonstrate how advances in computational chemistry afford greater control and specificity in the engineering of novel nanomaterials and offer the potential to greatly advance applications of these systems.
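
How site-specific residue probabilities can follow from an effective sequence free energy is easiest to see in a small mean-field sketch. It assumes hypothetical single-site energy tables and pairwise energy tables between sites, and iterates self-consistent Boltzmann probabilities in which each site feels the probability-weighted average of its pair energies. This is a generic mean-field approximation chosen for illustration, not the design platform described above.

```python
import math
from typing import Dict, List, Tuple

PairTable = Dict[Tuple[str, str], float]  # (residue at site i, residue at site j) -> energy


def boltzmann(energies: Dict[str, float], beta: float) -> Dict[str, float]:
    """Normalised exp(-beta * E) weights, shifted by the minimum for stability."""
    e_min = min(energies.values())
    w = {aa: math.exp(-beta * (e - e_min)) for aa, e in energies.items()}
    z = sum(w.values())
    return {aa: x / z for aa, x in w.items()}


def mean_field_probabilities(single: List[Dict[str, float]],
                             pair: Dict[Tuple[int, int], PairTable],
                             beta: float = 1.0, iterations: int = 200,
                             damping: float = 0.5) -> List[Dict[str, float]]:
    """Self-consistent site probabilities: each site feels the average pair energy
    with every partner site, weighted by that partner's current probabilities."""
    probs = [boltzmann(s, beta) for s in single]
    for _ in range(iterations):
        updated = []
        for i, site_energies in enumerate(single):
            effective = dict(site_energies)
            for (a, b), table in pair.items():
                if i not in (a, b):
                    continue
                j = b if i == a else a
                for aa in effective:
                    for bb, p_bb in probs[j].items():
                        key = (aa, bb) if i == a else (bb, aa)
                        effective[aa] += p_bb * table.get(key, 0.0)
            target = boltzmann(effective, beta)
            # Damped update to avoid oscillations between iterations.
            updated.append({aa: damping * probs[i][aa] + (1 - damping) * target[aa]
                            for aa in effective})
        probs = updated
    return probs


site_energies = [{"A": 0.0, "B": 0.0}, {"A": -2.0, "B": 0.0}]
clash = {(0, 1): {("A", "A"): 5.0}}  # hypothetical penalty for A at both sites
for i, p in enumerate(mean_field_probabilities(site_energies, clash)):
    print(i, {aa: round(v, 3) for aa, v in p.items()})
```

Because the result is a probability table per site rather than one sequence, extra design constraints can be added as further energy terms without changing the procedure.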

    Efficient algorithms in protein modelling

    Proteins are key players in the complex world of living cells. Whether they are involved in enzymatic reactions, intercellular communication, or numerous other processes, knowledge of their structure is vital for a detailed understanding of their function. However, experimental structure determination is often a laborious process that cannot keep up with the ever-increasing pace of sequencing. As a consequence, the gap between proteins for which we know only the sequence and proteins for which we also have detailed structural information is growing rapidly. Computational modelling methods that extrapolate structural information from homologous structures have established themselves as a valuable complement to experiment and help bridge this gap. This thesis addresses two key aspects of protein modelling. (1) It investigates and improves methods that assign reliability estimates to protein models, so-called quality estimation (QE) methods. Even a human expert cannot immediately detect errors introduced in the modelling process, hence the importance of automated methods for this task. (2) It assesses the available methods that perform the modelling itself, discusses solutions for current shortcomings, and provides efficient implementations of them. When detecting errors in protein models, many knowledge-based methods are biased towards the physico-chemical properties observed in soluble protein structures. This limits their applicability to the important class of membrane protein models. To improve the situation, QMEANBrane has been developed. QMEANBrane is specifically designed to detect local errors in membrane protein models using membrane-specific statistical potentials of mean force, which nowadays approach statistical saturation given the increase in available experimental data.
To improve quality estimation for soluble proteins, QMEANDisCo, instead of relying solely on the widely used statistical potentials of mean force, incorporates the observed structural variety of experimentally determined protein structures homologous to the model being assessed. Valuable ensemble information can thus be gathered without depending on a large ensemble of protein models, circumventing a main limitation of consensus QE methods. Apart from improving QE methods, our efforts to implement and extend state-of-the-art modelling algorithms made the lack of a free and efficient modelling engine obvious. No available modelling engine provided an open-source codebase as a basis for novel algorithms while also having no restrictions on usage. This conflicted with our goal of making protein modelling available to all biochemists and molecular biologists worldwide. As a consequence, we implemented a new free and open modelling engine from scratch: ProMod3. ProMod3 makes it possible to combine extremely efficient, state-of-the-art modelling algorithms in a flexible manner to solve various modelling problems. To weaken the dogma of one template, one model, basic algorithms have been explored to incorporate structural information from multiple templates into a single protein model. The algorithms are built using ProMod3 and have been tested extensively on the CAMEO continuous evaluation platform. The result is a highly competitive modelling pipeline with extremely low runtimes and excellent performance.
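
The statistical potentials of mean force mentioned above are usually derived by inverse Boltzmann statistics, E = -kT ln(P_obs / P_ref). The sketch below shows only that generic formula on a hypothetical one-dimensional histogram (for example, binned interaction distances); the choice of reference state, binning, pseudocount, and any membrane-specific terms are assumptions here, not QMEANBrane's actual setup.

```python
import math
from typing import List, Sequence


def potential_of_mean_force(observed: Sequence[int], reference: Sequence[int],
                            kT: float = 1.0, pseudocount: float = 1.0) -> List[float]:
    """Inverse-Boltzmann energies per bin: E = -kT * ln(P_obs / P_ref)."""
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    n_obs = sum(observed) + pseudocount * len(observed)
    n_ref = sum(reference) + pseudocount * len(reference)
    return [-kT * math.log(((o + pseudocount) / n_obs) / ((r + pseudocount) / n_ref))
            for o, r in zip(observed, reference)]


def score_model(bin_indices: Sequence[int], energies: Sequence[float]) -> float:
    """Sum the per-bin energies over all observations made in a model."""
    return sum(energies[k] for k in bin_indices)


# Hypothetical counts: bin 0 is enriched in native structures relative to the reference.
energies = potential_of_mean_force([30, 10], [20, 20], pseudocount=0.0)
print(energies, score_model([0, 0, 1], energies))
```

Bins that are over-represented in experimental structures get negative (favourable) energies; a local error in a model shows up as observations falling into unfavourable bins. Saturation, in this picture, means that adding more experimental structures no longer changes the histograms appreciably.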

    Modeling the flexibility of alpha helices in protein interfaces : structure based design and prediction of helix-mediated protein-protein interactions

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Chemistry, 2008. Vita. Includes bibliographical references.
    Protein-protein interactions play an essential role in many biological functions. Predicting and designing these interactions with computational methods requires models that can efficiently sample structural variation. This thesis identifies methods for sampling an important subspace of protein structure: alpha helices that participate in protein interfaces. Helices, whose global structural properties can be described with only a few variables, are particularly well suited to efficient sampling. Two methods for sampling helical backbones are presented: Crick parameterization for coiled coils and normal-mode analysis for all helices. These are shown to capture most of the variation seen in the PDB. In addition, these methods are applied to problems in protein structure prediction and design. Normal-mode analysis is used to design novel nanomolar peptide inhibitors of the apoptosis-related Bcl-2 family member Bcl-xL, and a modification of Crick parameterization is used to predict the binding orientation of dimeric coiled coils with greater than 80% accuracy. Finally, this study addresses the increased computational time required by flexible-backbone methods, and the use of cluster expansion to quickly map structural energies to sequence-based functions for greater efficiency.
    By James R. Apgar. Ph.D.
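
Why a cluster expansion speeds up flexible-backbone design can be seen from how it is evaluated. The sketch assumes a truncated expansion with a constant term, single-site terms, and pairwise terms whose coefficients have already been fitted (for example, by regression against structure-based energies; that fitting step is not shown). All coefficient values below are hypothetical.

```python
from typing import Dict, Sequence, Tuple

SingleTerms = Dict[Tuple[int, str], float]                       # (position, residue) -> J_i
PairTerms = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]  # (i, j) -> {(res_i, res_j): J_ij}


def cluster_expansion_energy(seq: Sequence[str], j0: float,
                             single: SingleTerms, pair: PairTerms) -> float:
    """E(seq) = J0 + sum_i J_i(s_i) + sum_{i<j} J_ij(s_i, s_j); missing terms count as 0."""
    energy = j0
    for i, residue in enumerate(seq):
        energy += single.get((i, residue), 0.0)
    for (i, j), table in pair.items():
        energy += table.get((seq[i], seq[j]), 0.0)
    return energy


print(cluster_expansion_energy("AL", -1.0, {(0, "A"): 0.5, (1, "L"): -0.25},
                               {(0, 1): {("A", "L"): -2.0}}))  # -2.75
```

Once fitted, scoring a new sequence is a handful of table lookups, so structural sampling over backbone variations is paid only once during fitting rather than for every candidate sequence.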

    Multiscale Molecular Dynamics Simulations of Histidine Kinase Activity

    Two-component systems (TCS), consisting of a sensor histidine kinase (HK) and a response regulator protein, are key building blocks of bacterial signal transduction mechanisms. The ability of bacteria to respond appropriately to a wide variety of chemical and physical stimuli is crucial for their survival. It is therefore not surprising that TCS are among the most studied bacterial protein systems. Sensor histidine kinases are typically homodimeric, multidomain proteins integrated into the cell membrane. Stimulus perception at the sensor domain of an HK triggers a series of large-scale conformational transitions along the domains. While the structural features of different HKs can differ, they all contain a catalytic ATP-binding core (CA) and dimerizing histidine phosphotransfer domains (DHp). During the signaling cascade, the core adopts an asymmetric conformation, so that the kinase is active in one protomer and inactive in the other. This allows the ATP in one of the CA domains to transfer its γ-phosphate group to the histidine of the DHp. This phosphoryl group is then passed on to the response regulator protein, which triggers an appropriate cellular response. In this thesis, I investigate the conformational dynamics of the kinase core using molecular dynamics (MD) simulations. The work focuses on two different HKs: WalK and CpxA. Because of the size of the systems and the biological timescales involved, the relevant conformational transitions cannot be computed in classical MD simulations. To circumvent this problem, I use a structure-based model with pairwise harmonic potentials. This approximation makes it possible to study the transition between the inactive and active states at a substantially lower computational cost.
After exploring the system with this simplified model, I use enhanced sampling methods with atomistic models to gain more detailed insight into the dynamics. The results of this work suggest that the behaviour of the individual subdomains of the kinase core is tightly coupled.
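
A structure-based model with pairwise harmonic potentials can be sketched in its simplest elastic-network form. This sketch assumes a single reference structure of coarse-grained beads, connects every pair closer than a cutoff (8 Å here, an illustrative choice), and uses one uniform spring constant. Studying the inactive-to-active transition would additionally need information from both states, which is not shown, so this is not the model used in the thesis.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int, float]  # (bead i, bead j, reference distance)


def build_contacts(reference: Sequence[Coord], cutoff: float = 8.0) -> List[Contact]:
    """Connect every pair of beads closer than `cutoff` in the reference structure."""
    contacts: List[Contact] = []
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            d = math.dist(reference[i], reference[j])
            if d <= cutoff:
                contacts.append((i, j, d))
    return contacts


def harmonic_energy(coords: Sequence[Coord], contacts: Sequence[Contact],
                    k: float = 1.0) -> float:
    """E = sum over contacts of k/2 * (r_ij - r0_ij)^2; zero at the reference."""
    return sum(0.5 * k * (math.dist(coords[i], coords[j]) - r0) ** 2
               for i, j, r0 in contacts)


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]  # toy C-alpha beads
contacts = build_contacts(reference)
print(harmonic_energy(reference, contacts))  # 0.0
stretched = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(harmonic_energy(stretched, contacts))
```

Because every term is a simple spring, the energy and its forces are cheap to evaluate, which is what allows large domain motions to be explored far more quickly than with an atomistic force field.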