
    MolProbity: all-atom contacts and structure validation for proteins and nucleic acids

    MolProbity is a general-purpose web server offering quality validation for 3D structures of proteins, nucleic acids and complexes. It provides detailed all-atom contact analysis of any steric problems within the molecules as well as updated dihedral-angle diagnostics, and it can calculate and display the H-bond and van der Waals contacts in the interfaces between components. An integral step in the process is the addition and full optimization of all hydrogen atoms, both polar and nonpolar. New analysis functions have been added for RNA, for interfaces, and for NMR ensembles. Additionally, both the web site and major component programs have been rewritten to improve speed, convenience, clarity and integration with other resources. MolProbity results are reported in multiple forms: as overall numeric scores, as lists or charts of local problems, as downloadable PDB and graphics files, and most notably as informative, manipulable 3D kinemage graphics shown online in the KiNG viewer. This service is available free to all users at http://molprobity.biochem.duke.edu

    Capturing Atomic Interactions with a Graphical Framework in Computational Protein Design

    A protein's amino acid sequence determines both its chemical and its physical structures, and together these two structures determine its function. Protein designers seek new amino acid sequences with chemical and physical structures capable of performing some function. The vast size of sequence space frustrates efforts to find useful sequences. Protein designers model proteins on computers and search through amino acid sequence space computationally. They represent the three-dimensional structures for the sequences they examine, specifying the location of each atom, and evaluate the stability of these structures. Good structures are tightly packed but are free of collisions. Designers seek a sequence with a stable structure that meets the geometric and chemical requirements to function as desired; they frame their search as an optimization problem. In this dissertation, I present a graphical model of the central optimization problem in protein design, the side-chain-placement problem. This model allows the formulation of a dynamic programming solution, thus connecting side-chain placement with the class of NP-complete problems for which certain instances admit polynomial-time solutions. Moreover, the graphical model suggests a natural data structure for storing the energies used in design. With this data structure, I have created an extensible framework for the representation of energies during side-chain-placement optimization and have incorporated this framework into the Rosetta molecular modeling program. I present one extension that incorporates a new degree of structural variability into the optimization process. I present another extension that includes a non-pairwise-decomposable energy function, the first of its kind in protein design, laying the groundwork to capture aspects of protein stability that could not previously be incorporated into the optimization of side-chain placement.
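
    The dynamic programming idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm or the Rosetta data structure: it assumes the simplest possible interaction graph, a chain in which residue i interacts only with residue i+1, and the names `place_side_chains`, `one_body` and `two_body` are hypothetical. Under that assumption the optimal rotamer assignment is found in time polynomial in the number of residues and rotamers.

```python
from typing import Dict, List, Tuple


def place_side_chains(
    one_body: List[List[float]],
    two_body: Dict[Tuple[int, int], List[List[float]]],
) -> Tuple[float, List[int]]:
    """Minimise total energy on a chain-shaped interaction graph (assumption).

    one_body[i][r]            energy of rotamer r at residue i
    two_body[(i, i + 1)][r][s] energy between rotamer r at i and rotamer s at i + 1
    Returns (lowest total energy, chosen rotamer index per residue).
    """
    n = len(one_body)
    # best[i][s]: lowest energy of residues 0..i with residue i fixed at rotamer s
    best: List[List[float]] = [list(one_body[0])]
    back: List[List[int]] = []  # back[i - 1][s]: best rotamer at i - 1 given s at i
    for i in range(1, n):
        pair = two_body[(i - 1, i)]
        row: List[float] = []
        ptr: List[int] = []
        for s, e_s in enumerate(one_body[i]):
            options = [best[-1][r] + pair[r][s] for r in range(len(one_body[i - 1]))]
            r_min = min(range(len(options)), key=options.__getitem__)
            row.append(options[r_min] + e_s)
            ptr.append(r_min)
        best.append(row)
        back.append(ptr)
    # Trace the optimal assignment back from the last residue.
    last = min(range(len(best[-1])), key=best[-1].__getitem__)
    energy = best[-1][last]
    assignment = [last]
    for ptr in reversed(back):
        assignment.append(ptr[assignment[-1]])
    assignment.reverse()
    return energy, assignment
```

    For interaction graphs that are not chains, the same recursion generalises to tree decompositions, with a cost that grows exponentially in the treewidth; this sketch does not attempt that.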

    New approaches to protein docking

    In the first part of this work, we propose new methods for protein docking. First, we present two approaches to protein docking with flexible side chains. The first approach is a fast greedy heuristic, while the second is a branch-&-cut algorithm that yields optimal solutions. For a test set of protease-inhibitor complexes, both approaches correctly predict the true complex structure. Another, largely unsolved, problem in protein docking is the prediction of the binding free energy, which is the final step of many protein docking algorithms. Therefore, we propose a new approach that avoids the expensive and difficult calculation of the binding free energy and, instead, employs a scoring function that is based on the similarity of the proton nuclear magnetic resonance spectra of the tentative complexes with the experimental spectrum. Using this method, we could even predict the structure of a very difficult protein-peptide complex that could not be solved using any energy-based scoring functions. The second part of this work presents BALL (Biochemical ALgorithms Library), a framework for Rapid Application Development in the field of Molecular Modeling. BALL provides an extensive set of data structures as well as classes for Molecular Mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, NMR shift prediction, and visualization. BALL has been carefully designed to be robust, easy to use, and open to extensions. Its extensibility in particular, which results from an object-oriented and generic programming approach, distinguishes it from other software packages.

    Euclidean distance geometry and applications

    Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances, and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of the most important applications: molecular conformation, localization of sensor networks and statics. Comment: 64 pages, 21 figures
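
    The smallest instance of "a set of points that realizes the given distances" can be sketched as follows. This is an illustration only, not a method taken from the survey: it assumes a complete set of three pairwise distances and a planar embedding, and the function name `realize_triangle` is hypothetical. Point 0 is fixed at the origin, point 1 on the positive x-axis, and point 2 follows from the law of cosines.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def realize_triangle(d01: float, d02: float, d12: float) -> List[Point]:
    """Return planar coordinates of three points with the given pairwise distances.

    Raises ValueError when no planar realization exists (triangle inequality
    violated) or when d01 is not positive.
    """
    if d01 <= 0:
        raise ValueError("d01 must be positive")
    # Law of cosines gives the x-coordinate of point 2 directly.
    x = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2 * d01)
    y_sq = d02 ** 2 - x ** 2
    if y_sq < -1e-12:
        raise ValueError("distances violate the triangle inequality")
    y = math.sqrt(max(y_sq, 0.0))  # choose the upper half-plane solution
    return [(0.0, 0.0), (d01, 0.0), (x, y)]
```

    The realization is unique only up to rotation, translation and reflection, which is why two of the points can be pinned down by convention. With incomplete distance data, as in the applications the abstract lists, the problem is considerably harder than this sketch suggests.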

    Benchmarking pKa prediction

    Background: pKa values are a measure of the protonation of ionizable groups in proteins. Ionizable groups are involved in intra-protein, protein-solvent and protein-ligand interactions as well as solubility, protein folding and catalytic activity. The pKa shift of a group from its intrinsic value is determined by the perturbation of the residue by the environment and can be calculated from three-dimensional structural data. Results: Here we use a large dataset of experimentally-determined pKas to analyse the performance of different prediction techniques. Our work provides a benchmark of available software implementations: MCCE, MEAD, PROPKA and UHBD. Combinatorial and regression analysis is also used in an attempt to find a consensus approach towards pKa prediction. The tendency of individual programs to over- or underpredict the pKa value is related to the underlying methodology of the individual programs. Conclusion: Overall, PROPKA is more accurate than the other three programs. Key to developing accurate predictive software will be a complete sampling of conformations accessible to protein structures
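
    The link between a pKa value and protonation stated in the Background can be made concrete with the Henderson-Hasselbalch relation. The sketch below is general acid-base chemistry for a single, non-interacting titratable site; it is not the method of any of the benchmarked programs, and the function name `protonated_fraction` is hypothetical.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a single titratable site that is protonated at a given pH.

    Follows from Henderson-Hasselbalch: pH = pKa + log10([A-] / [HA]),
    assuming an isolated site with no coupling to other ionizable groups.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))
```

    At pH equal to the pKa the site is half protonated, and one pH unit above the pKa the protonated fraction drops to 1/11. A shift of the pKa by the protein environment moves this titration curve along the pH axis accordingly.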

    Atomic-accuracy prediction of protein loop structures through an RNA-inspired ansatz

    Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. This article introduces a modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth RNA-puzzle competition. 
These results establish all-atom enumeration as a systematic approach to protein structure that can leverage high performance computing and physically realistic energy functions to more consistently achieve atomic resolution. Comment: Identity of four-loop blind test protein and parts of Figure 5 have been omitted in this preprint to ensure confidentiality of the protein structure prior to its public release

    A Generic Program for Multistate Protein Design

    Some protein design tasks cannot be modeled by the traditional single-state design strategy of finding a sequence that is optimal for a single fixed backbone. Such cases require multistate design, where a single sequence is threaded onto multiple backbones (states) and evaluated for its strengths and weaknesses on each backbone. For example, to design a protein that can switch between two specific conformations, it is necessary to find a sequence that is compatible with both backbone conformations. We present in this paper a generic implementation of multistate design that is suited for a wide range of protein design tasks and demonstrate in silico its capabilities at two design tasks: one of redesigning an obligate homodimer into an obligate heterodimer such that the new monomers would not homodimerize, and one of redesigning a promiscuous interface to bind to only a single partner and to no longer bind the rest of its partners. Both tasks contained negative design in that multistate design was asked to find sequences that would produce high energies for several of the states being modeled. Success at negative design was assessed by computationally redocking the undesired protein-pair interactions; we found that multistate design's accuracy improved as the diversity of conformations for the undesired protein-pair interactions increased. The paper concludes with a discussion of the pitfalls of negative design, which has proven considerably more challenging than positive design.
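
    The combination of positive and negative design described above can be sketched as a single fitness value per sequence. The abstract does not give the paper's actual fitness function, so the form used here (sum of energies on desired states minus sum of energies on undesired states, lower is better) is an assumption, as are the names `multistate_fitness` and `best_sequence`; the exhaustive search is only feasible for toy problems.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# An energy function scores one sequence threaded onto one state (backbone).
Energy = Callable[[str], float]


def multistate_fitness(
    seq: str, positive: Sequence[Energy], negative: Sequence[Energy]
) -> float:
    """Lower is better: low energy on desired states, high energy on undesired ones."""
    return sum(e(seq) for e in positive) - sum(e(seq) for e in negative)


def best_sequence(
    alphabet: str, length: int, positive: Sequence[Energy], negative: Sequence[Energy]
) -> Tuple[str, float]:
    """Exhaustively enumerate sequences and return the one with the lowest fitness."""
    candidates = ("".join(p) for p in product(alphabet, repeat=length))
    scored = ((s, multistate_fitness(s, positive, negative)) for s in candidates)
    return min(scored, key=lambda pair: pair[1])
```

    One known weakness of a plain difference like this is that a sequence can score well merely by making the undesired states very unfavorable, which is one reason negative design is harder than positive design.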

    A novel graph-based method for targeted ligand-protein fitting

    A thesis submitted to the Faculty of Creative Arts, Technologies & Science, University of Bedfordshire, in partial fulfilment of the requirements for the degree of Master of Philosophy. The determination of protein binding sites and ligand-protein fitting are key to understanding the functionality of proteins, from revealing which ligand classes can bind to identifying the optimal ligand for a given protein, as in protein/drug interactions. There is a need for novel generic computational approaches for the representation of protein-ligand interactions and the subsequent prediction of hitherto unknown interactions in proteins whose ligand binding sites are experimentally uncharacterised. The TMSite algorithms read in existing PDB structural data, isolate binding-site regions and identify conserved features in functionally related proteins (proteins that bind the same ligand). The Boundary Cubes method for surface representation was applied to the modified PDB file, allowing the creation of graphs for proteins and ligands that could be compared and that caused no loss of geometric data. A method for describing the binding-site features of individual ligands that are conserved in terms of spatial relationships allowed the identification of 3D motifs, named fingerprints, which could be searched for in other protein structures. This method, combined with a modification of the pocket algorithm, allows reduced search areas for graph matching. The methods allow isolation of the binding site from a complexed protein PDB file, identification of conserved features among the binding sites of individual ligand types, and a search for these features in sequence data. Features conserved in spatial terms create a fingerprint of the binding site that can be sought in other proteins of known structure, identifying putative binding sites.
The approach offers a novel and generic method for the identification of putative ligand binding sites for proteins for which there is no prior detailed structural characterisation of protein/ligand interactions. It is unique in being able to convert PDB data into graphs, ready for comparison and thus fitting of ligand to protein with consideration of chemical charge and, in the future, other chemical properties.

    Protein and Peptide Gas-phase Structure Investigation Using Collision Cross Section Measurements and Hydrogen Deuterium Exchange

    Protein and peptide gas-phase structure analysis provides the opportunity to study these species outside of their explicit environment, where the interaction network with surrounding molecules makes the analysis difficult [1]. Although gas-phase structure analysis offers a unique opportunity to study the intrinsic behavior of these biomolecules [2-4], proteins and peptides exhibit very low vapor pressures [2]. Peptide and protein ions can be transferred into the gas phase using electrospray ionization (ESI) [5]. A growing body of literature shows that proteins and peptides can maintain solution structures during ESI and that these structures can persist for a few hundred milliseconds [6-9].
Techniques for monitoring gas-phase protein and peptide ion structures are categorized as physical probes and chemical probes. Collision cross section (CCS) measurement, a physical probe, is a powerful method to investigate the size of gas-phase structures [3, 7, 10-15]; however, CCS values alone do not establish a one-to-one relation with structure (i.e., the CCS value is an orientationally averaged value) [15-18]. Here we propose gas-phase hydrogen deuterium exchange (HDX) as a second criterion for structure elucidation. The proposed approach includes extensive MD simulations to sample biomolecular ion conformation space, producing numerous random in-silico structures. A CCS is then calculated for each of these structures, and the theoretical CCS values are compared with experimental values to produce a pool of candidate structures. Using a chemical reaction model based on the gas-phase HDX mechanism, the HDX kinetics behavior of these candidate structures is predicted and compared to experimental results to nominate the in-silico structures that best match the experimental observations, both chemically and physically.
For the predictive approach to succeed, extensive technique and method development is essential. To combine CCS measurements and gas-phase HDX studies at the amino acid residue level, a drift tube is connected for the first time to a linear ion trap (LIT) with electron transfer dissociation (ETD) capability [19, 20]. In this manner, CCS and per-residue deuterium uptake measurements for a model peptide were carried out successfully [19]. In this study, the gas-phase conformations of electrosprayed ions of the model peptide KKDDDDIIKIIK have been examined. Using ion structures obtained from molecular dynamics (MD) simulation and considering charge-site/exchange-site density, the level of the maximum total deuterium uptake for the gas-phase ions is explained. In addition, a new hydrogen accessibility scoring (HAS) model that includes two distance calculations (charge site to carbonyl group and carbonyl group to exchange site) is applied to the in-silico structures to describe the expected HDX behavior for these structures. The accuracy of the model is investigated further through a per-residue HDX kinetics study of the model peptide [21]. In this study, the ion residence time and the deuterium uptake of each residue are measured at different partial pressures of D2O. Subsequently, the contribution of each residue to the overall HDX rate of the intact peptide ion is calculated. These rate contributions of the residues exhibit a better fit to HAS than their maximum deuterium uptake.
Proteins and peptides with a high frequency of acidic residues in their sequence provide very poor signal levels when employing positive-polarity ESI. Moreover, the comparison of protonated and deprotonated ions of these biomolecules offers the potential to provide a better structural characterization [22]. Per-residue deuterium uptake values resulting from collision-induced dissociation (CID) of the model peptide KKDDDDIIKIIK were used to investigate the degree of hydrogen deuterium scrambling for deprotonated ions [23]. Remarkably, limited isotopic scrambling was observed in this study of this small model peptide. These data and the per-residue deuterium uptake of the triply-protonated model peptide Acetyl-PAAAAKAAAAKAAAAKAAAAK are exploited to propose a lemma to allocate protonation and deprotonation sites for peptide ions in the gas phase. Insulin ions, as a small protein model system, are examined to investigate the relation of the maximum deuterium uptake value for each insulin chain to the exposed surface area of each insulin subunit [22]. The results show that the methodology can be applied to protein complexes to provide information about the exposed surface area of each subunit.
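
    The first filtering stage described in this abstract, comparing theoretical CCS values of simulated structures with the experimental value to obtain a candidate pool, can be sketched as below. The abstract does not state the acceptance criterion, so the relative tolerance, its default of 2 % and the names `select_candidates`, `theoretical_ccs` and `rel_tol` are all assumptions made for illustration; the HDX and HAS scoring stages are not modelled.

```python
from typing import Dict, List


def select_candidates(
    theoretical_ccs: Dict[str, float],
    experimental_ccs: float,
    rel_tol: float = 0.02,
) -> List[str]:
    """Return names of structures whose computed CCS matches experiment.

    theoretical_ccs maps a structure name to its calculated CCS (same units as
    experimental_ccs). A structure is kept when the absolute difference is at
    most rel_tol times the experimental value (assumed criterion).
    """
    if experimental_ccs <= 0:
        raise ValueError("experimental_ccs must be positive")
    window = rel_tol * experimental_ccs
    return sorted(
        name
        for name, ccs in theoretical_ccs.items()
        if abs(ccs - experimental_ccs) <= window
    )
```

    Because the CCS is an orientationally averaged quantity, many distinct structures can pass such a filter, which is the motivation the abstract gives for adding HDX as a second, chemical criterion.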