40 research outputs found

    Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors

    One of the major goals of structural genomics projects is to determine the three-dimensional structure of representative members of as many different fold families as possible. Comparative modeling is expected to fill the remaining gaps by providing structural models of homologs of the experimentally determined proteins. However, for such an approach to be successful it is essential that the quality of the experimentally determined structures is adequate. In an attempt to build a homology model for the protein dynein light chain 2A (DLC2A) we found two potential templates, both experimentally determined nuclear magnetic resonance (NMR) structures originating from structural genomics efforts. Despite their high sequence identity (96%), the folds of the two structures are markedly different. This prompted us to perform in-depth analyses of both structure ensembles and the deposited experimental data, the results of which clearly identify one of the two models as largely incorrect. Next, we analyzed the quality of a large set of recent NMR-derived structure ensembles originating from both structural genomics projects and individual structure determination groups. Unfortunately, a visual inspection of structures exhibiting lower quality scores than DLC2A reveals that the seriously flawed DLC2A structure is not an isolated incident. Overall, our results illustrate that the quality of NMR structures cannot be reliably evaluated using only traditional experimental input data and overall quality indicators as a reference, and they clearly demonstrate the urgent need for a tight integration of more sophisticated structure validation tools in NMR structure determination projects. In contrast to common methodologies where structures are typically evaluated as a whole, such tools should preferentially operate on a per-residue basis.

    In Silico Veritas: The Pitfalls and Challenges of Predicting

    Recently the first community-wide assessments of the prediction of the structures of complexes between proteins and small molecule ligands have been reported in the so-called GPCR Dock 2008 and 2010 assessments. In the current review we discuss the different steps along the protein-ligand modeling workflow by critically analyzing the modeling strategies we used to predict the structures of protein-ligand complexes we submitted to the recent GPCR Dock 2010 challenge. These representative test cases, focusing on the pharmaceutically relevant G Protein-Coupled Receptors, are used to demonstrate the strengths and challenges of the different modeling methods. Our analysis indicates that the proper performance of the sequence alignment, introduction of structural adjustments guided by experimental data, and the usage of experimental data to identify protein-ligand interactions are critical steps in the protein-ligand modeling protocol. © 2011 by the authors; licensee MDPI, Basel, Switzerland

    Orientation-dependent backbone-only residue pair scoring functions for fixed backbone protein design

    <p>Abstract</p> <p>Background</p> <p>Empirical scoring functions have proven useful in protein structure modeling. Most such scoring functions depend on protein side chain conformations. However, backbone-only scoring functions do not require computationally intensive structure optimization and so are well suited to protein design, which requires fast score evaluation. Furthermore, scoring functions that account for the distinctive relative position and orientation preferences of residue pairs are expected to be more accurate than those that depend only on the separation distance.</p> <p>Results</p> <p>Residue pair scoring functions for fixed backbone protein design were derived using only backbone geometry. Unlike previous studies that used spherical harmonics to fit 2D angular distributions, Gaussian Mixture Models were used to fit the full 3D (position only) and 6D (position and orientation) distributions of residue pairs. The performance of the 1D (residue separation only), 3D, and 6D scoring functions was compared by their ability to identify correct threading solutions for a non-redundant benchmark set of protein backbone structures. The threading accuracy was found to steadily increase with increasing dimension, with the 6D scoring function achieving the highest accuracy. Furthermore, the 3D and 6D scoring functions were shown to outperform side chain-dependent empirical potentials from three other studies. Next, two computational methods that take advantage of the speed and pairwise form of these new backbone-only scoring functions were investigated. The first is a procedure that exploits available sequence data by averaging scores over threading solutions for homologs. This was evaluated by applying it to the challenging problem of identifying interacting transmembrane alpha-helices and found to further improve prediction accuracy. The second is a protein design method for determining the optimal sequence for a backbone structure by applying Belief Propagation optimization using the 6D scoring functions. The sensitivity of this method to backbone structure perturbations was compared with that of fixed-backbone all-atom modeling by determining the similarities between optimal sequences for two different backbone structures within the same protein family. The results showed that the design method using 6D scoring functions was more robust to small variations in backbone structure than the all-atom design method.</p> <p>Conclusions</p> <p>Backbone-only residue pair scoring functions that account for all six relative degrees of freedom are the most accurate, and including the scores of homologs further improves the accuracy in threading applications. The 6D scoring function outperformed several side chain-dependent potentials while avoiding time-consuming and error-prone side chain structure prediction. These scoring functions are particularly useful as an initial filter in protein design problems before applying all-atom modeling.</p>

    Homology modelling and spectroscopy, a never-ending love story

    Homology modelling is normally the technique of choice when experimental structure data are not available but three-dimensional coordinates are needed, for example, to aid with detailed interpretation of results of spectroscopic studies. Herein, the state of the art of homology modelling will be described in the light of a series of recent developments, and an overview will be given of the problems and opportunities encountered in this field. The major topic, the accuracy and precision of homology models, will be discussed extensively due to its influence on the reliability of conclusions drawn from the combination of homology models and spectroscopic data. Three real-world examples will illustrate how both homology modelling and spectroscopy can be beneficial for (bio)medical research.

    Data from: Evolution and diversification of the organellar release factor family

    Translation termination is accomplished by proteins of the Class I release factor family (RF) that recognize stop codons and catalyze the ribosomal release of the newly synthesized peptide. Bacteria have two canonical RFs: RF1 recognizes UAA and UAG, RF2 recognizes UAA and UGA. Although these two release factor proteins are sufficient for de facto translation termination, the eukaryotic organellar RF protein family, which has evolved from bacterial release factors, has expanded considerably, comprising multiple subfamilies, most of which have not been functionally characterized or formally classified. Here we integrate multiple sources of information to analyze the remarkable differentiation of the RF family among organelles. We document the origin, phylogenetic distribution and sequence structure features of the mitochondrial and plastidial release factors: mtRF1a, mtRF1, mtRF2a, mtRF2b, mtRF2c, ICT1, C12orf65, pRF1 and pRF2, and review published relevant experimental data. The canonical release factors (mtRF1a, mtRF2a, pRF1 and pRF2) and ICT1 are derived from bacterial ancestors, while the others have resulted from gene duplications of another release factor. These new RF family members have all lost one or more specific motifs relevant for bona fide release factor function but are mostly targeted to the same organelle as their ancestor. We also characterize the subset of canonical release factor proteins that bear non-classical PxT/SPF tripeptide motifs, and provide a molecular-model-based rationale for their retained ability to recognize stop codons. Finally, we analyze the co-evolution of canonical RFs with the organellar genetic code. Although the RF presence in an organelle and its stop codon usage tend to co-evolve, we find three taxa that encode an RF2 without using UGA stop codons, and one reverse scenario, where Mamiellales green algae use UGA stop codons in their mitochondria without having a mitochondrial type RF2. For the latter we put forward a “stop-codon re-invention” hypothesis that involves the retargeting of the plastid release factor to the mitochondrion.

    RF2 subfamily alignment (mtrf2a, mtrf2b, mtrf2c, prf2)

    FASTA multiple sequence alignment of Release Factor 2 proteins. Sequences were retrieved from the GenBank nr database using PSI-BLAST. The alignment was obtained with ClustalW (v2.0.10), and 60% of the gaps were removed with BMGE (v1.0). The alignment was visually inspected and manually adjusted to accurately align all functionally characterized domains and sequence motifs. Each sequence header contains the species name, the identity of each protein and, for bacterial species, the bacterial group they belong to. The numbers present in the header are not meaningful.

    RF1 subfamily alignment (mtRF1a, mtRF1 and pRF1)

    FASTA multiple sequence alignment of Release Factor 1 proteins. Sequences were retrieved from the GenBank nr database using PSI-BLAST. The alignment was obtained with ClustalW (v2.0.10), and 60% of the gaps were removed with BMGE (v1.0). The alignment was visually inspected and manually adjusted to accurately align all functionally characterized domains and sequence motifs. Each sequence header contains the species name, the identity of each protein and, for bacterial species, the bacterial group they belong to. The numbers present in the header are not meaningful.

    <i>[I<sub>uni</sub>,I<sub>ave</sub>]</i> Plot for 1TGQ Calculated Using the QUEEN Program [36]

    <p>Long-range restraints (blue filled circles) and the 1TGQ<sub>sim</sub> restraints (red filled circles) are indicated. Restraints that are among the 30 most unique and most important (those above the dashed gray line) and that involve residues in either the α2 or β3 region (cf. <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020009#pcbi-0020009-g001" target="_blank">Figure 1</a>A) are indicated by black boxes.</p>

    Five Different per-Residue Structural Quality Indicators

    <div><p>(A) Packing quality <i>Z</i>-score.</p><p>(B) Ramachandran plot appearance <i>Z</i>-score.</p><p>(C) Rotamer normality <i>Z</i>-score.</p><p>(D) Backbone normality score. The values listed on the <i>y</i>-axis indicate the number of times the local backbone (defined by the current residue plus or minus two residues) was found in WHAT IF's internal database (with a cut-off on the number of hits at 80).</p><p>(E) Sum of the NOE violations. Scores for the refined 1Y4O ensemble are shown in green; those for the refined 1TGQ ensemble are shown in orange. Secondary structure of the 1Y4O ensemble is indicated using colored boxes: α-helices are shown in blue, β-strands are shown in red.</p></div>

    Sequence and Structure Ensembles of Two DLC2A Structures

    <div><p>(A) The sequence of human DLC2A (hDLC2A) (AA).</p><p>(B) The sequence of mouse DLC2A (mDLC2A) preceded by an eight-residue His-tag (AA). The secondary structure as predicted using PSIPRED [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020009#pcbi-0020009-b033" target="_blank">33</a>,<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020009#pcbi-0020009-b050" target="_blank">50</a>] (Pred) and the confidence of this prediction (Conf) are shown above the sequences. The secondary structure as observed in the ensembles (Obs) is indicated below the sequences. Except for the His-tag, the mouse and human sequences differ at three positions (indicated in bold).</p><p>(C) Ribbon diagram of the structure ensemble of mDLC2A (PDB entry 1Y4O). The residues of the His-tag have been omitted for clarity.</p><p>(D) Ribbon diagram of the structure ensemble of hDLC2A (PDB entry 1TGQ).</p><p>(E) The refined average structure of the ensemble calculated using the reconstructed 1TGQ dataset, as discussed in the text. Secondary structure is indicated using colors: helices are shown in blue and purple, strands are shown in red and orange. A numbering scheme for the secondary structure elements is indicated between the two sequences.</p></div>