1,011 research outputs found

    Evaluation of methods for predicting the topology of β-barrel outer membrane proteins and a consensus prediction method

    BACKGROUND: Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied so far for this task, utilizing different algorithmic techniques, and a number of freely available predictors exist. The methods can be broadly divided into those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of gram-negative bacteria, with structures known at atomic resolution. Also, we describe, for the first time, an effective way to combine the individual predictors, at will, into a single consensus prediction method. RESULTS: We assess the statistical significance of the performance of each prediction scheme and conclude that Hidden Markov Model based methods, HMM-B2TMR, ProfTMB and PRED-TMBB, are currently the best predictors, according to either the per-residue accuracy, the segments overlap measure (SOV) or the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane β-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, since it increases the accuracy by up to 4% in SOV and by up to 15% in correctly predicted topologies. CONCLUSIONS: The consensus prediction method described in this work optimizes the predicted topology with a dynamic programming algorithm and is implemented in a web-based application freely available to non-commercial users at
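
The consensus idea in the abstract above can be illustrated with a small sketch. It is not the authors' algorithm, whose details are not given in this listing. The three-letter labelling (`i` inner-side loop, `M` transmembrane strand, `o` outer-side loop), the vote-count scoring, and the names `consensus_topology` and `per_residue_accuracy` are all assumptions made for illustration. A Viterbi-style dynamic program picks the label string with the most predictor votes among those that respect the loop-strand-loop alternation.

```python
from collections import Counter

# Hypothetical hidden states: inner loop, strand running in->out,
# outer loop, strand running out->in. Both strand states emit "M".
STATES = ["i", "U", "o", "D"]
EMIT = {"i": "i", "U": "M", "o": "o", "D": "M"}
# Allowed predecessors enforce the i -> M -> o -> M -> i alternation.
PREV = {"i": ["i", "D"], "U": ["U", "i"], "o": ["o", "U"], "D": ["D", "o"]}


def consensus_topology(predictions):
    """Return the vote-maximizing, topologically consistent label string."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    # votes[k][label] = number of predictors assigning `label` to residue k.
    votes = [Counter(p[k] for p in predictions) for k in range(length)]

    score = {s: votes[0][EMIT[s]] for s in STATES}
    back = []  # back[k-1][s] = best predecessor of state s at residue k
    for k in range(1, length):
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(PREV[s], key=lambda q: score[q])
            new_score[s] = score[best] + votes[k][EMIT[s]]
            pointers[s] = best
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(EMIT[s] for s in reversed(path))


def per_residue_accuracy(predicted, observed):
    """Fraction of residues whose predicted label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(predicted, observed))
    return matches / len(observed)
```

Because the alternation is built into the allowed transitions, the result never contains a strand that starts and ends on the same side of the membrane, which a plain per-residue majority vote cannot guarantee.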

    Ranking models of transmembrane β-barrel proteins using Z-coordinate predictions

    Motivation: Transmembrane β-barrels exist in the outer membrane of gram-negative bacteria as well as in chloroplast and mitochondria. They are often involved in transport processes and are promising antimicrobial drug targets. Structures of only a few β-barrel protein families are known. Therefore, a method that could automatically generate such models would be valuable. The symmetrical arrangement of the barrels suggests that an approach based on idealized geometries may be successful

    Structural predictions for the ligand-binding region of glycoprotein hormone receptors and the nature of hormone–receptor interactions

    Background: Glycoprotein hormones influence the development and function of the ovary, testis and thyroid by binding to specific high-affinity receptors. The extracellular domains of these receptors are members of the leucine-rich repeat (LRR) protein superfamily and are responsible for the high-affinity binding. The crystal structure of a glycoprotein hormone, namely human choriogonadotropin (hCG), is known, but neither the receptor structure, mode of hormone binding, nor mechanism for activation, have been established. Results: Despite very low sequence similarity between exon-demarcated LRRs in the receptors and the LRRs of porcine ribonuclease inhibitor (RI), the secondary structures for the two repeat sets are found to be alike. Constraints on curvature and β-barrel geometry from the sequence pattern for repeated βα units suggest that the receptors contain three-dimensional structures similar to that of RI. With the RI crystal structure as a template, models were constructed for exons 2–8 of the receptors. The model for this portion of the choriogonadotropin receptor is complementary in shape and electrostatic characteristics to the surface of hCG at an identified focus of hormone–receptor interaction. Conclusion: The predicted models for the structures and mode of hormone binding of the glycoprotein hormone receptors are to a large extent consistent with currently available biochemical and mutational data. Repeated sequences in β-barrel proteins are shown to have general implications for constraints on structure. Averaging techniques used here to recognize the structural motif in these receptors should also apply to other proteins with repeated sequences

    Membrane protein orientation and refinement using a knowledge-based statistical potential.

    Background: Recent increases in the number of deposited membrane protein crystal structures necessitate the use of automated computational tools to position them within the lipid bilayer. Identifying the correct orientation allows us to study the complex relationship between sequence, structure and the lipid environment, which is otherwise challenging to investigate using experimental techniques due to the difficulty in crystallising membrane proteins embedded within intact membranes. Results: We have developed a knowledge-based membrane potential, calculated by the statistical analysis of transmembrane protein structures, coupled with a combination of genetic and direct search algorithms, and demonstrate its use in positioning proteins in membranes, refinement of membrane protein models and in decoy discrimination. Conclusions: Our method is able to quickly and accurately orientate both alpha-helical and beta-barrel membrane proteins within the lipid bilayer, showing closer agreement with experimentally determined values than existing approaches. We also demonstrate both consistent and significant refinement of membrane protein models and the effective discrimination between native and decoy structures. Source code is available under an open source license from http://bioinf.cs.ucl.ac.uk/downloads/memembed/

    transFold: a web server for predicting the structure and residue contacts of transmembrane beta-barrels

    Transmembrane β-barrel (TMB) proteins are embedded in the outer membrane of Gram-negative bacteria, mitochondria and chloroplasts. The cellular location and functional diversity of β-barrel outer membrane proteins make them an important protein class. At the present time, very few non-homologous TMB structures have been determined by X-ray diffraction because of the experimental difficulty encountered in crystallizing transmembrane (TM) proteins. The transFold web server uses pairwise inter-strand residue statistical potentials derived from globular (non-outer-membrane) proteins to predict the supersecondary structure of TMBs. Unlike all previous approaches, transFold does not use machine learning methods such as hidden Markov models or neural networks; instead, transFold employs multi-tape S-attribute grammars to describe all potential conformations, and then applies dynamic programming to determine the global minimum energy supersecondary structure. The transFold web server not only predicts secondary structure and TMB topology, but is the only method which additionally predicts the side-chain orientation of transmembrane β-strand residues, inter-strand residue contacts and TM β-strand inclination with respect to the membrane. The program transFold currently outperforms all other methods for accuracy of β-barrel structure prediction. Available at

    Structural genomics target selection for the New York consortium on membrane protein structure

    The New York Consortium on Membrane Protein Structure (NYCOMPS), a part of the Protein Structure Initiative (PSI) in the USA, has as its mission to establish a high-throughput pipeline for determination of novel integral membrane protein structures. Here we describe our current target selection protocol, which applies structural genomics approaches informed by the collective experience of our team of investigators. We first extract all annotated proteins from our reagent genomes, i.e. the 96 fully sequenced prokaryotic genomes from which we clone DNA. We filter this initial pool of sequences and obtain a list of valid targets. NYCOMPS defines valid targets as those that, among other features, have at least two predicted transmembrane helices, no predicted long disordered regions and, except for community nominated targets, no significant sequence similarity in the predicted transmembrane region to any known protein structure. Proteins that feed our experimental pipeline are selected by defining a protein seed and searching the set of all valid targets for proteins that are likely to have a transmembrane region structurally similar to that of the seed. We require sequence similarity aligning at least half of the predicted transmembrane region of seed and target. Seeds are selected according to their feasibility and/or biological interest, and they include both centrally selected targets and community nominated targets. As of December 2008, over 6,000 targets have been selected and are currently being processed by the experimental pipeline. We discuss how our target list may impact structural coverage of the membrane protein space
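
The validity criteria named in the abstract above can be written as a simple filter. This is only a sketch of the three stated rules. The abstract says there are "other features" that are not listed here, and the `Candidate` record, its field names and `is_valid_target` are hypothetical names introduced for illustration, not part of any NYCOMPS software.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All fields are hypothetical stand-ins for upstream predictor outputs.
    name: str
    predicted_tm_helices: int
    has_long_disordered_region: bool
    similar_to_known_structure: bool  # similarity within the predicted TM region
    community_nominated: bool = False


def is_valid_target(candidate):
    """Apply the three criteria stated in the abstract (others are omitted)."""
    if candidate.predicted_tm_helices < 2:
        return False
    if candidate.has_long_disordered_region:
        return False
    # Community-nominated targets are exempt from the novelty requirement.
    if candidate.similar_to_known_structure and not candidate.community_nominated:
        return False
    return True


def select_valid_targets(candidates):
    """Keep only the candidates that pass the filter, preserving order."""
    return [c for c in candidates if is_valid_target(c)]
```

Seed-based selection, the next step described in the abstract, would then search this filtered list for proteins whose predicted transmembrane region aligns with that of a chosen seed.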

    Structural approaches to protein sequence analysis

    Various protein sequence analysis techniques are described, aimed at improving the prediction of protein structure by means of pattern matching. To investigate the possibility that improvements in amino acid comparison matrices could result in improvements in the sensitivity and accuracy of protein sequence alignments, a method for rapidly calculating amino acid mutation data matrices from large sequence data sets is presented. The method is then applied to the membrane-spanning segments of integral membrane proteins in order to investigate the nature of amino acid mutability in a lipid environment. Whilst purely sequence analytic techniques work well for cases where some residual sequence similarity remains between a newly characterized protein and a protein of known 3-D structure, in the harder cases, there is little or no sequence similarity with which to recognize proteins with similar folding patterns. In the light of these limitations, a new approach to protein fold recognition is described, which uses a statistically derived pairwise potential to evaluate the compatibility between a test sequence and a library of structural templates, derived from solved crystal structures. The method, which is called optimal sequence threading, proves to be highly successful, and is able to detect the common TIM barrel fold between a number of enzyme sequences, which has not been achieved by any previous sequence analysis technique. Finally, a new method for the prediction of the secondary structure and topology of membrane proteins is described. The method employs a set of statistical tables compiled from well-characterized membrane protein data, and a novel dynamic programming algorithm to recognize membrane topology models by expectation maximization. The statistical tables show definite biases towards certain amino acid species on the inside, middle and outside of a cellular membrane

    Transmembrane protein structure prediction using machine learning

    This thesis describes the development and application of machine learning-based methods for the prediction of alpha-helical transmembrane protein structure from sequence alone. It is divided into six chapters. Chapter 1 provides an introduction to membrane structure and dynamics, membrane protein classes and families, and membrane protein structure prediction. Chapter 2 describes a topological study of the transmembrane protein CLN3 using a consensus of bioinformatic approaches constrained by experimental data. Mutations in CLN3 can cause juvenile neuronal ceroid lipofuscinosis, or Batten disease, an inherited neurodegenerative lysosomal storage disease affecting children; such studies are therefore important for directing further experimental work into this incurable illness. Chapter 3 explores the possibility of using biologically meaningful signatures described as regular expressions to influence the assignment of inside and outside loop locations during transmembrane topology prediction. Using this approach, it was possible to modify a recent topology prediction method, leading to a 6% improvement in prediction accuracy on a standard data set. Chapter 4 describes the development of a novel support vector machine-based topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of sequences with known crystal structures. The method achieves state-of-the-art performance in predicting topology and discriminating between globular and transmembrane proteins. We also present the results of applying these tools to a number of complete genomes. Chapter 5 describes a novel approach to predict lipid exposure, residue contacts, helix-helix interactions and finally the optimal helical packing arrangement of transmembrane proteins. It is based on two support vector machine classifiers that predict per residue lipid exposure and residue contacts, which are used to determine helix-helix interaction with up to 65% accuracy. The method is also able to discriminate native from decoy helical packing arrangements with up to 70% accuracy. Finally, a force-directed algorithm is employed to construct the optimal helical packing arrangement which demonstrates success for proteins containing up to 13 transmembrane helices. The final chapter summarises the major contributions of this thesis to biology, before future perspectives for TM protein structure prediction are discussed

    Conformational changes during pore formation by the perforin-related protein pleurotolysin

    Membrane attack complex/perforin-like (MACPF) proteins comprise the largest superfamily of pore-forming proteins, playing crucial roles in immunity and pathogenesis. Soluble monomers assemble into large transmembrane pores via conformational transitions that remain to be structurally and mechanistically characterised. Here we present an 11 Å resolution cryo-electron microscopy (cryo-EM) structure of the two-part, fungal toxin Pleurotolysin (Ply), together with crystal structures of both components (the lipid binding PlyA protein and the pore-forming MACPF component PlyB). These data reveal a 13-fold pore 80 Å in diameter and 100 Å in height, with each subunit comprised of a PlyB molecule atop a membrane bound dimer of PlyA. The resolution of the EM map, together with biophysical and computational experiments, allowed confident assignment of subdomains in a MACPF pore assembly. The major conformational changes in PlyB are a ~70° opening of the bent and distorted central β-sheet of the MACPF domain, accompanied by extrusion and refolding of two α-helical regions into transmembrane β-hairpins (TMH1 and TMH2). We determined the structures of three different disulphide bond-trapped prepore intermediates. Analysis of these data by molecular modelling and flexible fitting allows us to generate a potential trajectory of β-sheet unbending. The results suggest that MACPF conformational change is triggered through disruption of the interface between a conserved helix-turn-helix motif and the top of TMH2. Following their release we propose that the transmembrane regions assemble into β-hairpins via top down zippering of backbone hydrogen bonds to form the membrane-inserted β-barrel. The intermediate structures of the MACPF domain during refolding into the β-barrel pore establish a structural paradigm for the transition from soluble monomer to pore, which may be conserved across the whole superfamily. 
The TMH2 region is critical for the release of both TMH clusters, suggesting why this region is targeted by endogenous inhibitors of MACPF function

    Classification and Automatic Annotation of Tandem Repeat Proteins in RepeatsDB

    Protein tandem repeats are crucial structural elements in various biological processes, playing essential roles in cell adhesion, protein-protein interactions, and molecular recognition. These repetitive regions have sparked considerable interest in structural biology and bioinformatics, leading to the development of specialized resources like RepeatsDB. RepeatsDB is a comprehensive, curated database of annotated tandem repeat protein structures, offering a valuable resource for researchers. In this study, we systematically analyzed protein tandem repeats in RepeatsDB, with a primary focus on Alpha-Solenoids and Beta-Propellers, to enhance the existing classification system and provide a more profound understanding of protein tandem repeats. Our investigation commenced with an initial statistical analysis to elucidate the diversity and population status of distinct repeat groups within the database, as well as their respective degree of annotation. This approach proved instrumental in addressing the challenges associated with numerous entries that had a missing annotation. We conducted a structural analysis using pairwise structural alignment and explored dimensionality reduction and visualization techniques to uncover novel structural relationships. These findings improved our understanding of protein structural comparisons and informed a refined classification system. We utilized the density-based clustering algorithm, DBSCAN, to establish structural similarity ranges for Clan members and provide computational support for defining Clan boundaries. This method proved effective in detecting outlier entries and refining existing clans, leading to the proposal of new repeat groups. Additionally, we implemented a supervised classification experiment using the K-Nearest Neighbors (KNN) algorithm, which facilitated the automatic annotation of previously unannotated entries. 
This study introduces an automatic annotation methodology that significantly improves the performance of RepeatsDB curators and can be extended to other bioinformatics applications. The findings contribute to a more comprehensive understanding of protein tandem repeats and offer valuable insights for future research in structural biology and bioinformatics
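
The K-Nearest Neighbors step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the study's implementation. It assumes that a structural distance from the unannotated entry to every annotated entry has already been computed (for example from pairwise structural alignment scores), and the function name `knn_annotate`, the default `k=3` and the example labels are assumptions.

```python
from collections import Counter


def knn_annotate(distances, labels, k=3):
    """Assign the majority label among the k nearest annotated entries.

    distances[j] is the (precomputed) distance from the query entry to
    annotated entry j, and labels[j] is that entry's class annotation.
    """
    if len(distances) != len(labels):
        raise ValueError("distances and labels must have the same length")
    if not 1 <= k <= len(labels):
        raise ValueError("k must be between 1 and the number of annotated entries")
    # Indices of the k annotated entries closest to the query.
    nearest = sorted(range(len(labels)), key=lambda j: distances[j])[:k]
    counts = Counter(labels[j] for j in nearest)
    # On a tie, the label seen first among the nearest neighbours wins.
    return counts.most_common(1)[0][0]
```

Working from a precomputed distance list keeps the sketch independent of any particular structural alignment tool. A density-based method such as DBSCAN could consume the same pairwise distances to flag outlier entries.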