3,848 research outputs found

    Rapid membrane protein topology prediction

    Summary: State-of-the-art methods for predicting the topology of α-helical membrane proteins rely on time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine whether a consensus of topology prediction methods based on single sequences can reach an accuracy similar to that of the more accurate multiple sequence-based methods. We show that TOPCONS-single performs better than any of the other topology prediction methods tested here, but ∼6% worse than the best method that uses multiple sequence alignments.
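
The consensus idea in the summary above can be illustrated with a minimal sketch. This is not the TOPCONS-single algorithm, which the summary does not describe in detail; it is a plain per-residue majority vote, assuming each predictor emits a topology string of the same length using the hypothetical labels `i` (inside), `M` (membrane) and `o` (outside).

```python
from collections import Counter


def consensus_topology(predictions):
    """Per-residue majority vote over equal-length topology strings.

    Labels are assumed to be 'i' (inside), 'M' (membrane), 'o' (outside).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        # most_common(1) picks the most frequent label in this column;
        # ties go to the label seen first, i.e. the earlier predictor.
        label, _count = Counter(column).most_common(1)[0]
        consensus.append(label)
    return "".join(consensus)


# Three hypothetical single-sequence predictions for a 10-residue toy protein.
example = ["iiiMMMMooo", "iiMMMMMooo", "iiiMMMoooo"]
print(consensus_topology(example))
```

A plain vote like this can produce implausibly short membrane segments at the residue level, which is one reason a real consensus method would likely add a smoothing or dynamic programming step on top.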

    Folding and insertion thermodynamics of the transmembrane WALP peptide

    The anchor of most integral membrane proteins consists of one or several helices spanning the lipid bilayer. The WALP peptide, GWW(LA)_n(L)WWA, is a common model helix to study the fundamentals of protein insertion and folding, as well as helix-helix association in the membrane. Its structural properties have been illuminated in a large number of experimental and simulation studies. In this combined coarse-grained and atomistic simulation study, we probe the thermodynamics of a single WALP peptide, focusing on both the insertion across the water-membrane interface and folding in both water and a membrane. The potential of mean force characterizing the peptide's insertion into the membrane shows qualitatively similar behavior across peptides and three force fields. However, the Martini force field exhibits a pronounced secondary minimum for an adsorbed interfacial state, which may even become the global minimum, in contrast to both atomistic simulations and the alternative PLUM force field. Even though the two coarse-grained models reproduce the free energy of insertion of individual amino acid side chains, they both underestimate its corresponding value for the full peptide (as compared with atomistic simulations), hinting at cooperative physics beyond the residue level. Folding of WALP in the two environments indicates the helix as the most stable structure, though with different relative stabilities and chain-length dependence. Comment: 12 pages, 5 figures
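
As a rough illustration of what a potential of mean force along an insertion coordinate represents, the sketch below applies Boltzmann inversion, F(z) = -kT ln P(z), to a histogram of positions. It assumes unbiased sampling and hypothetical bin counts; the abstract does not state how the authors computed their profiles, and enhanced-sampling methods are commonly needed in practice.

```python
import math


def pmf_from_histogram(counts, kT=1.0):
    """Boltzmann-invert a histogram into a free energy profile.

    counts: visits per bin along the coordinate (assumed unbiased sampling).
    kT: thermal energy in the desired units (assumption: 1.0 gives units of kT).
    Returns free energies shifted so that the minimum is zero.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram is empty")
    free = []
    for c in counts:
        # Unvisited bins get infinite free energy rather than log(0).
        free.append(-kT * math.log(c / total) if c > 0 else math.inf)
    lowest = min(free)
    return [f - lowest for f in free]


# Hypothetical counts: water, interface, membrane core.
profile = pmf_from_histogram([10, 100, 10])
print(profile)
```

In this toy profile the middle bin is the global minimum and the outer bins sit ln(10) kT above it, which mirrors how a secondary or global minimum would be read off a real profile.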

    Structural approaches to protein sequence analysis

    Various protein sequence analysis techniques are described, aimed at improving the prediction of protein structure by means of pattern matching. To investigate the possibility that improvements in amino acid comparison matrices could result in improvements in the sensitivity and accuracy of protein sequence alignments, a method for rapidly calculating amino acid mutation data matrices from large sequence data sets is presented. The method is then applied to the membrane-spanning segments of integral membrane proteins in order to investigate the nature of amino acid mutability in a lipid environment. Whilst purely sequence analytic techniques work well for cases where some residual sequence similarity remains between a newly characterized protein and a protein of known 3-D structure, in the harder cases, there is little or no sequence similarity with which to recognize proteins with similar folding patterns. In the light of these limitations, a new approach to protein fold recognition is described, which uses a statistically derived pairwise potential to evaluate the compatibility between a test sequence and a library of structural templates, derived from solved crystal structures. The method, which is called optimal sequence threading, proves to be highly successful, and is able to detect the common TIM barrel fold between a number of enzyme sequences, which has not been achieved by any previous sequence analysis technique. Finally, a new method for the prediction of the secondary structure and topology of membrane proteins is described. The method employs a set of statistical tables compiled from well-characterized membrane protein data, and a novel dynamic programming algorithm to recognize membrane topology models by expectation maximization. The statistical tables show definite biases towards certain amino acid species on the inside, middle and outside of a cellular membrane
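
To make the idea of a mutation data matrix concrete, here is a minimal sketch that turns aligned sequence pairs into log-odds scores, s(a, b) = log2(q_ab / (p_a p_b)). This is the generic log-odds formulation and not necessarily the rapid method the abstract refers to; the function name and input format are assumptions for illustration.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Compute log2-odds substitution scores from aligned sequence pairs.

    aligned_pairs: iterable of (seq_a, seq_b) strings of equal length,
    with '-' marking gaps (gapped columns are skipped).
    """
    pair_counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        for a, b in zip(seq_a, seq_b):
            if a == "-" or b == "-":
                continue
            # Count both orders so the matrix is symmetric.
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    total = sum(pair_counts.values())
    if total == 0:
        raise ValueError("no aligned residue pairs found")

    # Background frequency of each residue is the marginal of the pair table.
    residue_freq = Counter()
    for (a, _b), n in pair_counts.items():
        residue_freq[a] += n / total

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total
        expected = residue_freq[a] * residue_freq[b]
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# One hypothetical aligned pair of membrane-spanning segments.
scores = log_odds_matrix([("LLAV", "LIAV")])
print(scores[("A", "A")], scores[("L", "I")])
```

Restricting the input to membrane-spanning segments, as the abstract describes, would change only the alignments fed in, not the calculation itself.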

    Insertion and hairpin formation of membrane proteins: a Monte Carlo study

    Some particular effects of a lipid membrane on the partitioning and the concomitant folding processes of model proteins have been investigated using Monte Carlo methods. It is observed that orientational order and lateral density fluctuations of the lipid matrix stabilize the orientation of helical proteins and induce a tendency of spontaneous formation of helical hairpins for helices longer than the width of the membrane. The lateral compression of the lipids on a hairpin leads to the extrusion of a loop at the trans side of the membrane. The stability of the hairpin can be increased by the design of appropriate groups of hydrophilic and hydrophobic residues at the extruded loop. It is shown that in the absence of lipids the orientation of proteins is not stable and the formation of hairpins is absent. Some analogies between the formation of helical hairpins in membranes and the formation of hairpins in polymer liquid crystals are discussed. The simulations indicate that the insertion process follows a well-defined pattern of kinetic steps

    CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources

    BACKGROUND: The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes) but the experimental determination of the sub-cellular localization of proteomes is laborious and expensive. A fast and low-cost alternative approach is in silico prediction, based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods that address various localization features with diverse specificities and sensitivities. As a result, exploiting these computer resources to predict protein localization accurately involves querying all tools and comparing every prediction output; this is a painstaking task. Therefore, we developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs concerning complete prokaryotic proteomes. DESCRIPTION: The current version of CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes (2,548,292 proteins in total). CoBaltDB supplies a simple user-friendly interface for retrieving and exploring relevant information about predicted features (such as signal peptide cleavage sites and transmembrane segments). Data are organized into three work-sets ("specialized tools", "meta-tools" and "additional tools"). The database can be queried using the organism name, a locus tag or a list of locus tags and may be browsed using numerous graphical and text displays. CONCLUSIONS: With its new functionalities, CoBaltDB is a novel powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations with higher confidence than previously possible. CoBaltDB is available at http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten

    A unified evolutionary origin for the ubiquitous protein transporters SecY and YidC.

    BACKGROUND: Protein transporters translocate hydrophilic segments of polypeptide across hydrophobic cell membranes. Two protein transporters are ubiquitous and date back to the last universal common ancestor: SecY and YidC. SecY consists of two pseudosymmetric halves, which together form a membrane-spanning protein-conducting channel. YidC is an asymmetric molecule with a protein-conducting hydrophilic groove that partially spans the membrane. Although both transporters mediate insertion of membrane proteins with short translocated domains, only SecY transports secretory proteins and membrane proteins with long translocated domains. The evolutionary origins of these ancient and essential transporters are not known. RESULTS: The features conserved by the two halves of SecY indicate that their common ancestor was an antiparallel homodimeric channel. Structural searches with SecY's halves detect exceptional similarity with YidC homologs. The SecY halves and YidC share a fold comprising a three-helix bundle interrupted by a helical hairpin. In YidC, this hairpin is cytoplasmic and facilitates substrate delivery, whereas in SecY, it is transmembrane and forms the substrate-binding lateral gate helices. In both transporters, the three-helix bundle forms a protein-conducting hydrophilic groove delimited by a conserved hydrophobic residue. Based on these similarities, we propose that SecY originated as a YidC homolog which formed a channel by juxtaposing two hydrophilic grooves in an antiparallel homodimer. We find that archaeal YidC and its eukaryotic descendants use this same dimerisation interface to heterodimerise with a conserved partner. YidC's sufficiency for the function of simple cells is suggested by the results of reductive evolution in mitochondria and plastids, which tend to retain SecY only if they require translocation of large hydrophilic domains. 
    CONCLUSIONS: SecY and YidC share previously unrecognised similarities in sequence, structure, mechanism, and function. Our delineation of a detailed correspondence between these two essential and ancient transporters enables a deeper mechanistic understanding of how each functions. Furthermore, key differences between them help explain how SecY performs its distinctive function in the recognition and translocation of secretory proteins. The unified theory presented here explains the evolution of these features, and thus reconstructs a key step in the origin of cells.

    Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space

    We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves

    Transmembrane protein structure prediction using machine learning

    This thesis describes the development and application of machine learning-based methods for the prediction of alpha-helical transmembrane protein structure from sequence alone. It is divided into six chapters. Chapter 1 provides an introduction to membrane structure and dynamics, membrane protein classes and families, and membrane protein structure prediction. Chapter 2 describes a topological study of the transmembrane protein CLN3 using a consensus of bioinformatic approaches constrained by experimental data. Mutations in CLN3 can cause juvenile neuronal ceroid lipofuscinosis, or Batten disease, an inherited neurodegenerative lysosomal storage disease affecting children; such studies are therefore important for directing further experimental work into this incurable illness. Chapter 3 explores the possibility of using biologically meaningful signatures described as regular expressions to influence the assignment of inside and outside loop locations during transmembrane topology prediction. Using this approach, it was possible to modify a recent topology prediction method, leading to a 6% improvement in prediction accuracy on a standard data set. Chapter 4 describes the development of a novel support vector machine-based topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of sequences with known crystal structures. The method achieves state-of-the-art performance in predicting topology and discriminating between globular and transmembrane proteins. We also present the results of applying these tools to a number of complete genomes. Chapter 5 describes a novel approach to predict lipid exposure, residue contacts, helix-helix interactions and finally the optimal helical packing arrangement of transmembrane proteins. It is based on two support vector machine classifiers that predict per residue lipid exposure and residue contacts, which are used to determine helix-helix interaction with up to 65% accuracy. The method is also able to discriminate native from decoy helical packing arrangements with up to 70% accuracy. Finally, a force-directed algorithm is employed to construct the optimal helical packing arrangement which demonstrates success for proteins containing up to 13 transmembrane helices. The final chapter summarises the major contributions of this thesis to biology, before future perspectives for TM protein structure prediction are discussed.
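
The Chapter 3 idea of letting a regular-expression signature influence inside/outside loop assignment can be sketched as follows. This is an illustrative toy and not the thesis's method: the function name, the `i`/`M`/`o` label scheme, and the example motif are all hypothetical. The sketch keeps the predicted helix positions and only chooses the overall orientation that places more motif residues on the expected side.

```python
import re


def orient_topology(sequence, topology, motif, side):
    """Pick the orientation that best agrees with a sided sequence signature.

    sequence: amino acid string.
    topology: same-length string of 'i' (inside), 'M' (membrane), 'o' (outside).
    motif: regular expression for a signature assumed to occur on `side`.
    side: 'i' or 'o', the side on which the signature is expected.
    """
    if len(sequence) != len(topology):
        raise ValueError("sequence and topology must have the same length")

    # Swapping inside and outside labels gives the opposite orientation.
    flipped = topology.translate(str.maketrans("io", "oi"))

    def agreement(topo):
        # Number of motif residues that fall on the expected side.
        return sum(
            1
            for match in re.finditer(motif, sequence)
            for pos in range(match.start(), match.end())
            if topo[pos] == side
        )

    return topology if agreement(topology) >= agreement(flipped) else flipped


# Toy case: a hypothetical positively charged signature expected on the inside.
seq = "KRKAAAAAAAAGGG"
predicted = "oooMMMMMMMMiii"
print(orient_topology(seq, predicted, r"[KR]{3}", "i"))
```

Here the initial prediction puts the charged stretch outside, so the sketch flips the orientation; a real predictor would more plausibly fold such evidence into its scoring rather than apply it as a post-hoc correction.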