Rapid membrane protein topology prediction
Summary: State-of-the-art methods for predicting the topology of α-helical membrane proteins rely on time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine whether a consensus of topology prediction methods that are based on single sequences can reach an accuracy similar to that of the more accurate multiple sequence-based methods. We show that TOPCONS-single performs better than any of the other topology prediction methods tested here, but ∼6% worse than the best method that uses multiple sequence alignments.
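The idea of a consensus over single-sequence predictors can be illustrated with a minimal sketch. The abstract does not say how TOPCONS-single combines its inputs, so the per-residue majority vote below is an assumed stand-in, not the published algorithm; the function name and the 'i'/'M'/'o' labelling (inside, membrane, outside) are likewise illustrative.

```python
from collections import Counter


def consensus_topology(predictions):
    """Per-residue majority vote over equal-length topology strings.

    Each string labels every residue as 'i' (inside), 'M' (membrane)
    or 'o' (outside). This is an illustrative scheme only.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        # most_common(1) picks the most frequent label; on a tie the
        # label encountered first (earliest predictor) wins.
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


example = consensus_topology(["iiMMMoo", "iiiMMoo", "iMMMMoo"])
```

A plain vote can produce topologies that no single method would allow (for example a membrane segment of one residue), which is one reason a real consensus method would need an additional step to enforce a valid topology grammar.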
Folding and insertion thermodynamics of the transmembrane WALP peptide
The anchor of most integral membrane proteins consists of one or several
helices spanning the lipid bilayer. The WALP peptide, GWW(LA)n(L)WWA, is a
common model helix to study the fundamentals of protein insertion and folding,
as well as helix-helix association in the membrane. Its structural properties
have been illuminated in a large number of experimental and simulation studies.
In this combined coarse-grained and atomistic simulation study, we probe the
thermodynamics of a single WALP peptide, focusing on both the insertion across
the water-membrane interface, as well as folding in both water and a membrane.
The potential of mean force characterizing the peptide's insertion into the
membrane shows qualitatively similar behavior across peptides and three force
fields. However, the Martini force field exhibits a pronounced secondary
minimum for an adsorbed interfacial state, which may even become the global
minimum---in contrast to both atomistic simulations and the alternative PLUM
force field. Even though the two coarse-grained models reproduce the free
energy of insertion of individual amino acid side chains, they both
underestimate its corresponding value for the full peptide (as compared with
atomistic simulations), hinting at cooperative physics beyond the residue
level. Folding of WALP in the two environments indicates the helix as the most
stable structure, though with different relative stabilities and chain-length
dependence. Comment: 12 pages, 5 figures
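For readers unfamiliar with the term, a potential of mean force along a coordinate z is related to the probability of observing z by F(z) = -k_B T ln P(z), up to an additive constant. The sketch below applies that relation to a histogram of unbiased samples. The abstract does not state how the study computed its profiles (enhanced-sampling schemes are common for membrane insertion), so the function name, the binning and the choice of kJ/mol units are assumptions made for illustration.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K), i.e. the molar gas constant


def pmf_from_histogram(counts, temperature=300.0):
    """Return F_i = -k_B T ln(n_i / n_max) for each histogram bin.

    The most populated bin defines the zero of free energy; empty bins
    are reported as infinity because they were never sampled.
    """
    n_max = max(counts)
    if n_max <= 0:
        raise ValueError("histogram contains no samples")
    kT = K_B * temperature
    return [
        float("inf") if n == 0 else -kT * math.log(n / n_max)
        for n in counts
    ]


profile = pmf_from_histogram([10, 100, 0], temperature=300.0)
```

With this convention a secondary minimum, such as the adsorbed interfacial state mentioned above, shows up as a second local dip in the returned list.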
Advances in structure and small molecule docking predictions for crystallized G-Protein coupled receptors
This dissertation discusses two main aspects of protein-ligand interaction for G-protein coupled receptors: structure prediction of the flexible loop domains and docking into these receptors. Loop structure prediction has long been studied in the context of native, globular proteins. In this work it is extended to transmembrane proteins, which requires explicit integration of the lipid bilayer into the loop prediction calculation. In the initial work, this new approach to loop prediction yields highly accurate 3-dimensional structures of the intra- and extracellular loops of four G-protein coupled receptors: the A2A adenosine, bovine rhodopsin, β1 adrenergic and β2 adrenergic receptors. For these cases, the loops were predicted in the context of a completely native crystal structure. In subsequent work the approach was extended to perturbed cases, where all loops and tails were removed and side chains near the loop being predicted were in non-native conformations. Lastly, a full homology model of the β2 adrenergic receptor was successfully built using the β1 adrenergic receptor as its template. Work on docking into these receptors focuses on the kappa opioid receptor. Known antagonist binders are discriminated from a set of decoy nonbinders via docking calculations. To achieve this, two new terms were added to the scoring function, WScore, based on a detailed molecular understanding of how the receptor works.
Structural approaches to protein sequence analysis
Various protein sequence analysis techniques are described, aimed at improving the prediction of protein structure by means of pattern matching. To investigate whether improvements in amino acid comparison matrices could improve the sensitivity and accuracy of protein sequence alignments, a method for rapidly calculating amino acid mutation data matrices from large sequence data sets is presented. The method is then applied to the membrane-spanning segments of integral membrane proteins in order to investigate the nature of amino acid mutability in a lipid environment. Whilst purely sequence analytic techniques work well for cases where some residual sequence similarity remains between a newly characterized protein and a protein of known 3-D structure, in the harder cases there is little or no sequence similarity with which to recognize proteins with similar folding patterns. In the light of these limitations, a new approach to protein fold recognition is described, which uses a statistically derived pairwise potential to evaluate the compatibility between a test sequence and a library of structural templates derived from solved crystal structures. The method, which is called optimal sequence threading, proves to be highly successful, and is able to detect the common TIM barrel fold between a number of enzyme sequences, which has not been achieved by any previous sequence analysis technique. Finally, a new method for the prediction of the secondary structure and topology of membrane proteins is described. The method employs a set of statistical tables compiled from well-characterized membrane protein data, and a novel dynamic programming algorithm to recognize membrane topology models by expectation maximization. The statistical tables show definite biases towards certain amino acid species on the inside, middle and outside of a cellular membrane.
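As a rough illustration of how an amino acid comparison matrix can be derived from aligned sequence data, the sketch below computes log-odds scores, log2(observed pair frequency / frequency expected by chance), from a list of aligned residue pairs. This is a generic log-odds scheme chosen for brevity; it is an assumption and not the mutation data matrix procedure described in the thesis, and the function name and input format are hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Build symmetric log2-odds scores from (residue, residue) pairs.

    aligned_pairs: iterable of two-character tuples taken from the
    columns of pairwise alignments (hypothetical input format).
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in aligned_pairs:
        # Count each pair in both orders so the matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        expected = (residue_counts[a] / total_residues) * (
            residue_counts[b] / total_residues
        )
        scores[(a, b)] = math.log2(observed / expected)
    return scores


scores = log_odds_matrix([("L", "L"), ("L", "L"), ("L", "I"), ("A", "A")])
```

Restricting the input pairs to membrane-spanning segments, as the abstract describes, would yield a matrix specific to the lipid environment; pairs never observed are simply absent from the result rather than assigned a score.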
Insertion and hairpin formation of membrane proteins: a Monte Carlo study
Some particular effects of a lipid membrane on the partitioning and the concomitant folding processes of model proteins have been investigated using Monte Carlo methods. It is observed that orientational order and lateral density fluctuations of the lipid matrix stabilize the orientation of helical proteins and induce a tendency towards spontaneous formation of helical hairpins for helices longer than the width of the membrane. The lateral compression of the lipids on a hairpin leads to the extrusion of a loop at the trans side of the membrane. The stability of the hairpin can be increased by the design of appropriate groups of hydrophilic and hydrophobic residues at the extruded loop. It is shown that in the absence of lipids the orientation of proteins is not stable and hairpins do not form. Some analogies between the formation of helical hairpins in membranes and the formation of hairpins in polymer liquid crystals are discussed. The simulations indicate that the insertion process follows a well-defined pattern of kinetic steps.
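The abstract does not specify which Monte Carlo scheme was used. As a point of reference, many lattice and off-lattice protein simulations accept or reject trial moves with the Metropolis criterion, sketched below; treating it as the method of this study is an assumption, and the function name and the injectable `rng` parameter are illustrative.

```python
import math
import random


def metropolis_accept(delta_energy, kT, rng=random):
    """Decide whether to accept a trial move with energy change delta_energy.

    Downhill or neutral moves are always accepted; uphill moves are
    accepted with probability exp(-delta_energy / kT).
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / kT)


# A seeded generator makes a run reproducible.
accepted = metropolis_accept(-0.5, kT=1.0, rng=random.Random(0))
```

In a membrane simulation the energy change would include the lipid-protein terms, which is how the lipid order and density fluctuations described above could bias the accepted protein conformations.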
CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources
BACKGROUND: The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes), but the experimental determination of the sub-cellular localization of proteomes is laborious and expensive. A fast and low-cost alternative approach is in silico prediction, based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods and address various localization features with diverse specificities and sensitivities. As a result, exploiting these computer resources to predict protein localization accurately involves querying all tools and comparing every prediction output; this is a painstaking task. Therefore, we developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs concerning complete prokaryotic proteomes. DESCRIPTION: The current version of CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes (2,548,292 proteins in total). CoBaltDB supplies a simple user-friendly interface for retrieving and exploring relevant information about predicted features (such as signal peptide cleavage sites and transmembrane segments). Data are organized into three work-sets ("specialized tools", "meta-tools" and "additional tools"). The database can be queried using the organism name, a locus tag or a list of locus tags and may be browsed using numerous graphical and text displays. CONCLUSIONS: With its new functionalities, CoBaltDB is a novel powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations with higher confidence than previously possible. CoBaltDB is available at http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten
A unified evolutionary origin for the ubiquitous protein transporters SecY and YidC.
BACKGROUND: Protein transporters translocate hydrophilic segments of polypeptide across hydrophobic cell membranes. Two protein transporters are ubiquitous and date back to the last universal common ancestor: SecY and YidC. SecY consists of two pseudosymmetric halves, which together form a membrane-spanning protein-conducting channel. YidC is an asymmetric molecule with a protein-conducting hydrophilic groove that partially spans the membrane. Although both transporters mediate insertion of membrane proteins with short translocated domains, only SecY transports secretory proteins and membrane proteins with long translocated domains. The evolutionary origins of these ancient and essential transporters are not known. RESULTS: The features conserved by the two halves of SecY indicate that their common ancestor was an antiparallel homodimeric channel. Structural searches with SecY's halves detect exceptional similarity with YidC homologs. The SecY halves and YidC share a fold comprising a three-helix bundle interrupted by a helical hairpin. In YidC, this hairpin is cytoplasmic and facilitates substrate delivery, whereas in SecY, it is transmembrane and forms the substrate-binding lateral gate helices. In both transporters, the three-helix bundle forms a protein-conducting hydrophilic groove delimited by a conserved hydrophobic residue. Based on these similarities, we propose that SecY originated as a YidC homolog which formed a channel by juxtaposing two hydrophilic grooves in an antiparallel homodimer. We find that archaeal YidC and its eukaryotic descendants use this same dimerisation interface to heterodimerise with a conserved partner. YidC's sufficiency for the function of simple cells is suggested by the results of reductive evolution in mitochondria and plastids, which tend to retain SecY only if they require translocation of large hydrophilic domains. 
CONCLUSIONS: SecY and YidC share previously unrecognised similarities in sequence, structure, mechanism, and function. Our delineation of a detailed correspondence between these two essential and ancient transporters enables a deeper mechanistic understanding of how each functions. Furthermore, key differences between them help explain how SecY performs its distinctive function in the recognition and translocation of secretory proteins. The unified theory presented here explains the evolution of these features, and thus reconstructs a key step in the origin of cells.
Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families, with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.
Transmembrane protein structure prediction using machine learning
This thesis describes the development and application of machine learning-based
methods for the prediction of alpha-helical transmembrane protein
structure from sequence alone. It is divided into six chapters.
Chapter 1 provides an introduction to membrane structure and dynamics,
membrane protein classes and families, and membrane protein structure prediction.
Chapter 2 describes a topological study of the transmembrane protein
CLN3 using a consensus of bioinformatic approaches constrained by experimental
data. Mutations in CLN3 can cause juvenile neuronal ceroid
lipofuscinosis, or Batten disease, an inherited neurodegenerative lysosomal
storage disease affecting children; such studies are therefore important
for directing further experimental work into this incurable illness.
Chapter 3 explores the possibility of using biologically meaningful signatures
described as regular expressions to influence the assignment of inside
and outside loop locations during transmembrane topology prediction. Using
this approach, it was possible to modify a recent topology prediction method,
leading to a 6% improvement in prediction accuracy on a standard data set.
Chapter 4 describes the development of a novel support vector machine-based
topology predictor that integrates both signal peptide and re-entrant helix prediction,
benchmarked with full cross-validation on a novel data set of sequences with
known crystal structures. The method achieves state-of-the-art performance in predicting
topology and discriminating between globular and transmembrane proteins.
We also present the results of applying these tools to a number of complete genomes.
Chapter 5 describes a novel approach to predict lipid exposure, residue
contacts, helix-helix interactions and finally the optimal helical packing arrangement of transmembrane proteins. It is based on two support vector
machine classifiers that predict per residue lipid exposure and residue contacts,
which are used to determine helix-helix interaction with up to 65%
accuracy. The method is also able to discriminate native from decoy helical
packing arrangements with up to 70% accuracy. Finally, a force-directed
algorithm is employed to construct the optimal helical packing arrangement
which demonstrates success for proteins containing up to 13 transmembrane helices.
The final chapter summarises the major contributions of this thesis to biology,
before future perspectives for TM protein structure prediction are discussed.
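The Chapter 3 idea of using regular-expression signatures to influence inside/outside loop assignment can be sketched as follows. The abstract does not list the signatures used or say how they enter the predictor, so everything here is assumed for illustration: the function name, the output format (a mapping from residue index to a forced side), and the example pattern, which is the well-known N-glycosylation sequon (N, any residue except P, then S or T) treated as a marker for the non-cytoplasmic side.

```python
import re


def signature_constraints(sequence, signatures):
    """Map residue indices to a forced side ('i' or 'o').

    signatures: list of (regex pattern, side) tuples. When two
    signatures cover the same residue, the first one listed wins.
    """
    constraints = {}
    for pattern, side in signatures:
        for match in re.finditer(pattern, sequence):
            for position in range(match.start(), match.end()):
                constraints.setdefault(position, side)
    return constraints


# Hypothetical signature list: one N-glycosylation sequon pattern
# assigned to the outside ('o').
SIGNATURES = [(r"N[^P][ST]", "o")]

constraints = signature_constraints("MKNGSAL", SIGNATURES)
```

A topology predictor could then treat the returned positions as fixed labels, or as strong priors, when choosing among candidate topologies.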