INRIA a CCSD electronic archive server

Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space

Author: Lee David
Maibaum Michael
Marsden Russell L.
Orengo Christine A.
Yeats Corin
Publication venue: Oxford University Press
Publication date: 15/02/2006

We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves

Springer - Publisher Connector

CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources

Author: Avner Stéphane
Barloy-Hubler Frédérique
Goudenège David
Lucchetti-Miganeh Céline
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes) but the experimental determination of the sub-cellular localization of proteomes is laborious and expensive. A fast and low-cost alternative approach is in silico prediction, based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods that address various localization features with diverse specificities and sensitivities. As a result, exploiting these computer resources to predict protein localization accurately involves querying all tools and comparing every prediction output; this is a painstaking task. Therefore, we developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs concerning complete prokaryotic proteomes. DESCRIPTION: The current version of CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes (2,548,292 proteins in total). CoBaltDB supplies a simple user-friendly interface for retrieving and exploring relevant information about predicted features (such as signal peptide cleavage sites and transmembrane segments). Data are organized into three work-sets ("specialized tools", "meta-tools" and "additional tools"). The database can be queried using the organism name, a locus tag or a list of locus tags and may be browsed using numerous graphical and text displays. CONCLUSIONS: With its new functionalities, CoBaltDB is a novel powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations with higher confidence than previously possible. CoBaltDB is available at http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten

HAL-Rennes 1

LOCATE: a mouse protein subcellular localization database

Author: Aturaliya Rajith N.
Carninci Piero
Davis Melissa J.
Fink J. Lynn
Hanson Kelly
Hayashizaki Yoshihide
Kai Chikatoshi
Kawai Jun
Teasdale Melvena S.
Teasdale Rohan D.
Zhang Fasheng
Publication venue: Oxford University Press
Publication date: 28/12/2005

We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for ∼40% of the mouse proteome. It is available at

CiteSeerX

University of Melbourne Institutional Repository

University of Queensland eSpace

Pattern of Amino Acid Substitutions in Transmembrane Domains of β-Barrel Membrane Proteins for Detecting Remote Homologs in Bacteria and Mitochondria

Author: David Jimenez-Morales
Jie Liang
Ying Xu
Publication venue: Public Library of Science
Publication date: 01/11/2011

β-barrel membrane proteins play an important role in controlling the exchange and transport of ions and organic molecules across bacterial and mitochondrial outer membranes. They are also major regulators of apoptosis and are important determinants of bacterial virulence. In contrast to α-helical membrane proteins, their evolutionary pattern of residue substitutions has not been quantified, and there are no scoring matrices appropriate for their detection through sequence alignment. Using a Bayesian Monte Carlo estimator, we have calculated the instantaneous substitution rates of transmembrane domains of bacterial β-barrel membrane proteins. The scoring matrices constructed from the estimated rates, called bbTM for β-barrel Transmembrane Matrices, significantly improve the sensitivity in detecting homologs of β-barrel membrane proteins, while avoiding erroneous selection of both soluble proteins and other membrane proteins of similar composition. The estimated evolutionary patterns are general and can detect β-barrel membrane proteins very remote from those used for substitution rate estimation. Furthermore, despite the separation of 2–3 billion years since the proto-mitochondrion entered the proto-eukaryotic cell, mitochondrial outer membrane proteins in eukaryotes can also be detected accurately using these scoring matrices derived from bacteria. This is consistent with the suggestion that there are no eukaryote-specific signals for translocation. With these matrices, remote homologs of β-barrel membrane proteins with known structures can be reliably detected at genome scale, allowing construction of high quality structural models of their transmembrane domains, at the rate of 131 structures per template protein. The scoring matrices will be useful for identification, classification, and functional inference of membrane proteins from genome and metagenome sequencing projects. The estimated substitution pattern will also help to identify key elements important for the structural and functional integrity of β-barrel membrane proteins, and will aid in the design of mutagenesis studies

Public Library of Science (PLOS)

Molecular models for the core components of the flagellar type-III secretion complex

Author: Beeby M
Matthews-Palmer TR
Taylor WR
Publication venue: Public Library of Science (PLoS)
Publication date: 19/09/2016

We show that by using a combination of computational methods, consistent three-dimensional molecular models can be proposed for the core proteins of the type-III secretion system. We employed a variety of approaches to reconcile disparate, and sometimes inconsistent, data sources into a coherent picture that for most of the proteins indicated a unique solution to the constraints. The range of difficulty spanned from the trivial (FliQ) to the difficult (FlhA and FliP). The uncertainties encountered with FlhA were largely the result of the greater number of helix packing possibilities allowed in a large protein; for FliP, however, there remains an uncertainty in how to reconcile the large displacement predicted between its two main helical hairpins and their ability to sit together happily across the bacterial membrane. As there is still no high resolution structural information on any of these proteins, we hope our predicted models may be of some use in aiding the interpretation of electron microscope images and in rationalising mutation data and experiments

Spiral - Imperial College Digital Repository

Modeling and predicting all-α transmembrane proteins including helix–helix pairing

Author: Steyaert Jean-Marc
Waldispühl Jérôme
Publication venue: Elsevier B.V.

Modeling and predicting the structure of proteins is one of the most important challenges of computational biology. Exact physical models are too complex to provide feasible prediction tools and other ab initio methods only use local and probabilistic information to fold a given sequence. We show in this paper that all-α transmembrane protein secondary and super-secondary structures can be modeled with a multi-tape S-attributed grammar. An efficient structure prediction algorithm using both local and global constraints is designed and evaluated. Comparison with existing methods shows that the prediction rates as well as the definition level are sensibly increased. Furthermore this approach can be generalized to more complex proteins

Elsevier - Publisher Connector

Protein Structure Networks

Author: Greene Lesley H.
Publication venue: ODU Digital Commons
Publication date: 01/01/2012

The application of network science to structural biology and biochemistry has yielded important new insights into the nature and determinants of protein structure, function, dynamics and the folding process. Advances in understanding protein relationships through network science have also reshaped the way we view the connectivity of proteins in the protein universe. The canonical hierarchical classification can now be visualized, for example, as a protein fold continuum. This review will survey several key advances in the expanding area of research being conducted to study protein structures and folding using network approaches

Old Dominion University

Coarse Grained Molecular Dynamics Simulations of Transmembrane Protein-Lipid Systems

Author: Debertrand Michel
Hilbers Peter A.J.
Markvoort Albert J.
Spijker Peter
Vaidehi Nagarajan
van Hoof Bram
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/01/2010

Many biological cellular processes occur at the micro- or millisecond time scale. With traditional all-atom molecular modeling techniques it is difficult to investigate the dynamics of long time scales or large systems, such as protein aggregation or activation. Coarse graining (CG) can be used to reduce the number of degrees of freedom in such a system, and reduce the computational complexity. In this paper the first version of a coarse grained model for transmembrane proteins is presented. This model differs from other coarse grained protein models due to the introduction of a novel angle potential as well as a hydrogen bonding potential. These new potentials are used to stabilize the backbone. The model has been validated by investigating the adaptation of the hydrophobic mismatch induced by the insertion of WALP-peptides into a lipid membrane, showing that the first step in the adaptation is an increase in the membrane thickness, followed by a tilting of the peptide

Repository TU/e

Pure OAI Repository

TCDB: the Transporter Classification Database for membrane transport protein analyses and information

Author: Barabote Ravi D.
Saier Milton H.
Tran Can V.
Publication venue: Oxford University Press
Publication date: 28/12/2005

The Transporter Classification Database (TCDB) is a web accessible, curated, relational database containing sequence, classification, structural, functional and evolutionary information about transport systems from a variety of living organisms. TCDB is a curated repository for factual information compiled from >10 000 references, encompassing ∼3000 representative transporters and putative transporters, classified into >400 families. The transporter classification (TC) system is an International Union of Biochemistry and Molecular Biology approved system of nomenclature for transport protein classification. TCDB is freely accessible at . The web interface provides several different methods for accessing the data, including step-by-step access to hierarchical classification, direct search by sequence or TC number and full-text searching. The functional ontology that underlies the database structure facilitates powerful query searches that yield valuable data in a quick and easy way. The TCDB website also offers several tools specifically designed for analyzing the unique characteristics of transport proteins. TCDB not only provides curated information and a tool for classifying newly identified membrane proteins, but also serves as a genome transporter-annotation tool

CiteSeerX