31 research outputs found

    CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources

    BACKGROUND: The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes) but the experimental determination of the sub-cellular localization of proteomes is laborious and expensive. A fast and low-cost alternative approach is in silico prediction, based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods that address various localization features with diverse specificities and sensitivities. As a result, exploiting these computer resources to predict protein localization accurately involves querying all tools and comparing every prediction output; this is a painstaking task. Therefore, we developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs concerning complete prokaryotic proteomes. DESCRIPTION: The current version of CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes (2,548,292 proteins in total). CoBaltDB supplies a simple user-friendly interface for retrieving and exploring relevant information about predicted features (such as signal peptide cleavage sites and transmembrane segments). Data are organized into three work-sets ("specialized tools", "meta-tools" and "additional tools"). The database can be queried using the organism name, a locus tag or a list of locus tags and may be browsed using numerous graphical and text displays. CONCLUSIONS: With its new functionalities, CoBaltDB is a novel powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations with higher confidence than previously possible. CoBaltDB is available at http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten
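The abstract describes comparing the outputs of many localization predictors for one protein. A minimal sketch of one simple way to do that comparison is a majority vote over the predicted labels. This is an illustration only and is not stated to be CoBaltDB's own method; the tool names, labels and the `consensus_localization` function are hypothetical.

```python
from collections import Counter


def consensus_localization(predictions):
    """Return (label, support) for the most frequent predicted label.

    `predictions` maps a tool name to its predicted localization.
    `support` is the fraction of tools that voted for the top label.
    Ties are reported as "ambiguous" rather than broken arbitrarily.
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    counts = Counter(predictions.values()).most_common()
    top_label, top_votes = counts[0]
    support = top_votes / len(predictions)
    if len(counts) > 1 and counts[1][1] == top_votes:
        return "ambiguous", support
    return top_label, support


# Hypothetical tool names and outputs, for illustration only.
example = {
    "tool_a": "cytoplasm",
    "tool_b": "inner membrane",
    "tool_c": "inner membrane",
    "tool_d": "inner membrane",
}
label, support = consensus_localization(example)
print(label, support)
```

A plain vote treats every tool as equally reliable; weighting tools by their measured sensitivity and specificity would be a natural refinement, but the abstract gives no such weights.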

    Transmembrane protein topology prediction using support vector machines

    Background: Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. Results: We present a support vector machine-based (SVM) TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from http://bioinf.cs.ucl.ac.uk/psipred/. Conclusion: The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, makes this method ideally suited to whole genome annotation of alpha-helical transmembrane proteins

    Transmembrane protein structure prediction using machine learning

    This thesis describes the development and application of machine learning-based methods for the prediction of alpha-helical transmembrane protein structure from sequence alone. It is divided into six chapters. Chapter 1 provides an introduction to membrane structure and dynamics, membrane protein classes and families, and membrane protein structure prediction. Chapter 2 describes a topological study of the transmembrane protein CLN3 using a consensus of bioinformatic approaches constrained by experimental data. Mutations in CLN3 can cause juvenile neuronal ceroid lipofuscinosis, or Batten disease, an inherited neurodegenerative lysosomal storage disease affecting children; such studies are therefore important for directing further experimental work into this incurable illness. Chapter 3 explores the possibility of using biologically meaningful signatures described as regular expressions to influence the assignment of inside and outside loop locations during transmembrane topology prediction. Using this approach, it was possible to modify a recent topology prediction method, leading to an improvement of 6% in prediction accuracy on a standard data set. Chapter 4 describes the development of a novel support vector machine-based topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of sequences with known crystal structures. The method achieves state-of-the-art performance in predicting topology and discriminating between globular and transmembrane proteins. We also present the results of applying these tools to a number of complete genomes. Chapter 5 describes a novel approach to predict lipid exposure, residue contacts, helix-helix interactions and finally the optimal helical packing arrangement of transmembrane proteins. 
It is based on two support vector machine classifiers that predict per residue lipid exposure and residue contacts, which are used to determine helix-helix interaction with up to 65% accuracy. The method is also able to discriminate native from decoy helical packing arrangements with up to 70% accuracy. Finally, a force-directed algorithm is employed to construct the optimal helical packing arrangement which demonstrates success for proteins containing up to 13 transmembrane helices. The final chapter summarises the major contributions of this thesis to biology, before future perspectives for TM protein structure prediction are discussed

    OMPdb: a database of β-barrel outer membrane proteins from Gram-negative bacteria

    We describe here OMPdb, which is currently the most complete and comprehensive collection of integral β-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69 354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data, but submit advanced text searches and run BLAST queries against the database protein sequences or domain searches against the collection of profile Hidden Markov Models that represent each family’s domain organization as well. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels

    Cascading classifier application for topology prediction of TMB proteins

    This paper is concerned with the use of a cascading classifier for transmembrane beta-barrel (TMB) topology prediction. Most novel drug design involves membrane proteins. Transmembrane proteins have key roles such as active transport across the membrane and signal transduction, among other functions. Given these roles, understanding their structures, mechanisms and regulation at the molecular level through computational modeling is essential. In bioinformatics, many years have been spent on transmembrane protein structure prediction, with a focus on alpha-helical membrane proteins. Technological developments have been increasingly used to understand membrane protein function and structure in more detail. Various methodologies have been developed for TMB protein topology prediction; however, the use of a cascading classifier has not been fully explored. This research presents a novel approach for TMB topology prediction. The MATLAB computer simulation results show that the proposed methodology predicts transmembrane topologies with high accuracy for randomly selected proteins

    Machine learning applications for the topology prediction of transmembrane beta-barrel proteins

    The research topic for this PhD thesis focuses on the topology prediction of beta-barrel transmembrane proteins. Transmembrane proteins adopt various conformations that are related to the functions they provide. The two most predominant classes are alpha-helix bundles and beta-barrel transmembrane proteins. Alpha-helix proteins are present in larger numbers than beta-barrel transmembrane proteins in structure databases. Therefore, there is a need to find computational tools that can predict and detect the structure of beta-barrel transmembrane proteins. Transmembrane proteins are used for active transport across the membrane or signal transduction. Given the importance of these roles, it becomes essential to understand the structures of the proteins. Transmembrane proteins are also a significant focus for new drug discovery. Transmembrane beta-barrel proteins play critical roles in the translocation machinery, pore formation, membrane anchoring, and ion exchange. In bioinformatics, many years of research have been spent on the topology prediction of transmembrane alpha-helices. Efforts towards TMB (transmembrane beta-barrel) protein topology prediction have been overshadowed, and the prediction accuracy could be improved with further research. Various methodologies have been developed in the past to predict TMB protein topology. Methods available in the literature include turn identification, hydrophobicity profiles, rule-based prediction, HMM (Hidden Markov Model), ANN (Artificial Neural Networks), radial basis function networks, or combinations of methods. The use of a cascading classifier has never been fully explored. This research presents and evaluates approaches such as ANN (Artificial Neural Networks), KNN (K-Nearest Neighbors), SVM (Support Vector Machines), and a novel approach to TMB topology prediction with the use of a cascading classifier. Computer simulations have been implemented in MATLAB, and the results have been evaluated. 
Data were collected from various datasets and pre-processed for each machine learning technique. A deep neural network was built with an input layer, hidden layers, and an output layer. The cascading classifier was optimised mainly by optimising each machine learning algorithm used and by starting from the parameters that gave the best results for each algorithm. The cascading classifier results show that the proposed methodology predicts transmembrane beta-barrel protein topologies with high accuracy for randomly selected proteins. Using the cascading classifier approach, the best overall accuracy is 76.3%, with a precision of 0.831 and a recall (probability of detection) of 0.799 for TMB topology prediction. The accuracy of 76.3% is achieved using a two-layer cascading classifier. By constructing and using various machine-learning frameworks, systems were developed to analyse the TMB topologies with significant robustness. We have presented several experimental findings that may be useful for future research. Using the cascading classifier, we used a novel approach for the topology prediction of TMB proteins
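The abstract mentions a two-layer cascading classifier but does not describe how the layers are connected. As a hedged illustration of one common reading, the sketch below uses a confidence-gated cascade: a cheap first stage decides when it is confident, and otherwise defers to a second stage. The thesis may instead feed the first classifier's output into the second. The stage rules, the `threshold` value and all function names are hypothetical.

```python
def cascade_predict(x, stages, threshold=0.8):
    """Return (label, stage_index) from the first sufficiently confident stage.

    Each stage is a callable returning (label, confidence in [0, 1]).
    The last stage always decides, whatever its confidence.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for index, stage in enumerate(stages):
        label, confidence = stage(x)
        if confidence >= threshold or index == len(stages) - 1:
            return label, index


def stage_one(score):
    # Toy rule: confident only when far from the decision boundary at 0.
    label = "strand" if score > 0 else "loop"
    return label, min(1.0, abs(score))


def stage_two(score):
    # Toy fallback rule with a shifted boundary.
    label = "strand" if score > -0.2 else "loop"
    return label, 1.0


for score in (1.5, -0.1, -0.9):
    print(score, cascade_predict(score, [stage_one, stage_two]))
```

In practice each stage would be a trained model such as the ANN, KNN or SVM named in the abstract, with the input being a per-residue feature vector rather than a single toy score.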

    Machine-learning methods for structure prediction of β-barrel membrane proteins

    Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins, which are specifically designed to be inserted into biological membranes and perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performance of the proposed approach was comparable or superior to the current state of the art in TMBB topology prediction
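The detection results above are reported as PPV (positive predictive value) and MCC (Matthews correlation coefficient). For reference, a short sketch of how both are computed from a binary confusion matrix follows. The counts in the example are made up for illustration and are not taken from the thesis.

```python
import math


def ppv(tp, fp):
    """Positive predictive value (precision): TP / (TP + FP)."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 8 true positives, 2 false positives,
# 85 true negatives, 5 false negatives.
print(ppv(8, 2))
print(mcc(8, 2, 85, 5))
```

MCC is often preferred for detection tasks like this one because the positive class (TMBBs) is rare, and accuracy alone would look high even for a predictor that never reports a positive.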

    TranCEP: Predicting the substrate class of transmembrane transport proteins using compositional, evolutionary, and positional information

    Transporters mediate the movement of compounds across the membranes that separate the cell from its environment and across the inner membranes surrounding cellular compartments. It is estimated that one third of a proteome consists of membrane proteins, and many of these are transport proteins. Given the increase in the number of genomes being sequenced, there is a need for computational tools that predict the substrates that are transported by the transmembrane transport proteins. In this paper, we present TranCEP, a predictor of the type of substrate transported by a transmembrane transport protein. TranCEP combines the traditional use of the amino acid composition of the protein, with evolutionary information captured in a multiple sequence alignment (MSA), and restriction to important positions of the alignment that play a role in determining the specificity of the protein. Our experimental results show that TranCEP significantly outperforms the state-of-the-art predictors. The results quantify the contribution made by each type of information used
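Of the three information sources TranCEP combines, amino acid composition is the simplest to illustrate. The sketch below computes a 20-dimensional composition vector for one sequence. It covers only that first component, not the MSA-based or positional parts, and the function name and example sequence are hypothetical rather than taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return {residue: fraction} over the 20 standard amino acids.

    Non-standard symbols (e.g. X, gaps) are ignored when counting.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return {aa: residues.count(aa) / total for aa in AMINO_ACIDS}


# Hypothetical short peptide, for illustration only.
composition = amino_acid_composition("MKTLLLTLVV")
print(composition["L"], composition["V"])
```

The resulting vector could be passed to any standard classifier as a feature; how TranCEP weights it against the evolutionary and positional information is described in the paper, not here.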

    Sequence based methods for the prediction and analysis of the structural topology of transmembrane beta barrel proteins

    Transmembrane proteins play a major role in the normal functioning of the cell. Many transmembrane proteins act as drug targets and hence are of utmost importance to the pharmaceutical industry. In spite of the significance of transmembrane proteins, relatively few transmembrane 3D structures are available due to experimental bottlenecks. It is therefore imperative to develop novel computational methods to elucidate the structure and function of these proteins. The two major classes of transmembrane proteins are helical membrane proteins and transmembrane beta barrel proteins. More 3D structures of helical membrane proteins have been experimentally determined and, in general, the majority of computational methods in the realm of transmembrane proteins deal with helical membrane proteins. However, in recent years there has been an increased interest in the development of computational methods for transmembrane beta barrel proteins. In this study, I focus on the transmembrane beta barrel proteins. More specifically, I present here computational methods for the prediction of the exposure status of the residues in the membrane spanning region of the transmembrane beta barrel proteins. To the best of our knowledge, the exposure status prediction is a novel problem in the realm of transmembrane beta barrel proteins. The knowledge about the exposure status of the membrane spanning residues is then used to analyse the structural properties of transmembrane beta strands. The exposure status information is also employed to identify relevant physico-chemical properties that differ with statistical significance between the transmembrane beta strands at the oligomeric interfaces and the rest of the protein surface. A method for the prediction of the beta strands in the membrane spanning regions of putative transmembrane beta barrel proteins from protein sequence has also been developed. 
The computational method for strand prediction is novel in that it also gives the exposure status of the residues predicted to be in the transmembrane beta strands. The two computational methods developed in this study have been made available as web services. In the future, the information about the exposure status of the residues in the transmembrane beta strands can be used to identify putative transmembrane beta barrels from proteomic data. The exposure status prediction can also be extended to predict the pore region of transmembrane beta barrel proteins from sequence, which could in turn be used in the function prediction of putative transmembrane beta barrels.
The class of transmembrane proteins performs a range of essential functions within the cell. Many of these proteins are therefore suitable targets for drugs and are of exceptional interest to the pharmaceutical industry. Despite their importance, only a few three-dimensional structures of membrane proteins have been determined so far, because their experimental determination has proven to be extremely difficult. For this reason, the development of in silico methods for the de novo prediction of the structure and function of these proteins is a necessary strategy. The two main classes of transmembrane proteins are divided, based on their characteristic secondary structures, into alpha-helical proteins and beta-barrels. The former account for the larger share of experimentally determined structures, and most in silico methods presented so far also concentrate on modelling such alpha-helical structures. In recent years, interest in methods for modelling transmembrane beta-barrels has therefore increased. This dissertation deals primarily with this class of transmembrane proteins; in particular, we present a method for predicting the exposure to the lipid layer of individual residues within the transmembrane region of beta-barrels. This exposure prediction is so far a novel problem in the field of beta-barrels. The information obtained was used to analyse the structural properties of transmembrane strands. In addition, the exposure data can be used to identify important physico-chemical properties. Our investigations showed that statistically significant differences in these properties occur between transmembrane beta strands at oligomeric interfaces and the rest of the protein surface. Furthermore, we present a method for the sequence-based prediction of transmembrane residues of putative beta-barrels, which, in combination with the prediction of exposure status, is novel in this form. The two methods presented in this study are available online as web services. Based on the exposure predictions for beta strands, it will be possible in future studies to identify putative transmembrane beta-barrels from proteomic data

    Profiling patterns of interhelical associations in membrane proteins.

    A novel set of methods has been developed to characterize polytopic membrane proteins at the topological, organellar and functional level, in order to reduce the existing functional gap in the membrane proteome. Firstly, a novel clustering tool, named PROCLASS, was implemented to facilitate the manual curation of large sets of proteins, in readiness for feature extraction. TMLOOP and TMLOOP writer were implemented to refine current topological models by predicting membrane dipping loops. TMLOOP applies weighted predictive rules in a collective motif method, to overcome the inherent limitations of single motif methods. The approach achieved 92.4% accuracy in sensitivity and 100% reliability in specificity, and 1,392 topological models described in the Swiss-Prot database were refined. The subcellular location (TMLOCATE) and molecular function (TMFUN) prediction methods rely on the TMDEPTH feature extraction method along with data mining techniques. TMDEPTH uses refined topological models and amino acid sequences to calculate pairs of residues located at a similar depth in the membrane. Evaluation of TMLOCATE showed a normalized accuracy of 75% in discriminating between proteins belonging to the main organelles. At a sequence similarity threshold of 40%, TMFUN predicted main functional classes with a sensitivity of 64.1-71.4%, and 70% of the olfactory GPCRs were correctly predicted. At a sequence similarity threshold of 90%, main functional classes were predicted with a sensitivity of 75.6-92.8%, and class A GPCRs were sub-classified with a sensitivity of 84.5-92.9%. These results reflect a direct association between the spatial arrangement of residues in the transmembrane regions and the capacity for polytopic membrane proteins to carry out their functions. 
The developed methods have for the first time categorically shown that the transmembrane regions hold essential information associated with a wide range of functional properties such as filtering and gating processes, subcellular location and molecular function