71 research outputs found
Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins
BACKGROUND: Hidden Markov Models (HMMs) have been extensively used in computational molecular biology for modelling protein and nucleic acid sequences. In many applications, such as transmembrane protein topology prediction, the incorporation of a limited amount of information regarding the topology, arising from biochemical experiments, has proved a very useful strategy that remarkably increased the performance of even the top-scoring methods. However, no clear and formal explanation of the algorithms that retains the probabilistic interpretation of the models has been presented so far in the literature. RESULTS: We present here a simple method that allows incorporation of prior topological information concerning the sequences at hand, while at the same time the HMMs retain their full probabilistic interpretation in terms of conditional probabilities. We present modifications to the standard Forward and Backward algorithms of HMMs and we also show explicitly how reliable predictions may arise from these modifications, using all the algorithms currently available for decoding HMMs. A similar approach may be used in the training procedure, aiming at optimizing the labels of the HMM's classes, especially in cases such as transmembrane proteins where the labels of the membrane-spanning segments are inherently misplaced. We present an application of this approach, developing a method to predict the transmembrane regions of alpha-helical membrane proteins, trained on crystallographically solved data. We show that this method compares well against already established algorithms presented in the literature, and it is extremely useful in practical applications. CONCLUSION: The algorithms presented here are easily implemented in any kind of Hidden Markov Model, whereas the prediction method (HMM-TM) is freely available for academic users at , offering the most advanced decoding options currently available
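The abstract above does not spell out its modified Forward algorithm, so the following is only a minimal sketch of one common way to condition an HMM on known labels: states whose label contradicts the prior information at a position get zero forward probability there. The two-state model, its probabilities, and all names (`constrained_forward`, `known`, and so on) are hypothetical and are not taken from HMM-TM.

```python
# Toy two-state HMM; every number here is an illustrative assumption.
STATES = ("M", "L")                      # membrane, loop
LABEL = {"M": "M", "L": "L"}             # label carried by each state
START = {"M": 0.5, "L": 0.5}
TRANS = {"M": {"M": 0.8, "L": 0.2}, "L": {"M": 0.2, "L": 0.8}}
EMIT = {"M": {"h": 0.9, "p": 0.1}, "L": {"h": 0.2, "p": 0.8}}


def constrained_forward(obs, known=None):
    """Return P(obs, known labels); with no known labels this is P(obs).

    `known` maps a 0-based position to the label that position must carry.
    """
    known = known or {}

    def allowed(state, pos):
        # A state is admissible unless a known label at pos contradicts it.
        return pos not in known or LABEL[state] == known[pos]

    # Initialisation: start probability times emission, zeroed if forbidden.
    alpha = {
        s: START[s] * EMIT[s][obs[0]] if allowed(s, 0) else 0.0
        for s in STATES
    }
    # Recursion: the usual forward sum, again zeroed for forbidden states.
    for t in range(1, len(obs)):
        alpha = {
            s: (EMIT[s][obs[t]] * sum(alpha[r] * TRANS[r][s] for r in STATES)
                if allowed(s, t) else 0.0)
            for s in STATES
        }
    return sum(alpha.values())


OBS = "hhpp"                             # h = hydrophobic, p = polar
total = constrained_forward(OBS)
with_m = constrained_forward(OBS, {0: "M"})
with_l = constrained_forward(OBS, {0: "L"})
# Conditional probability that position 0 is labelled M, given the sequence.
posterior_m = with_m / total
```

Because the constrained quantity is still a joint probability, dividing it by the unconstrained likelihood gives a proper conditional probability, which is the sense in which the probabilistic interpretation is retained in this sketch.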
Evaluation of methods for predicting the topology of β-barrel outer membrane proteins and a consensus prediction method
BACKGROUND: Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied so far for this task, utilizing different algorithmic techniques, and a number of freely available predictors exist. The methods can be broadly divided into those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of gram-negative bacteria, with structures known at atomic resolution. Also, we describe, for the first time, an effective way to combine the individual predictors, at will, into a single consensus prediction method. RESULTS: We assess the statistical significance of the performance of each prediction scheme and conclude that Hidden Markov Model based methods, HMM-B2TMR, ProfTMB and PRED-TMBB, are currently the best predictors, according to either the per-residue accuracy, the segments overlap measure (SOV) or the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane β-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, since it increases the accuracy up to 4% regarding SOV and up to 15% in correctly predicted topologies. CONCLUSIONS: The consensus prediction method described in this work optimizes the predicted topology with a dynamic programming algorithm and is implemented in a web-based application freely available to non-commercial users at
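As a rough illustration of combining predictors, the sketch below takes a per-residue majority vote over several label strings. This is a deliberately simpler stand-in, not the consensus method of the abstract, which optimizes the topology with dynamic programming; the `i`/`M`/`o` label alphabet and the function name are assumptions made for the example.

```python
from collections import Counter


def majority_consensus(predictions):
    """Per-residue majority vote over equally long label strings.

    Ties go to the label that appears first among the predictors at that
    position, so the order of `predictions` matters.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical outputs of three predictors for the same 8-residue stretch
# (i = inside, M = transmembrane strand, o = outside).
votes = ["iiMMMooo", "iMMMMooo", "iiMMoooo"]
combined = majority_consensus(votes)
```

A plain vote like this can produce label strings that are not valid topologies (for example, strands that are too short), which is one reason a method such as the one described above would post-process the combined scores rather than vote position by position.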
OMPdb: a database of β-barrel outer membrane proteins from Gram-negative bacteria
We describe here OMPdb, which is currently the most complete and comprehensive collection of integral β-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69 354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data, but submit advanced text searches and run BLAST queries against the database protein sequences or domain searches against the collection of profile Hidden Markov Models that represent each family’s domain organization as well. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels
GeneViTo: Visualizing gene-product functional and structural features in genomic datasets
BACKGROUND: The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. RESULTS: GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources) and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI) allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. CONCLUSIONS: GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. 
The application is compatible with Linux or Windows ME-2000-XP operating systems, provided that the appropriate Java Runtime Environment is already installed in the system
A Hidden Markov Model method, capable of predicting and discriminating β-barrel outer membrane proteins
BACKGROUND: Integral membrane proteins constitute about 20–30% of all proteins in the fully sequenced genomes. They come in two structural classes, the α-helical and the β-barrel membrane proteins, demonstrating different physicochemical characteristics, structure and localization. While transmembrane segment prediction for the α-helical integral membrane proteins appears to be an easy task nowadays, the same is much more difficult for the β-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane β-strands of the outer membrane proteins of gram-negative bacteria, and discriminating those from water-soluble proteins in large datasets. The model is trained in a discriminative manner, aiming at maximizing the probability of correct predictions rather than the likelihood of the sequences. RESULTS: The training has been performed on a non-redundant database of 14 outer membrane proteins with structures known at atomic resolution; it has been tested with a jackknife procedure, yielding a per-residue accuracy of 84.2% and a correlation coefficient of 0.72, whereas for the self-consistency test the per-residue accuracy was 88.1% and the correlation coefficient 0.824. The total number of correctly predicted topologies is 10 out of 14 in the self-consistency test, and 9 out of 14 in the jackknife. Furthermore, the model is capable of discriminating outer membrane from water-soluble proteins in large-scale applications, with a success rate of 88.8% and 89.2% for the correct classification of outer membrane and water-soluble proteins respectively, the highest rates obtained in the literature. That test has been performed independently on a set of known outer membrane proteins with low sequence identity with each other and also with the proteins of the training set. CONCLUSION: Based on the above, we developed a strategy that enabled us to screen the entire proteome of E. coli for outer membrane proteins. 
The results were satisfactory, thus the method presented here appears to be suitable for screening entire proteomes for the discovery of novel outer membrane proteins. A web interface available for non-commercial users is located at: , and it is the only freely available HMM-based predictor for β-barrel outer membrane protein topology
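The per-residue accuracy and correlation coefficient quoted above can be computed along the following lines. This sketch assumes the correlation coefficient is the Matthews coefficient commonly used for two-class per-residue evaluation, which the abstract does not state explicitly; the label strings and the function name are hypothetical.

```python
import math


def per_residue_scores(observed, predicted, positive="M"):
    """Return (accuracy, Matthews correlation) for two label strings.

    Residues labelled `positive` count as transmembrane; everything else
    is treated as non-transmembrane.
    """
    if len(observed) != len(predicted):
        raise ValueError("label strings must have equal length")
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if o == positive and p == positive:
            tp += 1
        elif o != positive and p == positive:
            fp += 1
        elif o != positive and p != positive:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(observed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Toy example: one transmembrane residue is missed by the prediction.
acc, mcc = per_residue_scores("MMMMLLLL", "MMMLLLLL")
```

In a jackknife evaluation, each protein would be scored this way with a model trained on the remaining proteins, whereas the self-consistency test scores the proteins the model was trained on.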
A database for G proteins and their interaction with GPCRs
BACKGROUND: G protein-coupled receptors (GPCRs) transduce signals from extracellular space into the cell, through their interaction with G proteins, which act as switches forming hetero-trimers composed of different subunits (α, β, γ). The α subunit of the G protein is responsible for the recognition of a given GPCR. Whereas specialised resources for GPCRs, and other groups of receptors, are already available, there is currently no publicly available database focusing on G proteins and containing information about their coupling specificity with their respective receptors. DESCRIPTION: gpDB is a publicly accessible G proteins/GPCRs relational database. Including species homologs, the database contains detailed information for 418 G protein monomers (272 Gα, 87 Gβ and 59 Gγ) and 2782 GPCR sequences belonging to families with known coupling to G proteins. The GPCRs and the G proteins are classified according to a hierarchy of different classes, families and sub-families, based on extensive literature searches. The main innovation besides the classification of both G proteins and GPCRs is the relational model of the database, describing the known coupling specificity of the GPCRs to their respective α subunit of G proteins, a unique feature not available in any other database. There is full sequence information with cross-references to publicly available databases and references to the literature concerning the coupling specificity and the dimerization of GPCRs, and the user may submit advanced queries for text search. Furthermore, we provide a pattern search tool, an interface for running BLAST against the database and interconnectivity with PRED-TMR, PRED-GPCR and TMRPres2D. CONCLUSIONS: The database will be very useful, for both experimentalists and bioinformaticians, for the study of G protein/GPCR interactions and for future development of predictive algorithms. It is available for academics, via a web browser at the URL
cuticleDB: a relational database of Arthropod cuticular proteins
BACKGROUND: The insect exoskeleton or cuticle is a bi-partite composite of proteins and chitin that provides protective, skeletal and structural functions. Little information is available about the molecular structure of this important complex that exhibits a helicoidal architecture. Scores of sequences of cuticular proteins have been obtained from direct protein sequencing, from cDNAs, and from genomic analyses. Most of these cuticular protein sequences contain motifs found only in arthropod proteins. DESCRIPTION: cuticleDB is a relational database containing all structural proteins of Arthropod cuticle identified to date. Many come from direct sequencing of proteins isolated from cuticle and from sequences from cDNAs that share common features with these authentic cuticular proteins. It also includes proteins from the Drosophila melanogaster and the Anopheles gambiae genomes, that have been predicted to be cuticular proteins, based on a Pfam motif (PF00379) responsible for chitin binding in Arthropod cuticle. The total number of the database entries is 445: 370 derive from insects, 60 from Crustacea and 15 from Chelicerata. The database can be accessed from our web server at . CONCLUSIONS: CuticleDB was primarily designed to contain correct and full annotation of cuticular protein data. The database will be of help to future genome annotators. Users will be able to test hypotheses for the existence of known and also of yet unknown motifs in cuticular proteins. An analysis of motifs may contribute to understanding how proteins contribute to the physical properties of cuticle as well as to the precise nature of their interaction with chitin
The Human Plasma Membrane Peripherome: Visualization and Analysis of Interactions
A major part of membrane function is conducted by proteins, both integral and
peripheral. Peripheral membrane proteins temporarily adhere to biological
membranes, either to the lipid bilayer or to integral membrane proteins with
non-covalent interactions. The aim of this study was to construct and analyze
the interactions of the human plasma membrane peripheral proteins (peripherome
hereinafter). For this purpose, we collected a dataset of peripheral proteins
of the human plasma membrane. We also collected a dataset of experimentally
verified interactions for these proteins. The interaction network created from
this dataset has been visualized using Cytoscape. We grouped the proteins based
on their subcellular location and clustered them using the MCL algorithm in
order to detect functional modules. Moreover, functional and graph theory based
analyses have been performed to assess biological features of the network.
Interaction data with drug molecules show that ~10% of peripheral membrane
proteins are targets for approved drugs, suggesting their potential
implications in disease. In conclusion, we reveal novel features and properties
regarding the protein-protein interaction network created by peripheral
proteins of the human plasma membrane.
Comment: 39 pages, 5 figures, 3 supplement figures, under review in BMR
Comparative genomics among dairy strains of Streptococcus thermophilus
Microorganisms like lactic acid bacteria are employed for the biotransformation of raw materials into fermented foods. Fermented foods have increased nutritional value and shelf life as well as improved organoleptic characteristics compared to the raw materials. Interestingly, there are several genera within lactic acid bacteria that are considered to be important for food fermentations, including the Streptococcus genus. However, only Streptococcus thermophilus is used as a starter culture. Streptococcus thermophilus has been adapted to milk and dairy products through a reductive evolution process that has led to the loss of typical streptococcal pathogenic traits. In this work we present the comparative genomic analysis of the recently sequenced genome of S. thermophilus ACA-DC 29, isolated from yogurt, and the seven existing complete genome sequences of S. thermophilus. Full chromosome alignments revealed a high degree of synteny among the different strains, although strain-specific differences could also be observed. The pangenome of the eight strains comprised approximately 2,300 genes. Concerning the ACA-DC 29 strain, the majority of genes were distributed in the core and the accessory genomes. We also identified a significant percentage of unique genes, i.e. approximately 250, involved in various biological processes. Further analysis of these unique genes revealed that several of them may have been acquired through horizontal gene transfer. We also predicted five potential antimicrobial peptides and two CRISPR systems, which may confer resistance against phages. Overall, our analysis provides useful insights into the technological potential of the ACA-DC 29 strain
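The core, accessory and unique gene categories mentioned above can be illustrated with simple set arithmetic. The sketch assumes genes have already been clustered into families that share an identifier across strains, a step the abstract does not describe; the strain and gene names are placeholders, not data from the study.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory and strain-unique sets.

    `genomes` maps a strain name to the set of gene-family identifiers it
    carries. Core families occur in every strain, unique families in exactly
    one strain, and accessory families in more than one but not all strains.
    """
    n_strains = len(genomes)
    counts = Counter(g for genes in genomes.values() for g in set(genes))
    core = {g for g, c in counts.items() if c == n_strains}
    accessory = {g for g, c in counts.items() if 1 < c < n_strains}
    unique = {
        strain: {g for g in genes if counts[g] == 1}
        for strain, genes in genomes.items()
    }
    return core, accessory, unique


# Placeholder data: three strains and five gene families.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g5"},
}
core, accessory, unique = partition_pangenome(toy)
pangenome_size = len(set().union(*toy.values()))
```

With real data the result depends heavily on how gene families are defined, so counts such as the pangenome size are only comparable between analyses that use the same clustering criteria.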
Characterization of plasmid pSMA198 found in Streptococcus macedonicus ACA-DC 198 supports the relation of the species to the milk environment
Background: Streptococcus macedonicus is an intriguing streptococcal species whose most frequent source of isolation is fermented foods similarly to Streptococcus thermophilus. During the genome sequencing of S. macedonicus ACA-DC 198 a plasmid was identified.
Objectives: To analyse pSMA198, the first plasmid isolated from S. macedonicus and to shed light onto its acquisition path.
Methods: Similarity searches of nucleotide and protein sequences, comparative analysis of whole plasmid sequences and phylogenetic analysis were performed using the appropriate bioinformatics tools.
Results: Based on the similarity profiles of the plasmid’s replication initiation protein (Rep) and its origin of replication (ori), pSMA198 belongs to the narrow host range pCI305/pWV02 family found primarily in lactococci and it is the first such plasmid to be reported in streptococci. Comparative analysis of pSMA198 over its ori, origin of transfer (oriT) or entire length revealed a high degree of similarity with plasmids pSK11b, pVF22 and pIL5, respectively, all isolated from Lactococcus lactis strains from milk or milk products. Phylogenetic analysis of the pSMA198 Rep showed that the vast majority of closely related proteins derive from lactococcal dairy isolates.
Conclusions: Our findings demonstrate that S. macedonicus ACA-DC 198 most probably acquired plasmid pSMA198 from L. lactis during an ancestral genetic exchange event that took place in milk or dairy products. Based on our analysis we provide the first molecular and evolutionary evidence for the habituation of S. macedonicus to the dairy environment