
    DeepSig: Deep learning improves signal peptide detection in proteins

    Motivation: The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Results: Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best-performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. Availability and implementation: DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website.

    The posterior-Viterbi: a new decoding algorithm for hidden Markov models

    Background: Hidden Markov models (HMMs) are powerful machine learning tools successfully applied to problems of computational molecular biology. In a predictive task, the HMM is endowed with a decoding algorithm in order to assign the most probable state path, and in turn the class labeling, to an unknown sequence. The Viterbi and the posterior decoding algorithms are the most common. The former is very efficient when one path dominates, while the latter, although it does not guarantee that the automaton grammar is preserved, is more effective when several concurring paths have similar probabilities. A third good alternative is 1-best, which was shown to perform as well as or better than Viterbi. Results: In this paper we introduce the posterior-Viterbi (PV), a new decoding algorithm that combines the posterior and Viterbi algorithms. PV is a two-step process: first the posterior probability of each state is computed, and then the best posterior allowed path through the model is evaluated by a Viterbi algorithm. Conclusions: We show that PV decoding performs better than other algorithms, first on toy models and then on the computational biological problem of predicting the topology of beta-barrel membrane proteins. Comment: 23 pages, 3 figures.
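The two-step procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based model layout, and the toy three-state model are all assumptions made for this example, and the sketch ignores any end-state constraint that a real model might impose. Step 1 computes per-position state posteriors with a scaled forward-backward pass; step 2 runs a Viterbi-style recursion that maximizes the product of posteriors over paths whose transitions have nonzero probability.

```python
import math


def state_posteriors(obs, states, start, trans, emit):
    """Step 1: P(state at position t | whole sequence), scaled forward-backward."""
    n = len(obs)
    fwd, scale = [], []
    for t, symbol in enumerate(obs):
        row = {}
        for s in states:
            if t == 0:
                prior = start[s]
            else:
                prior = sum(fwd[t - 1][r] * trans[r][s] for r in states)
            row[s] = prior * emit[s][symbol]
        total = sum(row.values())
        if total == 0.0:
            raise ValueError("observation sequence has zero probability")
        fwd.append({s: row[s] / total for s in states})
        scale.append(total)
    bwd = [dict.fromkeys(states, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(
                trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in states
            ) / scale[t + 1]
    return [{s: fwd[t][s] * bwd[t][s] for s in states} for t in range(n)]


def posterior_viterbi(obs, states, start, trans, emit):
    """Step 2: best path by summed log-posteriors, restricted to allowed moves."""
    post = state_posteriors(obs, states, start, trans, emit)
    if not post:
        return []

    def log(p):
        return math.log(p) if p > 0.0 else -math.inf

    # A state may open the path only if the model allows starting there.
    score = [{s: log(post[0][s]) if start[s] > 0.0 else -math.inf for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Only predecessors with a nonzero transition into s are allowed.
            allowed = [r for r in states if trans[r][s] > 0.0]
            best = max(allowed, key=lambda r: score[t - 1][r], default=None)
            back[t][s] = best
            if best is None:
                score[t][s] = -math.inf
            else:
                score[t][s] = score[t - 1][best] + log(post[t][s])
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: A -> C is forbidden, so a path must pass through B.
STATES = ("A", "B", "C")
START = {"A": 1.0, "B": 0.0, "C": 0.0}
TRANS = {
    "A": {"A": 0.5, "B": 0.5, "C": 0.0},
    "B": {"A": 0.0, "B": 0.5, "C": 0.5},
    "C": {"A": 0.0, "B": 0.0, "C": 1.0},
}
EMIT = {
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.5, "y": 0.5},
    "C": {"x": 0.1, "y": 0.9},
}

if __name__ == "__main__":
    print(posterior_viterbi("xxyy", STATES, START, TRANS, EMIT))
```

Plain posterior decoding would pick the highest-posterior state at each position independently and could therefore emit a forbidden A-to-C step; the restriction to allowed predecessors in step 2 is what keeps the decoded path consistent with the model grammar.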

    PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications

    A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes (transmembrane, fibrous, globular, and mixed) from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web server running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLAS

    Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results of previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments. Peer reviewed. Postprint (author's final draft).

    Structural analysis of the adenovirus type 2 E3/19K protein using mutagenesis and a panel of conformation-sensitive monoclonal antibodies

    The E3/19K protein of human adenovirus type 2 (Ad2) was the first viral protein shown to interfere with antigen presentation. This 25 kDa transmembrane glycoprotein binds to major histocompatibility complex (MHC) class I molecules in the endoplasmic reticulum (ER), thereby preventing transport of newly synthesized peptide–MHC complexes to the cell surface and consequently T cell recognition. Recent data suggest that E3/19K also sequesters MHC class I-like ligands intracellularly to suppress natural killer (NK) cell recognition. While the mechanism of ER retention is well understood, the structure of E3/19K remains elusive. To further dissect the structural and antigenic topography of E3/19K, we carried out site-directed mutagenesis and raised monoclonal antibodies (mAbs) against a recombinant version of Ad2 E3/19K comprising the lumenal domain followed by a C-terminal histidine tag. Using peptide scanning, the epitopes of three mAbs were mapped to different regions of the lumenal domain, comprising amino acids 3–13, 15–21 and 41–45, respectively. Interestingly, mAb 3F4 reacted only weakly with wild-type E3/19K, but showed drastically increased binding to mutant E3/19K molecules, e.g. those with disrupted disulfide bonds, suggesting that 3F4 can sense unfolding of the protein. MAb 10A2 binds to an epitope apparently buried within E3/19K while that of 3A9 is exposed. Secondary structure prediction suggests that the lumenal domain contains six β-strands and an α-helix adjacent to the transmembrane domain. Interestingly, all mAbs bind to non-structured loops. Using a large panel of E3/19K mutants, the structural alterations of the mutations were determined. With this knowledge the panel of mAbs will be valuable tools to further dissect structure/function relationships of E3/19K regarding downregulation of MHC class I and MHC class I-like molecules and its effect on both T cell and NK cell recognition.

    Comparison of exon 5 sequences from 35 class I genes of the BALB/c mouse

    DNA sequences of the fifth exon, which encodes the transmembrane domain, were determined for the BALB/c mouse class I MHC genes and used to study the relationships between them. Based on nucleotide sequence similarity, the exon 5 sequences can be divided into seven groups. Although most members within each group are at least 80% similar to each other, comparison between groups reveals that the groups share little similarity. However, in spite of the extensive variation of the fifth exon sequences, analysis of their predicted amino acid translations reveals that only four class I gene fifth exons have frameshifts or stop codons that terminate their translation and prevent them from encoding a domain that is both hydrophobic and long enough to span a lipid bilayer. Exactly 27 of the remaining fifth exons could encode a domain that is similar to those of the transplantation antigens in that it consists of a proline-rich connecting peptide, a transmembrane segment, and a cytoplasmic portion with membrane-anchoring basic residues. The conservation of this motif in the majority of the fifth exon translations in spite of extensive variation suggests that selective pressure exists for these exons to maintain their ability to encode a functional transmembrane domain, raising the possibility that many of the nonclassical class I genes encode functionally important products.

    Structural prediction and in silico physicochemical characterization for mouse caltrin I and bovine caltrin proteins

    It is known that caltrin (calcium transport inhibitor) protein binds to sperm cells during ejaculation and inhibits extracellular Ca2+ uptake. Although the sequence and some biological features of mouse caltrin I and bovine caltrin are known, their physicochemical properties and tertiary structure are mainly unknown. We predicted the 3D structures of mouse caltrin I and bovine caltrin by molecular homology modeling and threading. Surface electrostatic potentials and electric fields were calculated using the Poisson–Boltzmann equation. Several different bioinformatics tools and available web servers were used to thoroughly analyze the physicochemical characteristics of both proteins, such as their Kyte and Doolittle hydropathy scores and helical wheel projections. The results presented in this work significantly aid further understanding of the molecular mechanisms by which caltrin proteins modulate physiological processes associated with fertilization.
    Affiliation: Grasso, Ernesto Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; Argentina.
    Affiliation: Sottile, Adolfo Emiliano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; Argentina.
    Affiliation: Coronel, Carlos Enrique. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; Argentina.
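For readers unfamiliar with the Kyte and Doolittle hydropathy scores mentioned in this abstract, the sketch below shows how a sliding-window hydropathy profile is typically computed. It is an illustration only and does not reproduce the tools the authors used: the function name, the default window of 9 residues, and the example peptide are assumptions, and the peptide is a made-up sequence, not a caltrin sequence. The per-residue values are the standard Kyte-Doolittle scale.

```python
# Standard Kyte-Doolittle hydropathy values for the 20 common amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    The window size is an assumed default; raises ValueError for an invalid
    window and KeyError for residues outside the 20 standard amino acids.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide used purely for illustration.
    example = "MKTLLVAILAGSCDEKRNQ"
    for position, value in enumerate(hydropathy_profile(example), start=1):
        print(position, round(value, 2))
```

Positive window averages indicate hydrophobic stretches and negative ones hydrophilic stretches; wider windows smooth the profile at the cost of positional resolution.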