936 research outputs found

    The antigenic index: a novel algorithm for predicting antigenic determinants

    Get PDF
    In this paper, we introduce a computer algorithm which can be used to predict the topological features of a protein directly from its primary amino acid sequence. The computer program generates values for surface accessibility parameters and combines these values with those obtained for regional backbone flexibility and predicted secondary structure. The output of this algorithm, the antigenic index, is used to create a linear surface contour profile of the protein. Because most, if not all, antigenic sites are located within surface exposed regions of a protein, the program offers a reliable means of predicting potential antigenic determinants. We have tested the ability of this program to generate accurate surface contour profiles and predict antigenic sites from the linear amino acid sequences of well-characterized proteins and found a strong correlation between the predictions of the antigenic index and known structural and biological data

    EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

    Full text link
    During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank has increased more than 15 fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence however is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D-convolutional neural networks classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was investigated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.Comment: 11 pages, 6 figure

    Alignment of helical membrane protein sequences using AlignMe

    Get PDF
    Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme​/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set

    Structural prediction and in silico physicochemical characterization for mouse caltrin I and bovine caltrin proteins

    Get PDF
    It is known that caltrin (calcium transport inhibitor) protein binds to sperm cells during ejaculation and inhibits extracellular Ca2+ uptake. Although the sequence and some biological features of mouse caltrin I and bovine caltrin are known, their physicochemical properties and tertiary structure are mainly unknown. We predicted the 3D structures of mouse caltrin I and bovine caltrin by molecular homology modeling and threading. Surface electrostatic potentials and electric fields were calculated using the Poisson–Boltzmann equation. Several different bioinformatics tools and available web servers were used to thoroughly analyze the physicochemical characteristics of both proteins, such as their Kyte and Doolittle hydropathy scores and helical wheel projections. The results presented in this work significantly aid further understanding of the molecular mechanisms of caltrin proteins modulating physiological processes associated with fertilization.Fil: Grasso, Ernesto Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; ArgentinaFil: Sottile, Adolfo Emiliano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; ArgentinaFil: Coronel, Carlos Enrique. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Investigaciones Biológicas y Tecnológicas. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto de Investigaciones Biológicas y Tecnológicas; Argentin

    Improving protein order-disorder classification using charge-hydropathy plots

    Get PDF
    BACKGROUND: The earliest whole protein order/disorder predictor (Uversky et al., Proteins, 41: 415-427 (2000)), herein called the charge-hydropathy (C-H) plot, was originally developed using the Kyte-Doolittle (1982) hydropathy scale (Kyte & Doolittle., J. Mol. Biol, 157: 105-132(1982)). Here the goal is to determine whether the performance of the C-H plot in separating structured and disordered proteins can be improved by using an alternative hydropathy scale. RESULTS: Using the performance of the CH-plot as the metric, we compared 19 alternative hydropathy scales, with the finding that the Guy (1985) hydropathy scale (Guy, Biophys. J, 47:61-70(1985)) was the best of the tested hydropathy scales for separating large collections structured proteins and intrinsically disordered proteins (IDPs) on the C-H plot. Next, we developed a new scale, named IDP-Hydropathy, which further improves the discrimination between structured proteins and IDPs. Applying the C-H plot to a dataset containing 109 IDPs and 563 non-homologous fully structured proteins, the Kyte-Doolittle (1982) hydropathy scale, the Guy (1985) hydropathy scale, and the IDP-Hydropathy scale gave balanced two-state classification accuracies of 79%, 84%, and 90%, respectively, indicating a very substantial overall improvement is obtained by using different hydropathy scales. A correlation study shows that IDP-Hydropathy is strongly correlated with other hydropathy scales, thus suggesting that IDP-Hydropathy probably has only minor contributions from amino acid properties other than hydropathy. CONCLUSION: We suggest that IDP-Hydropathy would likely be the best scale to use for any type of algorithm developed to predict protein disorder

    Improving the prediction of secondary structure of -TIM-barrel— enzymes

    Get PDF
    The information contained in aligned sets of homologous protein sequences should improve the score of secondary structure prediction. Seven different enzymes having the (β/α)8 or TIM-barrel fold were used to optimize the prediction with regard to this class of enzymes. The α-helix, β-strand and loop propensities of the Garnier—Osguthorpe—Robson method were averaged at aligned residue positions, leading to a significant improvement over the average score obtained from single sequences. The increased accuracy correlates with the average sequence variability of the aligned set. Further improvements were obtained by using the following averaged properties as weights for the averaged state propensities: amphipathic moment and α-helix; hydropathy and β-strand; chain flexibility and loop. The clustering of conserved residues at the C-terminal ends of the 13-strands was used as an additional positive weight for β-strand propensity and increased the prediction of otherwise unpredicted β-strands decisively. The automatic weighted prediction method identifies >95% of the secondary structure elements of the set of seven TIM-barrel enzyme

    PiRaNhA: A server for the computational prediction of RNA-binding residues in protein sequences

    Get PDF
    The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions of high accuracy that require only small numbers of targeted mutations for verification. The PiRaNhA server uses a support vector machine (SVM), with position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity as features. The server allows the submission of up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results are provided in sequence format with predicted RBRs highlighted, in text format with the SVM threshold score indicated and as a graph which enables users to quickly identify those residues above any specific SVM threshold. The graph effectively enables the increase or decrease of the false positive rate. When tested on a non-redundant data set of 42 protein sequences not used in training, the PiRaNhA server achieved an accuracy of 85%, specificity of 90% and a Matthews correlation coefficient of 0.41 and outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA. © The Author(s) 2010. Published by Oxford University Press

    Software tools for simultaneous data visualization and T cell epitopes and disorder prediction in proteins

    Get PDF
    We have developed EpDis and MassPred, extendable open source software tools that support bioinformatic research and enable parallel use of different methods for the prediction of T cell epitopes, disorder and disordered binding regions and hydropathy calculation. These tools offer a semi-automated installation of chosen sets of external predictors and an interface allowing for easy application of the prediction methods, which can be applied either to individual proteins or to datasets of a large number of proteins. In addition to access to prediction methods, the tools also provide visualization of the obtained results, calculation of consensus from results of different methods, as well as import of experimental data and their comparison with results obtained with different predictors. The tools also offer a graphical user interface and the possibility to store data and the results obtained using all of the integrated methods in the relational database or flat file for further analysis. The MassPred part enables a massive parallel application of all integrated predictors to the set of proteins. Both tools can be downloaded from http://bioinfo.matf.bg.ac.rs/home/downloads.wafl?cat=Software. Appendix A includes the technical description of the created tools and a list of supported predictors

    Software tools for simultaneous data visualization and T cell epitopes and disorder prediction in proteins

    Get PDF
    We have developed EpDis and MassPred, extendable open source software tools that support bioinformatic research and enable parallel use of different methods for the prediction of T cell epitopes, disorder and disordered binding regions and hydropathy calculation. These tools offer a semi-automated installation of chosen sets of external predictors and an interface allowing for easy application of the prediction methods, which can be applied either to individual proteins or to datasets of a large number of proteins. In addition to access to prediction methods, the tools also provide visualization of the obtained results, calculation of consensus from results of different methods, as well as import of experimental data and their comparison with results obtained with different predictors. The tools also offer a graphical user interface and the possibility to store data and the results obtained using all of the integrated methods in the relational database or flat file for further analysis. The MassPred part enables a massive parallel application of all integrated predictors to the set of proteins. Both tools can be downloaded from http://bioinfo.matf.bg.ac.rs/home/downloads.wafl?cat=Software. Appendix A includes the technical description of the created tools and a list of supported predictors

    Deriving a mutation index of carcinogenicity using protein structure and protein interfaces

    Get PDF
    With the advent of Next Generation Sequencing the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, for both assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/
    • …
    corecore