2,630 research outputs found

    CATH functional families predict functional sites in proteins

    MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either generic functional sites or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. AVAILABILITY: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
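    The observation that conserved FunFam residues are enriched in functional sites depends on scoring conservation per alignment column. As a minimal sketch, assuming a generic normalized Shannon-entropy score rather than the conservation measure FunSite actually uses (which is not given here), per-column scores for a FunFam-style multiple sequence alignment could be computed as follows. The function names `column_conservation` and `top_conserved` are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score in [0, 1] per alignment column.

    Score = 1 - H / log2(20), where H is the Shannon entropy (in bits) of the
    residues in the column. Gap characters ('-') are ignored. This is an
    assumed, generic score, not FunSite's own feature definition.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # maximum entropy over 20 standard amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: treat as unconserved
            continue
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in Counter(residues).values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


def top_conserved(scores, k):
    """Indices of the k highest-scoring columns (ties keep alignment order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

    Columns scoring near 1 are dominated by a single residue. A predictor such as the one described would combine scores like these with structural features rather than rely on conservation alone.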

    Computational approaches to predict protein functional families and functional sites.

    Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.
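    The idea of specificity-determining residues can be illustrated with a simple heuristic, assumed here for illustration and not taken from any particular method in the review: an alignment column is a candidate if it is conserved within each functional family but carries different residues in different families. The function `candidate_sdps` and its `min_within` threshold are hypothetical.

```python
from collections import Counter


def candidate_sdps(subfamilies, min_within=0.8):
    """Flag columns conserved within every subfamily but differing between them.

    subfamilies: dict mapping a subfamily label to a list of aligned sequences;
    all sequences across all subfamilies share one alignment length.
    min_within: minimum fraction of a subfamily's sequences that must carry the
    subfamily's most common residue for the column to count as conserved there.
    """
    groups = list(subfamilies.values())
    length = len(groups[0][0])
    hits = []
    for i in range(length):
        consensus = []
        conserved_everywhere = True
        for seqs in groups:
            residue, count = Counter(seq[i] for seq in seqs).most_common(1)[0]
            if residue == "-" or count / len(seqs) < min_within:
                conserved_everywhere = False
                break
            consensus.append(residue)
        # Conserved in each subfamily, but not the same residue in all of them.
        if conserved_everywhere and len(set(consensus)) > 1:
            hits.append(i)
    return hits
```

    A column conserved across all subfamilies with the same residue is excluded, since it is more likely to reflect a shared (for example catalytic) role than a family-specific one.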

    ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity

    Motivation: Genetically engineering food crops involves introducing proteins from other species into crop plant species or modifying already existing proteins with gene editing techniques. In addition, newly synthesized proteins can be used as therapeutic protein drugs against diseases. For both research and safety regulation purposes, being able to assess the potential toxicity of newly introduced/synthesized proteins is of high importance. Results: In this study, we present ToxDL, a deep learning-based approach for in silico prediction of protein toxicity from sequence alone. ToxDL consists of (i) a module encompassing a convolutional neural network that has been designed to handle variable-length input sequences, (ii) a domain2vec module for generating protein domain embeddings and (iii) an output module that classifies proteins as toxic or non-toxic, using the outputs of the two aforementioned modules. Independent test results obtained for animal proteins and cross-species transferability results obtained for bacterial proteins indicate that ToxDL outperforms traditional homology-based approaches and state-of-the-art machine-learning techniques. Furthermore, through visualizations based on saliency maps, we are able to verify that the proposed network learns known toxic motifs. Moreover, the saliency maps allow for directed in silico modification of a sequence, thus making it possible to alter its predicted protein toxicity.
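    ToxDL's first module is described as a convolutional neural network designed to handle variable-length input sequences. One common way to achieve this, assumed here because the abstract does not give the architecture, is to slide filters along a one-hot-encoded sequence and keep the global maximum of each filter's response, which yields a fixed-length vector whatever the sequence length. The pure-Python sketch below uses hand-set, hypothetical filters (`motif_filter`) in place of trained weights and omits the domain2vec and output modules.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as 20-dim one-hot rows; unknown residues become all zeros."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def motif_filter(motif):
    """Hypothetical hand-set filter scoring +1 at each position matching `motif`."""
    weights = []
    for aa in motif:
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1.0
        weights.append(row)
    return weights


def conv_max_pool(seq, filters):
    """1D convolution followed by global max pooling and ReLU.

    filters: list of weight matrices, each of shape (width, 20).
    Returns one value per filter, so the output length is independent of len(seq).
    """
    x = one_hot(seq)
    features = []
    for w in filters:
        width = len(w)
        if len(x) < width:
            features.append(0.0)  # sequence shorter than the filter: no valid window
            continue
        best = float("-inf")
        for start in range(len(x) - width + 1):
            score = sum(w[k][j] * x[start + k][j]
                        for k in range(width)
                        for j in range(len(AMINO_ACIDS)))
            best = max(best, score)
        # ReLU is monotonic, so applying it after pooling equals pooling after ReLU.
        features.append(max(best, 0.0))
    return features
```

    For example, `conv_max_pool("AACCA", [motif_filter("CC")])` returns `[2.0]`, and a longer sequence still yields exactly one value per filter. In a trained network the filter weights would be learned, which is what lets saliency maps point back to toxic motifs.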

    Motif kernel generated by genetic programming improves remote homology and fold detection

    BACKGROUND: Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences, and these studies have introduced several types of kernels. One successful approach is to base a kernel on shared occurrences of discrete sequence motifs. Still, many protein sequences are misclassified because no suitable set of motifs exists for them. RESULTS: We introduce the GPkernel, a motif kernel based on discrete sequence motifs that are evolved using genetic programming. All proteins can be grouped according to evolutionary relations and structure, and the method uses this inherent structure to create groups of motifs that discriminate between different families of evolutionary origin. When tested on two SCOP benchmarks, the superfamily and fold recognition problems, the GPkernel gives significantly better results than related methods of remote homology detection. CONCLUSION: The GPkernel gives particularly good results on the more difficult fold recognition problem compared to the other methods. This is mainly because the method creates motif sets that describe similarities among subgroups of both the related and unrelated proteins. This rich set of motifs gives a better description of the similarities and differences between different folds than do previous motif-based methods.
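    A kernel based on shared occurrences of discrete sequence motifs can be sketched as follows, assuming motifs are written as regular expressions and each sequence is mapped to a 0/1 vector of motif presence. The kernel value is then the number of motifs two sequences share, optionally cosine-normalized. The genetic-programming step that evolves the GPkernel's motifs is not shown, and the example motifs used here are hypothetical.

```python
import math
import re


def motif_features(seq, motifs):
    """0/1 feature vector: 1 if the motif (a regular expression) occurs in seq."""
    return [1 if re.search(m, seq) else 0 for m in motifs]


def motif_kernel(a, b, motifs, normalize=True):
    """Number of motifs present in both sequences, optionally cosine-normalized."""
    fa = motif_features(a, motifs)
    fb = motif_features(b, motifs)
    k = sum(x * y for x, y in zip(fa, fb))
    if not normalize:
        return float(k)
    na, nb = sum(fa), sum(fb)  # for 0/1 vectors the squared norm equals the sum
    if na == 0 or nb == 0:
        return 0.0
    return k / math.sqrt(na * nb)


def gram_matrix(seqs, motifs):
    """Symmetric kernel matrix over a list of sequences."""
    return [[motif_kernel(s, t, motifs) for t in seqs] for s in seqs]
```

    With motifs `["C..C", "G[KR]G", "HH"]`, the sequences `"ACAACGKG"` and `"CTTCHH"` share one motif, giving a normalized kernel value of 0.5. A Gram matrix of this kind is what an SVM with a precomputed kernel would be trained on.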

    MalVac: Database of malarial vaccine candidates

    BACKGROUND: The sequencing of the genomes of the Plasmodium species that cause malaria offers immense opportunities to aid the development of new therapeutics and vaccine candidates through bioinformatics tools and resources. METHODS: The starting point of the MalVac database is a collection of known vaccine candidates and a set of predicted vaccine candidates identified from the whole-proteome sequences of Plasmodium species provided by PlasmoDB release 5.4 (31st October 2007). These predicted vaccine candidates are the adhesins and adhesin-like proteins from three Plasmodium species: Plasmodium falciparum, Plasmodium vivax and Plasmodium yoelii. These protein sequences were then analysed with 20 publicly available algorithms to obtain orthologs, paralogs, BetaWraps, TargetP, TMHMM, SignalP, CDDSearch and BLAST (against human reference proteins) results, T-cell epitopes, B-cell epitopes, Discotopes and allergen predictions. All of this information was collected and organized with the ORFids of the protein sequences as primary keys. This information is relevant from the viewpoint of reverse vaccinology, as it facilitates decisions on the most probable vaccine strategy. RESULTS: Detailed information on the patterning of epitopes and other motifs important for reverse vaccinology has been obtained for the most probable protein candidates for vaccine investigation from three major malarial species. Analysis data are available for 161 adhesin proteins from P. falciparum, 137 adhesin proteins from P. vivax and 34 adhesin proteins from P. yoelii. The results are displayed in a convenient tabular format, and the entire dataset can be exported. The MalVac database is a "community resource": users are encouraged to export the data and contribute further by adding value to it. Value-added data may be shared back with the community through either MalVac or PlasmoDB. CONCLUSION: MalVac, a web server that facilitates the identification of probable vaccine candidates, has been developed and can be freely accessed.
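    MalVac is described as organizing the output of many prediction tools with the protein ORFids as primary keys, with a facility to export the entire data. A minimal sketch of such a layout, assuming an SQLite schema with hypothetical table and column names (MalVac's actual schema is not described), follows; the ORF identifier in the usage note is a placeholder, not a real PlasmoDB ID.

```python
import csv
import io
import sqlite3


def build_db():
    """Create an in-memory database keyed on ORF identifiers (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (
            orf_id  TEXT PRIMARY KEY,   -- ORF identifier used as the primary key
            species TEXT NOT NULL
        );
        CREATE TABLE prediction (
            orf_id TEXT NOT NULL REFERENCES protein(orf_id),
            tool   TEXT NOT NULL,       -- e.g. 'SignalP', 'TMHMM'
            result TEXT NOT NULL,
            PRIMARY KEY (orf_id, tool)  -- one stored result per tool per ORF
        );
    """)
    return conn


def add_result(conn, orf_id, species, tool, result):
    """Register the protein once, then store (or overwrite) one tool's result."""
    conn.execute("INSERT OR IGNORE INTO protein VALUES (?, ?)", (orf_id, species))
    conn.execute("INSERT OR REPLACE INTO prediction VALUES (?, ?, ?)",
                 (orf_id, tool, result))


def export_csv(conn):
    """Export every prediction joined with its protein record as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["orf_id", "species", "tool", "result"])
    writer.writerows(conn.execute("""
        SELECT p.orf_id, p.species, r.tool, r.result
        FROM protein p JOIN prediction r ON r.orf_id = p.orf_id
        ORDER BY p.orf_id, r.tool
    """))
    return buf.getvalue()
```

    Calling `add_result` for a placeholder ID such as `"ORF_0001"` with two different tools stores one protein row and two prediction rows, and `export_csv` returns them as a header plus two CSV lines. Keying every tool's output on the ORF identifier is what lets the results from many algorithms be shown side by side in one table.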

    The supporting-cell antigen: a receptor-like protein tyrosine phosphatase expressed in the sensory epithelia of the inner ear

    After noise- or drug-induced hair-cell loss, the sensory epithelia of the avian inner ear can regenerate new hair cells. Few molecular markers are available for the supporting-cell precursors of the hair cells that regenerate, and little is known about the signaling mechanisms underlying this regenerative response. Hybridoma methodology was used to obtain a monoclonal antibody (mAb) that stains the apical surface of supporting cells in the sensory epithelia of the inner ear. The mAb recognizes the supporting-cell antigen (SCA), a protein that is also found on the apical surfaces of retinal Müller cells, renal tubule cells, and intestinal brush border cells. Expression screening and molecular cloning reveal that the SCA is a novel receptor-like protein tyrosine phosphatase (RPTP), sharing similarity with human density-enhanced phosphatase, an RPTP thought to have a role in the density-dependent arrest of cell growth. In response to hair-cell damage induced by noise in vivo or hair-cell loss caused by ototoxic drug treatment in vitro, some supporting cells show a dramatic decrease in SCA expression levels on their apical surface. This decrease occurs before supporting cells are known to first enter S-phase after trauma, indicating that it may be a primary rather than a secondary response to injury. These results indicate that the SCA is a signaling molecule that may influence the potential of nonsensory supporting cells to either proliferate or differentiate into hair cells.