34 research outputs found

    GOASVM: PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON GENE ONTOLOGY ANNOTATION AND SVM

    Protein subcellular localization is an essential step in annotating proteins and designing drugs. This paper proposes a functional-domain based method, GOASVM, that makes full use of the Gene Ontology Annotation (GOA) database to predict the subcellular locations of proteins. GOASVM uses the accession number (AC) of a query protein and the accession numbers (ACs) of homologous proteins returned from PSI-BLAST as the query strings to search against the GOA database. The occurrences of a set of predefined GO terms are used to construct the GO vectors for classification by support vector machines (SVMs). The paper investigated two different approaches to constructing the GO vectors. Experimental results suggest that using the ACs of homologous proteins as the query strings can achieve an accuracy of 94.68%, which is significantly higher than all published results based on the same dataset. As a user-friendly web server, GOASVM is freely accessible to the public a
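The GO-vector construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `go_vector`, the `binary` flag, and the sample vocabulary are hypothetical, and the abstract does not say which two construction approaches were compared, so the presence/absence and occurrence-count variants below are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def go_vector(retrieved_terms: Iterable[str],
              vocabulary: Sequence[str],
              binary: bool = False) -> List[int]:
    """Map the GO terms retrieved for one protein onto a fixed vocabulary.

    retrieved_terms: GO terms found for the query protein (and, optionally,
        its homologs); repeats are allowed and are counted.
    vocabulary: the predefined, ordered set of GO terms (one vector
        dimension per term).
    binary: if True, record presence/absence only (assumed variant);
        otherwise record the number of occurrences.
    """
    counts = Counter(retrieved_terms)
    if binary:
        return [1 if counts[term] > 0 else 0 for term in vocabulary]
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary and retrieval result, for illustration only.
vocabulary = ["GO:0005634", "GO:0005737", "GO:0005739"]
retrieved = ["GO:0005634", "GO:0005634", "GO:0005737", "GO:0016020"]

count_vec = go_vector(retrieved, vocabulary)
binary_vec = go_vector(retrieved, vocabulary, binary=True)
```

Terms outside the predefined vocabulary (here `GO:0016020`) are ignored, so every protein maps to a vector of the same length, which is what an SVM classifier needs as input.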

    PROTEIN SUBCELLULAR LOCALIZATION PREDICTION BASED ON PROFILE ALIGNMENT AND GENE ONTOLOGY

    The functions of proteins are closely related to their subcellular locations. Computational methods are required to replace the laborious and time-consuming experimental processes for proteomics research. This paper proposes combining homology-based profile alignment methods and functional-domain based Gene Ontology (GO) methods to predict the subcellular locations of proteins. The feature vectors constructed by these two methods are recognized by support vector machine (SVM) classifiers, and their scores are fused to enhance classification performance. The paper also investigates different approaches to constructing the GO vectors based on the GO terms returned from InterProScan. The results demonstrate that the GO methods are comparable to profile-alignment methods and overshadow those based on amino-acid compositions. Also, the fusion of these two methods can outperform the individual methods. Index Terms — Protein subcellular localization; Gen
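One simple way to picture the score fusion mentioned in this abstract is a weighted sum of the per-class SVM scores from the two feature types. The abstract does not specify the fusion rule, so the linear form, the weight `w`, and the names `fuse_scores` and `predict_class` are all assumptions made for illustration.

```python
from typing import List, Sequence


def fuse_scores(profile_scores: Sequence[float],
                go_scores: Sequence[float],
                w: float = 0.5) -> List[float]:
    """Linearly combine per-class scores from two classifiers.

    w is the (assumed) weight given to the GO-based scores; the
    profile-alignment scores receive weight 1 - w.
    """
    if len(profile_scores) != len(go_scores):
        raise ValueError("both score lists must cover the same classes")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * g + (1.0 - w) * p
            for p, g in zip(profile_scores, go_scores)]


def predict_class(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical per-class scores for three subcellular locations.
profile = [0.2, -0.4, 0.1]
go = [-0.1, 0.9, 0.3]

fused = fuse_scores(profile, go, w=0.5)
predicted = predict_class(fused)
```

In this made-up example the profile-based scores alone would favor class 0, while the fused scores favor class 1, which shows how fusion can change the decision when the two feature types disagree.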

    Semantic Similarity over Gene Ontology for Multi-Label Protein Subcellular Localization


    ADAPTIVE THRESHOLDING FOR MULTI-LABEL SVM CLASSIFICATION WITH APPLICATION TO PROTEIN SUBCELLULAR LOCALIZATION PREDICTION

    Multi-label classification has received increasing attention in computational proteomics, especially in protein subcellular localization. Many existing multi-label protein predictors suffer from over-prediction because they use a fixed decision threshold to determine the number of labels to which a query protein should be assigned. To address this problem, this paper proposes an adaptive thresholding scheme for multi-label support vector machine (SVM) classifiers. Specifically, each one-vs-rest SVM has an adaptive threshold that is a fraction of the maximum score of the one-vs-rest SVMs in the classifier. Therefore, the number of class labels of the query protein depends on the confidence of the SVMs in the classification. This scheme is integrated into our recently proposed subcellular localization predictor that uses the frequency of occurrences of gene-ontology terms as feature vectors and one-vs-rest SVMs as classifiers. Experimental results on two recent datasets suggest that the scheme can effectively avoid both over-prediction and under-prediction, resulting in performance significantly better than other gene-ontology based subcellular localization predictors. Index Terms — Multi-label classification; Protein subcellula
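The adaptive thresholding idea in this abstract (a per-query threshold set to a fraction of the maximum one-vs-rest SVM score) can be sketched as below. This is a hedged illustration rather than the paper's exact decision rule: the name `adaptive_threshold_labels`, the fraction parameter `theta`, and the fallback used when no score is positive are assumptions.

```python
from typing import List, Sequence


def adaptive_threshold_labels(scores: Sequence[float],
                              theta: float = 0.5) -> List[int]:
    """Select class labels using a threshold tied to the top score.

    scores: one decision score per one-vs-rest SVM for a single query.
    theta: fraction of the maximum score used as the threshold
        (assumed to lie in [0, 1]).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")

    best = max(scores)
    if best <= 0.0:
        # Assumed fallback: when no SVM is confident, assign only the
        # single top-scoring label so the query is never left unlabeled.
        return [max(range(len(scores)), key=lambda i: scores[i])]

    threshold = theta * best
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical scores for four locations.
scores = [2.0, 1.2, 0.3, -0.5]

fixed = [i for i, s in enumerate(scores) if s > 0.0]   # fixed threshold at 0
adaptive = adaptive_threshold_labels(scores, theta=0.5)
fallback = adaptive_threshold_labels([-0.2, -1.0, -0.7])
```

With these made-up scores, a fixed threshold at zero assigns three labels, whereas the adaptive threshold keeps only the two classes scoring at least half of the maximum; a larger `theta` yields fewer labels and `theta = 0` admits every class with a non-negative score.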