Better prediction of protein cellular localization sites with the k nearest neighbors classifier

By Paul Horton and Kenta Nakai


We have compared four classifiers on the problem of predicting the cellular localization sites of proteins in yeast and E. coli. A set of sequence-derived features, such as regions of high hydrophobicity, was used for each classifier. The methods compared were a structured probabilistic model specifically designed for the localization problem, the k nearest neighbors classifier, the binary decision tree classifier, and the naive Bayes classifier. Tests using stratified cross-validation show that the k nearest neighbors classifier performs better than the other methods. In the case of yeast, this difference was statistically significant using a cross-validated paired t test. The result is an accuracy of approximately 60% for 10 yeast classes and 86% for 8 E. coli classes. The best previously reported accuracies for these datasets were 55% and 81%, respectively.
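
As a rough illustration of the evaluation setup named in the abstract, the following is a minimal sketch of a k nearest neighbors classifier scored with stratified cross-validation. It is not the authors' implementation. The Euclidean distance metric, the value of k, the number of folds, the function names, and the toy two-class data are all assumptions made for the example; the abstract specifies none of them.

```python
import math
import random
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance is an assumption; the abstract does not name a metric.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def stratified_folds(y, n_folds, seed=0):
    """Split example indices into n_folds with roughly the same class mix in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's examples round-robin across the folds.
        for j, i in enumerate(indices):
            folds[j % n_folds].append(i)
    return folds


def cross_validated_accuracy(X, y, k, n_folds=5, seed=0):
    """Return the per-fold accuracy of k nearest neighbors (folds must be non-empty)."""
    accuracies = []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Hypothetical toy data: two well-separated clusters with placeholder labels.
toy_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
         (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0)]
toy_y = ["A"] * 4 + ["B"] * 4

fold_accuracies = cross_validated_accuracy(toy_X, toy_y, k=3, n_folds=4)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Keeping the per-fold accuracies, rather than only their mean, is what makes a paired comparison between two classifiers possible: each classifier is scored on the same folds, and the fold-wise differences can then be passed to a paired t test.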

Topics: Protein Localization, k Nearest Neighbor Classifier, Classification, Yeast, E. coli
Year: 1997
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX