A hierarchical multi-label classification ant colony algorithm for protein function prediction

By Fernando E.B. Otero, Alex A. Freitas and Colin G. Johnson

Abstract

This paper proposes a novel ant colony optimisation (ACO) algorithm tailored for the hierarchical multi-label classification problem of protein function prediction. Protein function prediction is a very active research field, given the large increase in the number of uncharacterised proteins available for analysis and the importance of determining their functions in order to improve current biological knowledge. Since a protein can perform more than one function and many protein functional-definition schemes are organised in a hierarchical structure, the classification problem in this case is an instance of a hierarchical multi-label problem. In this type of problem, each example may belong to multiple class labels, and class labels are organised in a hierarchical structure: either a tree or a directed acyclic graph. This makes the problem more complex than conventional flat classification, since the classification algorithm has to take into account hierarchical relationships between class labels and be able to predict multiple class labels for the same example. The proposed ACO algorithm discovers an ordered list of hierarchical multi-label classification rules. It is evaluated on sixteen challenging bioinformatics data sets involving hundreds or thousands of class labels to be predicted, and compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification.
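The abstract combines two ideas: class labels organised in a tree or a directed acyclic graph, and a classifier expressed as an ordered list of rules. The sketch below is a generic illustration of how these pieces can fit together, not the authors' algorithm. It omits the ant colony search that constructs the rules, and the rule representation (a boolean condition plus a set of labels), the `Rule`, `ancestors`, `close_upwards` and `predict` helpers, the toy hierarchy and the attribute names are all hypothetical. It assumes the constraint commonly imposed in hierarchical classification: predicting a label implies predicting all of its ancestors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical class hierarchy as a DAG: each label maps to its set of parents.
# "root" is the top node that every label ultimately descends from.
Hierarchy = Dict[str, Set[str]]


def ancestors(label: str, parents: Hierarchy) -> Set[str]:
    """Return every ancestor of `label` (excluding `label` itself) in the DAG."""
    seen: Set[str] = set()
    stack = list(parents.get(label, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def close_upwards(labels: Set[str], parents: Hierarchy) -> Set[str]:
    """Make a label set hierarchy-consistent by adding all ancestors of each label."""
    closed = set(labels)
    for label in labels:
        closed |= ancestors(label, parents)
    return closed


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # hypothetical antecedent over attribute values
    labels: FrozenSet[str]             # consequent: set of predicted class labels


def predict(rules: List[Rule], default: FrozenSet[str], example: dict,
            parents: Hierarchy) -> Set[str]:
    """Ordered rule list: the first rule whose condition holds makes the prediction;
    if no rule fires, the default label set is used."""
    for rule in rules:
        if rule.condition(example):
            return close_upwards(set(rule.labels), parents)
    return close_upwards(set(default), parents)


# Toy DAG: "transport" has two parents, so the hierarchy is not a tree.
parents: Hierarchy = {
    "metabolism": {"root"},
    "energy": {"root"},
    "transport": {"metabolism", "energy"},
    "ion_transport": {"transport"},
}
rules = [
    Rule(lambda x: x["hydrophobicity"] > 0.5, frozenset({"ion_transport"})),
    Rule(lambda x: x["length"] < 100, frozenset({"energy"})),
]
default = frozenset({"metabolism"})

print(sorted(predict(rules, default, {"hydrophobicity": 0.8, "length": 300}, parents)))
# prints ['energy', 'ion_transport', 'metabolism', 'root', 'transport']
```

Because rules are tried in order, an example matching several rules is labelled only by the earliest one. Closing the prediction upwards means a rule only needs to name its most specific labels, while the output still respects the multiple-parent structure of the DAG.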

Topics: QA76
Publisher: Springer
Year: 2010
DOI identifier: 10.1007/s12293-010-0045-4
OAI identifier: oai:kar.kent.ac.uk:30634
