Regularized F-Measure Maximization for Feature Selection and Classification

By Zhenqiu Liu, Ming Tan and Feng Jiang

Abstract

Receiver Operating Characteristic (ROC) analysis is a common tool for assessing the performance of classifiers. It has gained much popularity in medicine and other fields, including biological markers and diagnostic tests. This is particularly because misclassification costs are not known in real-world problems, so the ROC curve and related utility functions such as the F-measure can be more meaningful performance measures. The F-measure combines recall and precision into a single global measure. In this paper, we propose a novel method based on regularized F-measure maximization. The proposed method assigns different costs to positive and negative samples and performs simultaneous feature selection and prediction with an L1 penalty. This method is especially useful when the data set is highly unbalanced, or when the labels for negative (positive) samples are missing. Our experiments with benchmark, methylation, and high-dimensional microarray data show that, in limited experiments, the proposed algorithm performs as well as or better than other popular classifiers.
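To make the quantities named in the abstract concrete, the following minimal sketch computes the F-measure from precision and recall, and then evaluates a smoothed F-measure with an L1 penalty for a linear model. The abstract does not state the paper's exact objective, so the sigmoid smoothing, the function names (`f_measure`, `regularized_soft_f`), and the parameters `lam` and `beta` are illustrative assumptions rather than the authors' formulation.

```python
import math


def f_measure(y_true, y_pred, beta=1.0):
    """F-measure from hard 0/1 labels and hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def regularized_soft_f(w, X, y, lam=0.1, beta=1.0):
    """Smoothed F-measure minus an L1 penalty (a quantity to maximize).

    Assumption: hard counts are replaced by sigmoid scores of a linear
    model w.x, which is one common smoothing, not necessarily the paper's.
    """
    probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, x))) for x in X]
    soft_tp = sum(p for p, t in zip(probs, y) if t == 1)  # soft true positives
    soft_pred_pos = sum(probs)                            # soft predicted positives
    n_pos = sum(y)                                        # actual positives
    b2 = beta * beta
    # F_beta = (1 + b2) * TP / (b2 * (TP + FN) + (TP + FP))
    soft_f = (1 + b2) * soft_tp / (b2 * n_pos + soft_pred_pos)
    return soft_f - lam * sum(abs(wj) for wj in w)
```

In this sketch a larger `lam` pushes more weights toward zero when the objective is maximized, which is how an L1 penalty yields feature selection; the optimizer itself is left out.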

Topics: Methodology Report
Publisher: Hindawi Publishing Corporation
OAI identifier: oai:pubmedcentral.nih.gov:2674633
Provided by: PubMed Central