108,328 research outputs found
A Feature Selection Method for Multivariate Performance Measures
Feature selection with specific multivariate performance measures is the key
to the success of many applications, such as image retrieval and text
classification. The existing feature selection methods are usually designed for
classification error. In this paper, we propose a generalized sparse
regularizer. Based on the proposed regularizer, we present a unified feature
selection framework for general loss functions. In particular, we study the
novel feature selection paradigm by optimizing multivariate performance
measures. The resultant formulation is a challenging problem for
high-dimensional data. Hence, a two-layer cutting plane algorithm is proposed
to solve this problem, and the convergence is presented. In addition, we adapt
the proposed method to optimize multivariate measures for multiple instance
learning problems. The analyses by comparing with the state-of-the-art feature
selection methods show that the proposed method is superior to others.
Extensive experiments on large-scale and high-dimensional real world datasets
show that the proposed method outperforms -SVM and SVM-RFE when choosing a
small subset of features, and achieves significantly improved performances over
SVM in terms of -score
Efficient Optimization of Performance Measures by Classifier Adaptation
In practical applications, machine learning algorithms are often needed to
learn classifiers that optimize domain specific performance measures.
Previously, the research has focused on learning the needed classifier in
isolation, yet learning nonlinear classifier for nonlinear and nonsmooth
performance measures is still hard. In this paper, rather than learning the
needed classifier by optimizing specific performance measure directly, we
circumvent this problem by proposing a novel two-step approach called as CAPO,
namely to first train nonlinear auxiliary classifiers with existing learning
methods, and then to adapt auxiliary classifiers for specific performance
measures. In the first step, auxiliary classifiers can be obtained efficiently
by taking off-the-shelf learning algorithms. For the second step, we show that
the classifier adaptation problem can be reduced to a quadratic program
problem, which is similar to linear SVMperf and can be efficiently solved. By
exploiting nonlinear auxiliary classifiers, CAPO can generate nonlinear
classifier which optimizes a large variety of performance measures including
all the performance measure based on the contingency table and AUC, whilst
keeping high computational efficiency. Empirical studies show that CAPO is
effective and of high computational efficiency, and even it is more efficient
than linear SVMperf.Comment: 30 pages, 5 figures, to appear in IEEE Transactions on Pattern
Analysis and Machine Intelligence, 201
Recommended from our members
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data
- …