
14 research outputs found

A Feature Selection Method for Multivariate Performance Measures

Author: Mao Qi
Tsang Ivor W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

Feature selection with specific multivariate performance measures is the key to the success of many applications, such as image retrieval and text classification. The existing feature selection methods are usually designed for classification error. In this paper, we propose a generalized sparse regularizer. Based on the proposed regularizer, we present a unified feature selection framework for general loss functions. In particular, we study the novel feature selection paradigm by optimizing multivariate performance measures. The resultant formulation is a challenging problem for high-dimensional data. Hence, a two-layer cutting plane algorithm is proposed to solve this problem, and the convergence is presented. In addition, we adapt the proposed method to optimize multivariate measures for multiple instance learning problems. The analyses by comparing with the state-of-the-art feature selection methods show that the proposed method is superior to others. Extensive experiments on large-scale and high-dimensional real world datasets show that the proposed method outperforms $l_1$-SVM and SVM-RFE when choosing a small subset of features, and achieves significantly improved performances over SVM$^{perf}$ in terms of $F_1$-score.
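The F_1-score named in the abstract above is a multivariate measure: it is computed from the counts of a whole prediction vector rather than as a sum of per-example errors. A minimal sketch of that computation follows, assuming binary labels coded as 0/1; the function name `f1_score` and the zero-division convention are illustrative choices, not taken from the paper.

```python
def f1_score(y_true, y_pred):
    """F1 for binary 0/1 labels, computed from the contingency counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # One true positive, one false positive, one false negative.
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because the counts depend on all predictions jointly, changing a single prediction can shift the score by an amount that depends on the rest of the vector, which is why such measures are optimized over whole prediction vectors.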

arXiv.org e-Print Archive

CiteSeerX

Crossref

OPUS - University of Technology Sydney

DR-NTU (Digital Repository of NTU)

Towards ultrahigh dimensional feature selection for big data

Author: Tan M
Tsang IW
Wang L
Publication venue
Publication date: 01/01/2014

In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Different from traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features, and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Owing to this optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of millions of data points with $O(10^{14})$ features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency. © 2014 Mingkui Tan, Ivor W. Tsang and Li Wang

OPUS - University of Technology Sydney

DR-NTU (Digital Repository of NTU)

Novel metrics for feature extraction stability in protein sequence classification

Author: Engelbert Mephu Nguifo
Mondher Maddouri
Saber Aridhi
Saidi Rabie
Publication venue: HAL CCSD
Publication date: 18/05/2011

Feature extraction is an unavoidable task, especially in the critical step of preprocessing biological sequences. This step consists, for example, of transforming the biological sequences into vectors of motifs, where each motif is a subsequence that can be seen as a property (or attribute) characterizing the sequence. Hence, we obtain an object-property table where objects are sequences and properties are motifs extracted from the sequences. This output can be used to apply standard machine learning tools to perform data mining tasks such as classification. Several previous works have described feature extraction methods for bio-sequence classification, but none of them discussed the robustness of these methods when the input data are perturbed. In this work, we introduce the notion of stability of the generated motifs in order to study the robustness of motif extraction methods. We express this robustness in terms of the ability of the method to reveal any change occurring in the input data and also its ability to target the interesting motifs. We use these criteria to evaluate and experimentally compare four existing extraction methods for biological sequences.

HAL Clermont Université

Hal-Diderot

Effective Evolutionary Multilabel Feature Selection under a Budget Constraint

Author: Dae-Won Kim
Jaesung Lee
Wangduk Seo
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2018

Crossref

An Empirical Evaluation of Constrained Feature Selection

Author: Bach Jakob
Böhm Klemens
Schulz Katrin
Trittenbach Holger
Zoller Kolja
Publication venue: Springer Nature
Publication date: 17/08/2022

While feature selection helps to get smaller and more understandable prediction models, most existing feature-selection techniques do not consider domain knowledge. One way to use domain knowledge is via constraints on sets of selected features. However, the impact of constraints, e.g., on the predictive quality of selected features, is currently unclear. This article is an empirical study that evaluates the impact of propositional and arithmetic constraints on filter feature selection. First, we systematically generate constraints of various types, using datasets from different domains. As expected, constraints tend to decrease the predictive quality of feature sets, but this effect is non-linear. So we observe feature sets both adhering to constraints and with high predictive quality. Second, we study a concrete setting in materials science. This part of our study sheds light on how one can analyze scientific hypotheses with the help of constraints.
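The idea of filter feature selection under constraints can be illustrated with a small sketch. It assumes each feature has a precomputed relevance score, that a subset's quality is the sum of its scores, and that constraints are predicates over the selected set; the exhaustive search and all names (`select_features`, `not_both`) are hypothetical and are not the study's actual procedure.

```python
from itertools import combinations


def select_features(scores, k, constraints=()):
    """Best-scoring subset of at most k features satisfying all constraints.

    Exhaustive search: only practical for a handful of features.
    Returns (None, -inf) if no subset satisfies the constraints.
    """
    best_set, best_score = None, float("-inf")
    features = sorted(scores)
    for size in range(0, k + 1):
        for combo in combinations(features, size):
            chosen = frozenset(combo)
            if not all(check(chosen) for check in constraints):
                continue  # subset violates at least one constraint
            total = sum(scores[f] for f in chosen)
            if total > best_score:
                best_set, best_score = chosen, total
    return best_set, best_score


def not_both(a, b):
    """Propositional constraint: features a and b must not be selected together."""
    return lambda chosen: not (a in chosen and b in chosen)


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.8, "c": 0.3}  # made-up relevance scores
    print(select_features(scores, k=2))
    print(select_features(scores, k=2, constraints=[not_both("a", "b")]))
```

With these made-up scores, the unconstrained choice is {a, b}, while forbidding a and b together yields {a, c} with a lower summed score. This is a toy instance of a constraint reducing the quality of the selected set.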

KITopen

Online feature selection for mining big data

Author: HOI Steven C. H.
JIN Rong
WANG Jialei
ZHAO Peilin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012

Ministry of Education, Singapore under its Academic Research Funding Tier

CiteSeerX

Crossref

Institutional Knowledge at Singapore Management University

DR-NTU (Digital Repository of NTU)

Online feature selection and its applications

Author: HOI Steven C. H.
JIN Rong
WANG Jialei
ZHAO Peilin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2014

Crossref

Institutional Knowledge at Singapore Management University