An Online Sparse Streaming Feature Selection Algorithm
Online streaming feature selection (OSFS), which conducts feature selection
in an online manner, plays an important role in dealing with high-dimensional
data. In many real applications, such as intelligent healthcare platforms,
streaming features often contain missing data, which raises a crucial
challenge for OSFS: how to model the uncertain relationship
between sparse streaming features and labels. Unfortunately, existing OSFS
algorithms do not consider this uncertain relationship. To fill this gap, this
paper proposes an online sparse streaming feature selection with
uncertainty (OS2FSU) algorithm. OS2FSU consists of two main parts: 1) latent
factor analysis is utilized to pre-estimate the missing data in sparse
streaming features before conducting feature selection, and 2) fuzzy logic and
neighborhood rough sets are employed to alleviate the uncertainty between
estimated streaming features and labels during feature selection. In
the experiments, OS2FSU is compared with five state-of-the-art OSFS algorithms
on six real datasets. The results demonstrate that OS2FSU outperforms its
competitors when missing data are encountered in OSFS.
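
The first part of OS2FSU, pre-estimating missing entries with latent factor analysis, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lfa_impute`, the rank, learning rate, regularization strength, and the plain SGD update are all assumptions chosen for illustration. Missing entries are marked with `None`.

```python
import random


def lfa_impute(matrix, rank=2, epochs=500, lr=0.01, reg=0.01, seed=0):
    """Fill None entries with a low-rank latent factor estimate.

    Hypothetical helper: factorizes the observed entries as P[i] . Q[j]
    using stochastic gradient descent, then predicts the missing ones.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Small positive initial factors (an arbitrary illustrative choice).
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    observed = [
        (i, j, matrix[i][j])
        for i in range(n_rows)
        for j in range(n_cols)
        if matrix[i][j] is not None
    ]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, value in observed:
            pred = sum(P[i][k] * Q[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    # Keep observed values as they are; estimate only the missing ones.
    return [
        [
            matrix[i][j]
            if matrix[i][j] is not None
            else sum(P[i][k] * Q[j][k] for k in range(rank))
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]
```

In a streaming setting, a completed feature column produced this way would then be passed to the selection step. The fuzzy logic and neighborhood rough set stage described in the abstract is not sketched here.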
Leveraging Model Inherent Variable Importance for Stable Online Feature Selection
Feature selection can be a crucial factor in obtaining robust and accurate
predictions. Online feature selection models, however, operate under
considerable restrictions; they need to efficiently extract salient input
features based on a bounded set of observations, while enabling robust and
accurate predictions. In this work, we introduce FIRES, a novel framework for
online feature selection. The proposed feature weighting mechanism leverages
the importance information inherent in the parameters of a predictive model. By
treating model parameters as random variables, we can penalize features with
high uncertainty and thus generate more stable feature sets. Our framework is
generic in that it leaves the choice of the underlying model to the user.
Strikingly, experiments suggest that the model complexity has only a minor
effect on the discriminative power and stability of the selected feature sets.
In fact, using a simple linear model, FIRES obtains feature sets that compete
with state-of-the-art methods, while dramatically reducing computation time. In
addition, experiments show that the proposed framework is clearly superior in
terms of feature selection stability.
Comment: To be published in the Proceedings of the 26th ACM SIGKDD Conference
on Knowledge Discovery and Data Mining (KDD 2020)
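
The idea of penalizing features whose model parameters are highly uncertain can be sketched as follows. This is a simplified illustration, not the exact FIRES update rule: it assumes a per-feature mean `mu` and standard deviation `sigma` of the model parameters are already available, and the scoring formula, the `penalty` factor, and both function names are assumptions.

```python
def uncertainty_penalized_scores(mu, sigma, penalty=1.0):
    """Score each feature by squared mean importance minus a variance penalty.

    Hypothetical scoring rule: a large |mu| raises the score, and a large
    sigma (an uncertain parameter) lowers it.
    """
    return [m * m - penalty * s * s for m, s in zip(mu, sigma)]


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])
```

Under this rule, two features with the same mean importance are ranked by how certain the model is about them, which is one plausible way to obtain the more stable feature sets the abstract describes.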