Improving feature selection algorithms using normalised feature histograms
The proposed feature selection method builds a histogram of the most stable
features from random subsets of a training set and ranks the features using
classifier-based cross-validation. This approach reduces the instability of
features obtained by conventional feature selection methods that arises from
variation in training data and selection criteria. Classification results on
four microarray and three image datasets, using three major feature selection
criteria and a naive Bayes classifier, show considerable improvement over
benchmark results.
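The histogram-building step described above can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation: the correlation-based filter criterion, the subsample fraction, and all parameter names are assumptions.

```python
import numpy as np

def stability_histogram(X, y, score_fn, n_subsets=50, subsample=0.8,
                        top_k=10, seed=0):
    """Count how often each feature lands in the top-k of score_fn
    across random subsamples of the training set (a stability histogram)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d, dtype=int)
    m = int(subsample * n)
    for _ in range(n_subsets):
        idx = rng.choice(n, size=m, replace=False)
        scores = score_fn(X[idx], y[idx])
        counts[np.argsort(scores)[::-1][:top_k]] += 1  # top-k by score
    return counts  # rank features by descending count

# Hypothetical filter criterion: absolute correlation with the label.
def abs_corr(X, y):
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    num = Xc.T @ yc
    den = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    return np.abs(num / den)
```

Features that appear in the top-k across most subsamples are stable under training-set variation; the abstract's final ranking would then be refined by classifier-based cross-validation.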
Feature Selection Library (MATLAB Toolbox)
Feature Selection Library (FSLib) is a widely applicable MATLAB library for
Feature Selection (FS). FS is an essential component of machine learning and
data mining which has been studied for many years under many different
conditions and in diverse scenarios. These algorithms aim at ranking and
selecting a subset of relevant features according to their degrees of
relevance, preference, or importance as defined in a specific application.
Because feature selection can reduce the number of features used for training
classification models, it alleviates the effect of the curse of dimensionality,
speeds up the learning process, improves model performance, and enhances data
understanding. This short report provides an overview of the feature selection
algorithms included in the FSLib MATLAB toolbox, covering filter, embedded, and
wrapper methods.
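As a concrete example of the filter family mentioned above, a Fisher-score criterion can be written in a few lines. This is a generic illustration in Python (FSLib itself is MATLAB); the function name and smoothing constant are assumptions.

```python
import numpy as np

def fisher_score(X, y):
    """Filter criterion: between-class variance over within-class variance,
    computed independently per feature (no classifier in the loop)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mean_all) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)  # higher score = more discriminative
```

Filters like this score each feature from the data alone, which is what makes them fast; embedded and wrapper methods instead involve a learning model in the selection loop.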
Weighted Heuristic Ensemble of Filters
Feature selection has become increasingly important in data mining in recent
years due to the rapid increase in the dimensionality of big data. However, the
reliability and consistency of feature selection methods (filters) vary
considerably across datasets, and no single filter performs consistently well
under all conditions. Feature selection ensembles have therefore been
investigated recently to provide more reliable and effective results than any
individual filter, but all existing feature selection ensembles treat the
constituent methods equally regardless of their performance. In this paper, we
present a novel framework that applies a weighted feature selection ensemble by
proposing a systematic way of assigning different weights to the feature
selection methods (filters). We also investigate how to determine the
appropriate weight for each filter in an ensemble. Experiments on ten benchmark
datasets show that, although theoretically and intuitively adding more weight
to 'good' filters should lead to better results, in practice the outcome is
very uncertain. This assumption held for some examples in our experiments; in
other situations, however, filters that had been assumed to perform well
performed badly, leading to even worse results. Therefore, adding weight to
filters may not achieve much in accuracy terms, while increasing complexity and
time consumption and clearly decreasing stability.
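The weighted-combination idea can be sketched as a rank aggregation. This is an illustrative Python sketch under assumptions not stated in the abstract: the paper's actual weighting scheme and aggregation rule may differ, and all names here are hypothetical.

```python
import numpy as np

def weighted_ensemble_rank(filter_scores, weights):
    """Combine per-filter feature scores with per-filter weights.
    filter_scores: list of 1-D arrays, one score vector per filter.
    weights: one weight per filter (e.g., from its past performance)."""
    d = len(filter_scores[0])
    combined = np.zeros(d)
    for scores, w in zip(filter_scores, weights):
        # Convert raw scores to ranks so filters on different scales
        # contribute comparably to the weighted sum.
        ranks = np.empty(d)
        ranks[np.argsort(scores)] = np.arange(d)  # higher score -> higher rank
        combined += w * ranks
    return np.argsort(combined)[::-1]  # feature indices, best first
```

Rank-based aggregation sidesteps the problem that different filters emit scores on incompatible scales; the open question the paper investigates is how to set the weights, since overweighting a supposedly "good" filter can backfire.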
Online Unsupervised Multi-view Feature Selection
In the era of big data, it is becoming common to have data with multiple
modalities or coming from multiple sources, known as "multi-view data".
Multi-view data are usually unlabeled and come from high-dimensional spaces
(such as language vocabularies), so unsupervised multi-view feature selection
is crucial to many applications. However, it is nontrivial due to the following
challenges. First, there may be too many instances, or the feature
dimensionality may be too large for the data to fit in memory: how can useful
features be selected with limited memory space? Second, how can features be
selected from streaming data while handling concept drift? Third, how can the
consistent and complementary information from different views be leveraged to
improve feature selection when the data are too big or arrive as streams? To
the best of our knowledge, none of the previous works solves all of these
challenges simultaneously. In this paper, we propose an Online unsupervised
Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming
multi-view data in an online fashion. OMVFS embeds unsupervised feature
selection into a clustering algorithm via NMF with sparse learning. It further
incorporates the graph regularization to preserve the local structure
information and help select discriminative features. Instead of storing all the
historical data, OMVFS processes the multi-view data chunk by chunk and
aggregates all the necessary information into several small matrices. By using
the buffering technique, the proposed OMVFS can reduce the computational and
storage cost while taking advantage of the structure information. Furthermore,
OMVFS can capture the concept drifts in the data streams. Extensive experiments
on four real-world datasets show the effectiveness and efficiency of the
proposed OMVFS method. More importantly, OMVFS is about 100 times faster than
the off-line methods.
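The chunk-by-chunk idea of keeping only small aggregate matrices instead of the raw history can be illustrated with running sufficient statistics. This is a deliberately simplified Python sketch: it is not the NMF-based OMVFS algorithm, just the memory pattern the abstract describes, and every name here is an assumption.

```python
import numpy as np

def online_feature_stats(chunks, d):
    """Stream data chunk by chunk, keeping only O(d) running statistics
    (per-feature sums and squared sums) rather than the full history."""
    s = np.zeros(d)    # running sum per feature
    ss = np.zeros(d)   # running sum of squares per feature
    n = 0
    for X in chunks:   # each chunk is an (n_i, d) array
        s += X.sum(axis=0)
        ss += (X ** 2).sum(axis=0)
        n += X.shape[0]
    mean = s / n
    var = ss / n - mean ** 2
    return var  # e.g., rank features by variance for unsupervised selection
```

The memory footprint stays constant no matter how many chunks arrive, which is the property that lets an online method like OMVFS scale to data that would never fit in memory; OMVFS additionally maintains NMF factor matrices and graph-regularization terms per view rather than simple moments.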
