On The Stability of Interpretable Models
Interpretable classification models are built with the purpose of providing a
comprehensible description of the decision logic to an external oversight
agent. When considered in isolation, a decision tree, a set of classification
rules, or a linear model is widely recognized as human-interpretable.
However, such models are generated as part of a larger analytical process. Bias
in data collection and preparation, or in model construction, may severely
affect the accountability of the design process. We conduct an experimental
study of the stability of interpretable models with respect to feature
selection, instance selection, and model selection. Our conclusions should
raise the scientific community's awareness of the need for a
stability impact assessment of interpretable models.
EFSIS: Ensemble Feature Selection Integrating Stability
Ensemble learning, which combines the predictions from multiple
learners, has been widely applied in pattern recognition and has been reported
to be more robust and accurate than the individual learners. This ensemble
logic has recently also been applied increasingly to feature selection. There are
basically two strategies for ensemble feature selection, namely data
perturbation and function perturbation. Data perturbation performs feature
selection on data subsets sampled from the original dataset and then selects
the features consistently ranked highly across those data subsets. This has
been found to improve both the stability of the selector and the prediction
accuracy for a classifier. Function perturbation frees the user from having to
decide on the most appropriate selector for any given situation and works by
aggregating multiple selectors. This has been found to maintain or improve
classification performance. Here we propose a framework, EFSIS, combining these
two strategies. Empirical results indicate that EFSIS gives both high
prediction accuracy and stability. Comment: 20 pages, 3 figures
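
The data perturbation strategy described in this abstract can be sketched in a few lines. This is not the EFSIS framework itself, only a minimal illustration under stated assumptions: the subsets are bootstrap samples, the selector is a toy filter (absolute difference of class means for binary labels 0/1), and "consistently ranked highly" is read as "lowest summed rank". All function names are hypothetical.

```python
import random
from statistics import mean


def abs_mean_difference(rows, labels, j):
    """Toy filter score (an assumption): gap between the class means of feature j."""
    pos = [r[j] for r, y in zip(rows, labels) if y == 1]
    neg = [r[j] for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # a subset with a single class carries no signal
    return abs(mean(pos) - mean(neg))


def rank_features(rows, labels):
    """Return the rank position (0 = best) of every feature on one data subset."""
    n_features = len(rows[0])
    scores = [abs_mean_difference(rows, labels, j) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -scores[j])
    ranks = [0] * n_features
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def data_perturbation_select(rows, labels, k, n_subsets=30, seed=0):
    """Select the k features with the lowest summed rank over bootstrap subsets."""
    rng = random.Random(seed)
    n = len(rows)
    n_features = len(rows[0])
    rank_sums = [0] * n_features
    for _ in range(n_subsets):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        for j, rank in enumerate(rank_features(sub_rows, sub_labels)):
            rank_sums[j] += rank
    best = sorted(range(n_features), key=lambda j: rank_sums[j])[:k]
    return sorted(best)
```

Function perturbation would replace the single toy filter with several different selectors and aggregate their rankings in the same way.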
Instance and feature weighted k-nearest-neighbors algorithm
We present a novel method that aims at providing a more stable selection of feature subsets when variations in the training process occur. This is accomplished by using an instance-weighting process (assigning different importances to instances) as a preprocessing step to a feature weighting method that is independent of the learner, and then making good use of both sets of computed weights in a standard Nearest-Neighbours classifier.
We report extensive experimentation on well-known benchmarking datasets as well as some challenging microarray
gene expression problems. Our results show increases in stability for most subset sizes and most problems, without
compromising prediction accuracy. Peer Reviewed. Postprint (published version)
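
The abstract does not say how the two sets of weights enter the classifier, so the following is only one plausible reading, given as a minimal sketch: feature weights scale the squared differences in the distance, and instance weights scale each neighbour's vote. The function name and both conventions are assumptions, not the authors' method.

```python
import math
from collections import defaultdict


def weighted_knn_predict(query, train_rows, train_labels,
                         instance_weights, feature_weights, k=3):
    """Predict a label with a k-NN vote that uses both kinds of weights.

    Assumed conventions (not taken from the paper):
    - feature_weights[j] multiplies the squared difference on feature j;
    - instance_weights[i] is the voting strength of training instance i.
    """
    def distance(a, b):
        return math.sqrt(sum(w * (x - y) ** 2
                             for w, x, y in zip(feature_weights, a, b)))

    # Indices of the k training instances closest to the query.
    nearest = sorted(range(len(train_rows)),
                     key=lambda i: distance(query, train_rows[i]))[:k]

    # Each neighbour votes for its label with its instance weight.
    votes = defaultdict(float)
    for i in nearest:
        votes[train_labels[i]] += instance_weights[i]
    return max(votes, key=votes.get)
```

Setting a feature weight to zero removes that feature from the distance, and setting all weights to one recovers a standard majority-vote k-NN.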
Extensions of stability selection using subsamples of observations and covariates
We introduce extensions of stability selection, a method to stabilise
variable selection methods introduced by Meinshausen and Bühlmann (J R Stat
Soc 72:417-473, 2010). We propose to apply a base selection method repeatedly
to random observation subsamples and covariate subsets under scrutiny, and to
select covariates based on their selection frequency. We analyse the effects
and benefits of these extensions. Our analysis generalizes the theoretical
results of Meinshausen and Bühlmann (J R Stat Soc 72:417-473, 2010) from the
case of half-samples to subsamples of arbitrary size. We study, in a
theoretical manner, the effect of taking random covariate subsets using a
simplified score model. Finally, we validate these extensions in numerical
experiments on both synthetic and real datasets, and compare the obtained
results in detail to the original stability selection method. Comment: accepted for publication in Statistics and Computing
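
The procedure outlined in this abstract (apply a base selector repeatedly to random observation subsamples and covariate subsets, then select by selection frequency) can be illustrated as follows. This is a minimal sketch under assumptions not stated in the abstract: the base selector keeps the top_q covariates by absolute Pearson correlation with the response, and a covariate's frequency is counted relative to the rounds in which it was drawn. All names and defaults are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def selection_frequencies(X, y, n_rounds=100, obs_fraction=0.5,
                          n_covariates=None, top_q=1, seed=0):
    """Selection frequency of each covariate over random subsamples.

    Each round draws a subsample of observations and a subset of covariates,
    then lets the (assumed) base selector keep the top_q drawn covariates.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(2, int(obs_fraction * n))          # observation subsample size
    d = p if n_covariates is None else n_covariates
    drawn = [0] * p
    selected = [0] * p
    for _ in range(n_rounds):
        obs = rng.sample(range(n), m)          # without replacement
        cols = rng.sample(range(p), d)         # random covariate subset
        ys = [y[i] for i in obs]
        scores = {j: abs_correlation([X[i][j] for i in obs], ys) for j in cols}
        for j in cols:
            drawn[j] += 1
        for j in sorted(cols, key=lambda j: -scores[j])[:top_q]:
            selected[j] += 1
    return [selected[j] / drawn[j] if drawn[j] else 0.0 for j in range(p)]
```

A final selection would keep the covariates whose frequency exceeds a chosen threshold, for example `[j for j, f in enumerate(freqs) if f >= 0.9]`; the threshold value here is illustrative only.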
Weighted Heuristic Ensemble of Filters
Feature selection has become increasingly important in data mining in recent years due to the rapid increase in the dimensionality of big data. However, the reliability and consistency of feature selection methods (filters) vary considerably across datasets, and no single filter performs consistently well under all conditions. Feature selection ensembles have therefore been investigated recently to provide more reliable and effective results than any individual filter, but all existing feature selection ensembles treat the feature selection methods equally regardless of their performance. In this paper, we present a novel framework that applies a weighted feature selection ensemble by proposing a systematic way of assigning different weights to the feature selection methods (filters). We also investigate how to determine the appropriate weight for each filter in an ensemble. Theoretically and intuitively, adding more weight to ‘good filters’ should lead to better results, but experiments on ten benchmark datasets show that in reality the outcome is very uncertain. The assumption was found to be correct for some examples in our experiments. In other situations, however, filters that had been assumed to perform well performed badly, leading to even worse results. Therefore, adding weight to filters might not achieve much in terms of accuracy, while increasing complexity and time consumption and clearly decreasing stability.
Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue received growing attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview of this new yet fast-growing topic for
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.