An experimental study of the intrinsic stability of random forest variable importance measures
BACKGROUND: The stability of Variable Importance Measures (VIMs) based on random forest has recently received increased attention. Although traditional stability under data perturbations or parameter variations has been studied extensively, few studies consider the influence of the intrinsic randomness in generating VIMs, i.e. bagging, randomization and permutation. To address this influence, in this paper we introduce a new concept, the intrinsic stability of VIMs, defined as the self-consistency among feature rankings in repeated runs of VIMs without data perturbations or parameter variations. Two widely used VIMs, Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG), are comprehensively investigated. The motivation of this study is two-fold. First, we empirically verify the prevalence of intrinsic stability of VIMs over many real-world datasets to highlight that the instability of VIMs does not originate exclusively from data perturbations or parameter variations, but also stems from the intrinsic randomness of VIMs. Second, through Spearman and Pearson tests we comprehensively investigate how different factors influence the intrinsic stability.
RESULTS: The experiments are carried out on 19 benchmark datasets with diverse characteristics, including 10 high-dimensional, small-sample gene expression datasets. Experimental results demonstrate the prevalence of intrinsic stability of VIMs. Spearman and Pearson tests on the correlations between intrinsic stability and different factors show that #feature (number of features) and #sample (sample size) have a coupling effect on the intrinsic stability. The synthetic indicator, #feature/#sample, shows both a negative monotonic correlation and a negative linear correlation with the intrinsic stability, while out-of-bag (OOB) accuracy has a monotonic correlation with intrinsic stability. This indicates that high-dimensional, small-sample and high-complexity datasets may suffer more from intrinsic instability of VIMs. Furthermore, with respect to the parameter settings of random forest, a large number of trees is preferred. No significant correlations are seen between intrinsic stability and other factors. Finally, the magnitude of intrinsic stability is always smaller than that of traditional stability.
CONCLUSION: First, the prevalence of intrinsic stability of VIMs demonstrates that the instability of VIMs comes not only from data perturbations or parameter variations, but also from the intrinsic randomness of VIMs. This finding gives a better understanding of VIM stability and may help reduce the instability of VIMs. Second, by investigating the potential factors of intrinsic stability, users would be more aware of the risks and hence more careful when using VIMs, especially on high-dimensional, small-sample and high-complexity datasets.
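The idea of intrinsic stability as self-consistency among feature rankings in repeated runs can be illustrated with a small sketch. The abstract does not state the exact stability metric, so the code below assumes one plausible choice: the mean pairwise Spearman rank correlation between importance vectors obtained from repeated runs on the same data with the same parameters. The function names and the example importance values are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (largest) downward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def intrinsic_stability(importance_runs):
    """Assumed metric: mean pairwise Spearman correlation across repeated runs."""
    return mean(spearman(a, b) for a, b in combinations(importance_runs, 2))


# Hypothetical importance scores for four features from three repeated runs
# (same data, same parameters, different random seeds).
example_runs = [
    [0.40, 0.30, 0.20, 0.10],
    [0.38, 0.33, 0.18, 0.11],
    [0.30, 0.41, 0.19, 0.10],
]
print(intrinsic_stability(example_runs))
```

A value near 1 would mean the repeated runs rank the features almost identically, while lower values would indicate that the ranking changes from run to run even though the data and parameters are fixed.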
Can understanding reward help illuminate anhedonia?
Purpose of review: The goal of this paper is to examine how reward processing might help us understand the symptom of anhedonia.
Recent findings: There are extensive reviews exploring the relationship between responses to rewarding stimuli and depression. These often include a discussion on anhedonia and how this might be underpinned in particular by dysfunctional reward processing. However, there is no specific consensus on whether studies to date have adequately examined the various sub-components of reward processing or how these might relate in turn to various aspects of anhedonia symptoms.
Summary: The approach to understanding the symptom of anhedonia should be to examine all the sub-components of reward processing at the subjective and objective behavioural and neural level, with well-validated tasks that can be replicated. Future research also needs to investigate real-life experiences of anhedonia and how these might be predicted by objective lab measures.
Space Charge at Nanoscale: Probing Injection and Dynamic Phenomena Under Dark/Light Configurations by Using KPFM and C-AFM
Determination of thermal time for the development of eucalyptus seedlings in the rooting phase
High Resolution Sharp Computational Methods for Elliptic and Parabolic Problems in Complex Geometries