End-to-end Feature Selection Approach for Learning Skinny Trees
Joint feature selection and tree ensemble learning is a challenging task.
Popular tree ensemble toolkits, e.g., Gradient Boosted Trees and Random Forests,
support feature selection post-training based on feature importances, which are
known to be misleading and can significantly hurt performance. We propose
Skinny Trees: a toolkit for feature selection in tree ensembles, such that
feature selection and tree ensemble learning occur simultaneously. It is based
on an end-to-end optimization approach that considers feature selection in
differentiable trees with group regularization. We optimize
with a first-order proximal method and present convergence guarantees for a
non-convex and non-smooth objective. Interestingly, dense-to-sparse
regularization scheduling can lead to more expressive and sparser tree
ensembles than the vanilla proximal method. On 15 synthetic and real-world
datasets, Skinny Trees can achieve - feature
compression rates, leading up to faster inference over dense trees,
without any loss in performance. Skinny Trees lead to superior feature
selection over many existing toolkits, e.g., in terms of AUC performance for
feature budget, Skinny Trees outperform LightGBM by (up to
), and Random Forests by (up to ).
Comment: Preprint
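The proximal step at the heart of this approach can be illustrated with a stand-in penalty. The sketch below uses a group-lasso prox (row-wise soft-thresholding) rather than the paper's exact regularizer: each row of the weight matrix collects every weight attached to one input feature across the ensemble, so zeroing a whole row removes that feature entirely. The `group_prox` function and the toy weight matrix are hypothetical, not the paper's toolkit.

```python
import numpy as np

def group_prox(W, lam):
    """Proximal operator of the group-lasso penalty lam * sum_j ||W[j, :]||_2.

    Rows whose l2 norm falls below lam are zeroed out (the feature is
    dropped); surviving rows are shrunk toward zero.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale

# Toy example: 4 input features x 3 trees. Rows 1 and 3 carry only
# tiny weights, so the prox step removes those features outright.
W = np.array([[0.50, -0.20,  0.10],
              [0.01,  0.02, -0.01],
              [1.00,  0.80, -0.60],
              [0.05, -0.04,  0.03]])
W_sparse = group_prox(W, lam=0.1)
selected = np.flatnonzero(np.linalg.norm(W_sparse, axis=1) > 0)
```

In a full training loop this prox would be applied after every gradient step on the differentiable-tree loss; the dense-to-sparse scheduling mentioned in the abstract would correspond to growing `lam` over the course of training.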
GENESIM: genetic extraction of a single, interpretable model
Models obtained by decision tree induction techniques excel in being
interpretable. However, they can be prone to overfitting, which results in a low
predictive performance. Ensemble techniques are able to achieve a higher
accuracy. However, this comes at the cost of losing the interpretability of the
resulting model. This makes ensemble techniques impractical in applications
where decision support, instead of decision making, is crucial.
To bridge this gap, we present the GENESIM algorithm, which transforms an
ensemble of decision trees into a single decision tree with enhanced
predictive performance by using a genetic algorithm. We compared GENESIM to
prevalent decision tree induction and ensemble techniques using twelve publicly
available data sets. The results show that GENESIM achieves a better predictive
performance on most of these data sets than decision tree induction techniques,
and a predictive performance in the same order of magnitude as the ensemble
techniques. Moreover, the resulting model of GENESIM has a very low complexity,
making it very interpretable, in contrast to ensemble techniques.
Comment: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in
Complex Systems
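The evolutionary search described above can be sketched in a heavily simplified form. In this hypothetical version, individuals are single scikit-learn decision trees, fitness trades validation accuracy off against tree size (mirroring GENESIM's accuracy/complexity balance), and "mutation" merely perturbs a hyperparameter; the real GENESIM crossover merges the decision spaces of two parent trees, which is omitted here.

```python
import random
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

random.seed(0)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def fitness(tree):
    # Reward validation accuracy, penalize model complexity so the
    # survivor stays interpretable.
    return tree.score(X_val, y_val) - 0.001 * tree.tree_.node_count

def random_tree():
    t = DecisionTreeClassifier(max_depth=random.randint(2, 10), random_state=0)
    return t.fit(X_tr, y_tr)

def mutate(tree):
    # Crude stand-in for GENESIM's tree-merging operators: nudge depth.
    depth = max(2, tree.get_depth() + random.choice([-1, 1]))
    return DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)

population = [random_tree() for _ in range(10)]
for _ in range(5):  # generations
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                 # truncation selection
    population = survivors + [mutate(t) for t in survivors]

best = max(population, key=fitness)
```

The output of such a loop is a single decision tree, which is what makes this family of methods attractive for decision-support settings where each prediction must be explainable.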
TSE-IDS: A Two-Stage Classifier Ensemble for Intelligent Anomaly-based Intrusion Detection System
Intrusion detection systems (IDS) play a pivotal role in computer security by discovering and repelling malicious activities in computer networks. Anomaly-based IDS, in particular, rely on classification models trained using historical data to discover such malicious activities. In this paper, an improved IDS based on hybrid feature selection and two-level classifier ensembles is proposed. A hybrid feature selection technique comprising three methods, i.e., particle swarm optimization, ant colony algorithm, and genetic algorithm, is utilized to reduce the feature size of the training datasets (NSL-KDD and UNSW-NB15 are considered in this paper). Features are selected based on the classification performance of a reduced error pruning tree (REPT) classifier. Then, a two-level classifier ensemble based on two meta-learners, i.e., rotation forest and bagging, is proposed. On the NSL-KDD dataset, the proposed classifier shows 85.8% accuracy, 86.8% sensitivity, and 88.0% detection rate, remarkably outperforming other classification techniques recently proposed in the literature. Results on the UNSW-NB15 dataset also improve on those achieved by several state-of-the-art techniques. Finally, to verify the results, a two-step statistical significance test is conducted. This has not usually been considered in IDS research thus far and therefore adds value to the experimental results achieved by the proposed classifier.
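The two-stage pipeline (feature reduction, then a stacked two-level ensemble) can be sketched with off-the-shelf components. This is a hypothetical approximation: mutual-information ranking stands in for the paper's far more expensive PSO/ACO/GA wrapper search, and a random forest stands in for rotation forest, which scikit-learn does not provide; the synthetic data merely substitutes for NSL-KDD/UNSW-NB15.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipeline = make_pipeline(
    # Stage 1: shrink the feature set before training the ensemble.
    SelectKBest(mutual_info_classif, k=15),
    # Stage 2: two tree-based meta-learners combined by a stacking layer.
    StackingClassifier(
        estimators=[
            ("bagging", BaggingClassifier(DecisionTreeClassifier(),
                                          n_estimators=25, random_state=0)),
            ("forest", RandomForestClassifier(n_estimators=25, random_state=0)),
        ],
        final_estimator=LogisticRegression(),
    ),
)
pipeline.fit(X_tr, y_tr)
acc = pipeline.score(X_te, y_te)
```

Reducing the feature set first, as in stage 1, is what keeps the second-stage ensemble cheap enough to retrain frequently, which matters for IDS deployments where traffic distributions drift.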