22,982 research outputs found
On The Stability of Interpretable Models
Interpretable classification models are built with the purpose of providing a
comprehensible description of the decision logic to an external oversight
agent. When considered in isolation, a decision tree, a set of classification
rules, or a linear model, are widely recognized as human-interpretable.
However, such models are generated as part of a larger analytical process. Bias
in data collection and preparation, or in model's construction may severely
affect the accountability of the design process. We conduct an experimental
study of the stability of interpretable models with respect to feature
selection, instance selection, and model selection. Our conclusions should
raise awareness and attention of the scientific community on the need of a
stability impact assessment of interpretable models
Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering
highly interpretable regression and classification models. This paper presents
the R package pre, which derives PREs through the methodology of Friedman and
Popescu (2008). The implementation and functionality of package pre is
described and illustrated through application on a dataset on the prediction of
depression. Furthermore, accuracy and sparsity of PREs is compared with that of
single trees, random forest and lasso regression in four benchmark datasets.
Results indicate that pre derives ensembles with predictive accuracy comparable
to that of random forests, while using a smaller number of variables for
prediction
Classification hardness for supervised learners on 20 years of intrusion detection data
This article consolidates analysis of established (NSL-KDD) and new intrusion detection datasets (ISCXIDS2012, CICIDS2017, CICIDS2018) through the use of supervised machine learning (ML) algorithms. The uniformity in analysis procedure opens up the option to compare the obtained results. It also provides a stronger foundation for the conclusions about the efficacy of supervised learners on the main classification task in network security. This research is motivated in part to address the lack of adoption of these modern datasets. Starting with a broad scope that includes classification by algorithms from different families on both established and new datasets has been done to expand the existing foundation and reveal the most opportune avenues for further inquiry. After obtaining baseline results, the classification task was increased in difficulty, by reducing the available data to learn from, both horizontally and vertically. The data reduction has been included as a stress-test to verify if the very high baseline results hold up under increasingly harsh constraints. Ultimately, this work contains the most comprehensive set of results on the topic of intrusion detection through supervised machine learning. Researchers working on algorithmic improvements can compare their results to this collection, knowing that all results reported here were gathered through a uniform framework. This work's main contributions are the outstanding classification results on the current state of the art datasets for intrusion detection and the conclusion that these methods show remarkable resilience in classification performance even when aggressively reducing the amount of data to learn from
Efficient algorithms for decision tree cross-validation
Cross-validation is a useful and generally applicable technique often
employed in machine learning, including decision tree induction. An important
disadvantage of straightforward implementation of the technique is its
computational overhead. In this paper we show that, for decision trees, the
computational overhead of cross-validation can be reduced significantly by
integrating the cross-validation with the normal decision tree induction
process. We discuss how existing decision tree algorithms can be adapted to
this aim, and provide an analysis of the speedups these adaptations may yield.
The analysis is supported by experimental results.Comment: 9 pages, 6 figures.
http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=3478
- …