26 research outputs found
Improving the Validity of Decision Trees as Explanations
In classification and forecasting with tabular data, one often utilizes tree-based models. These can be competitive with deep neural networks on tabular data [cf. Grinsztajn et al., NeurIPS 2022, arXiv:2207.08815] and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Here, we train a low-depth tree with the objective of minimising the maximum misclassification error across the leaf nodes, and then "suspend" further tree-based models (e.g., trees of unlimited depth) from each leaf of the low-depth tree. The low-depth tree is easily explainable, while the overall statistical performance of the combined low-depth and suspended tree-based models improves upon decision trees of unlimited depth trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., well-tuned XGBoost).
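The two-stage construction described above can be sketched as follows. This is a minimal illustration assuming scikit-learn: the class name `SuspendedTrees` is hypothetical, and a standard CART tree with a depth cap stands in for the paper's min-max training objective for the low-depth tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

class SuspendedTrees:
    """Sketch: a shallow explainer tree with deeper trees hung from its leaves.

    Illustrative only -- the paper trains the low-depth tree to minimise the
    maximum per-leaf misclassification error; here plain CART with a depth
    cap is used as a stand-in.
    """

    def __init__(self, explainer_depth=2):
        self.explainer = DecisionTreeClassifier(max_depth=explainer_depth,
                                                random_state=0)
        self.leaf_models = {}

    def fit(self, X, y):
        self.explainer.fit(X, y)
        leaves = self.explainer.apply(X)  # leaf id for each training sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            if len(np.unique(y[mask])) > 1:
                # "Suspend" an unlimited-depth tree from this impure leaf.
                m = DecisionTreeClassifier(random_state=0).fit(X[mask], y[mask])
                self.leaf_models[leaf] = m
        return self

    def predict(self, X):
        leaves = self.explainer.apply(X)
        pred = self.explainer.predict(X)
        for leaf, m in self.leaf_models.items():
            mask = leaves == leaf
            if mask.any():
                pred[mask] = m.predict(X[mask])  # refine within the leaf
        return pred

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = SuspendedTrees(explainer_depth=2).fit(X, y)
acc = (model.predict(X) == y).mean()
```

A user who only needs an explanation reads the depth-2 explainer; predictions are routed through the suspended trees for accuracy.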
Is ensemble classifier needed for steganalysis in high-dimensional feature spaces?
The ensemble classifier, based on Fisher Linear Discriminant (FLD) base learners, was introduced specifically for steganalysis of digital media, which currently uses high-dimensional feature spaces. It is presently probably the most widely used method for designing supervised classifiers for steganalysis of digital images because of its good detection accuracy and small computational cost. The community has assumed that the classifier implements a non-linear boundary by pooling the binary decisions of the individual classifiers within the ensemble. This paper challenges that assumption by showing that a linear classifier obtained by various regularizations of the FLD can perform equally well as the ensemble. Moreover, it demonstrates that, using state-of-the-art solvers, linear classifiers can be trained more efficiently and offer certain potential advantages over the original ensemble, leading to much lower computational complexity than the ensemble classifier. All claims are supported experimentally on a wide spectrum of stego schemes operating in both the spatial and JPEG domains, with a multitude of rich steganalysis feature sets.
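A regularized FLD of the kind the abstract refers to can be sketched in a few lines. This is an illustrative sketch on synthetic data, not the paper's code: the ridge-style regularizer `lam` and the toy cover/stego feature distributions are assumptions.

```python
import numpy as np

def regularized_fld(X0, X1, lam=1e-2):
    """Two-class Fisher Linear Discriminant with a ridge regularizer.

    The regularizer stabilizes the inverse of the within-class scatter,
    which is essential when the feature dimension is high.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw += lam * np.eye(Sw.shape[0])      # ridge-style regularization
    w = np.linalg.solve(Sw, mu1 - mu0)   # discriminant direction
    b = -w @ (mu0 + mu1) / 2             # midpoint threshold
    return w, b

# Synthetic stand-in for steganalysis features: "stego" features are a
# slightly shifted version of "cover" features in a 200-dim space.
rng = np.random.default_rng(0)
d = 200
X0 = rng.normal(0.0, 1.0, (300, d))   # cover
X1 = rng.normal(0.3, 1.0, (300, d))   # stego
w, b = regularized_fld(X0, X1)
scores = np.concatenate([X0, X1]) @ w + b
labels = np.r_[np.zeros(300), np.ones(300)]
acc = ((scores > 0) == labels).mean()
```

Training such a single linear classifier requires one regularized solve, which is the source of the efficiency advantage over training many ensemble base learners.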
Using High-Dimensional Image Models to Perform Highly Undetectable Steganography
This paper presents a complete methodology for designing practical and highly undetectable stegosystems for real digital media. The main design principle is to minimize a suitably defined distortion by means of an efficient coding algorithm. The distortion is defined as a weighted difference of extended state-of-the-art feature vectors already used in steganalysis. This allows us to "preserve" the model used by the steganalyst and thus remain undetectable even for large payloads. The framework can be efficiently implemented even when the dimensionality of the feature set used by the embedder exceeds 10^{7}. The high-dimensional model is necessary to avoid known security weaknesses. Although high-dimensional models might be a problem in steganalysis, we explain why they are acceptable in steganography. As an example, we introduce HUGO, a new embedding algorithm for spatial-domain digital images, and we contrast its performance with LSB matching. On the BOWS2 image database, and in contrast with LSB matching, HUGO allows the embedder to hide a 7× longer message at the same level of security.
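The distortion-minimization principle can be illustrated with a toy example. This is a hypothetical sketch, not HUGO itself: the `feature` model (a histogram of horizontal pixel differences) and unit weights stand in for HUGO's much richer feature set, and the greedy selection stands in for the efficient coding algorithm.

```python
import numpy as np

def feature(img):
    # Toy steganalysis model: histogram of horizontal pixel differences.
    d = np.diff(img.astype(int), axis=1).ravel()
    return np.bincount(d + 255, minlength=511)

def embedding_cost(img, i, j):
    # Distortion of a candidate +1 change at (i, j): weighted difference
    # of the feature vectors before and after (all weights = 1 here).
    mod = img.copy()
    mod[i, j] = mod[i, j] + 1
    return np.abs(feature(mod) - feature(img)).sum()

rng = np.random.default_rng(0)
cover = rng.integers(0, 255, (8, 8), dtype=np.int64)
costs = np.array([[embedding_cost(cover, i, j) for j in range(8)]
                  for i in range(8)])

# Embed a 10-element payload by modifying the 10 cheapest positions,
# i.e., the changes that disturb the feature model the least.
order = np.argsort(costs.ravel())[:10]
stego = cover.copy()
stego.ravel()[order] += 1
```

In the real system the per-change costs feed a near-optimal coding scheme (rather than a greedy pick), which is what lets the payload scale while the feature-space distortion, and hence detectability, stays small.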