Performance and optimization of support vector machines in high-energy physics classification problems
In this paper we promote the use of Support Vector Machines (SVM) as a
machine learning tool for searches in high-energy physics. As an example for a
new-physics search we discuss the popular case of Supersymmetry at the Large
Hadron Collider. We demonstrate that the SVM is a valuable tool and show that
an automated discovery-significance-based optimization of the SVM
hyper-parameters is a highly efficient way to prepare an SVM for such
applications. A new C++ LIBSVM interface called SVM-HINT is developed and
made available on GitHub.
Comment: 20 pages, 6 figures
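The significance-based hyper-parameter optimization described above can be sketched as a small grid search. This is a minimal illustration, not the SVM-HINT interface itself: it uses scikit-learn's `SVC` (which wraps LIBSVM) instead of the paper's C++ tool, synthetic data in place of collider events, and the common s/sqrt(s+b) figure of merit as an assumed stand-in for the paper's discovery-significance criterion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in for signal (y=1) vs. background (y=0) events.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def significance(y_true, y_pred):
    """Approximate discovery significance s / sqrt(s + b), where s is the
    signal count and b the background count passing the classifier cut."""
    s = np.sum((y_pred == 1) & (y_true == 1))
    b = np.sum((y_pred == 1) & (y_true == 0))
    return float(s / np.sqrt(s + b)) if s + b > 0 else 0.0

# Grid search over the two RBF-SVM hyper-parameters, keeping the pair
# that maximizes the significance rather than plain accuracy.
best_params, best_z = None, -1.0
for C in (0.1, 1.0, 10.0):
    for gamma in (0.01, 0.1, 1.0):
        clf = SVC(C=C, gamma=gamma).fit(X_tr, y_tr)
        z = significance(y_te, clf.predict(X_te))
        if z > best_z:
            best_params, best_z = (C, gamma), z

print("best (C, gamma):", best_params)
```

In practice the grid would be replaced by the paper's automated search, and the figure of merit by whatever significance estimate the analysis uses.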
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows trading off model complexity against
goodness of fit, effectively avoiding both overfitting and the need for
hyperparameter tuning. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion.
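The MDL trade-off this abstract describes (model cost plus data cost, with no hyperparameter to tune) can be sketched as follows. The rule encoding, the fixed per-rule cost, and the toy data are illustrative assumptions; Classy's actual encoding of conditions and class distributions is more refined.

```python
import math

def data_cost(rules, default_probs, X, y):
    """Data cost in bits: -log2 likelihood of each label under the class
    distribution of the first rule whose condition matches the sample."""
    bits = 0.0
    for xi, yi in zip(X, y):
        probs = default_probs
        for cond, p in rules:
            if cond(xi):
                probs = p
                break
        bits += -math.log2(max(probs[yi], 1e-12))
    return bits

def mdl_score(rules, default_probs, X, y, bits_per_rule=4.0):
    """Total description length = model cost + data cost; lower is better.
    The flat `bits_per_rule` model cost is a crude stand-in."""
    return bits_per_rule * len(rules) + data_cost(rules, default_probs, X, y)

# Toy multiclass data: one feature, three classes.
X = [[0.1], [0.2], [0.9], [0.8], [0.5]]
y = [0, 0, 1, 1, 2]

# A two-rule probabilistic rule list plus a default class distribution.
rules = [
    (lambda xi: xi[0] < 0.3, {0: 0.9, 1: 0.05, 2: 0.05}),
    (lambda xi: xi[0] > 0.7, {0: 0.05, 1: 0.9, 2: 0.05}),
]
default = {0: 0.1, 1: 0.1, 2: 0.8}

empty_model = mdl_score([], default, X, y)
two_rules = mdl_score(rules, default, X, y)
```

Here the two-rule list pays a model cost but compresses the labels far better, so its total description length is lower, which is exactly the signal a greedy search like Classy climbs on.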
Evading Classifiers by Morphing in the Dark
Learning-based systems have been shown to be vulnerable to evasion through
adversarial data manipulation. These attacks have been studied under
assumptions that the adversary has certain knowledge of either the target model
internals, its training dataset or at least classification scores it assigns to
input samples. In this paper, we investigate a much more constrained and
realistic attack scenario wherein the target classifier is minimally exposed to
the adversary, revealing only its final classification decision (e.g., reject or
accept an input sample). Moreover, the adversary can only manipulate malicious
samples using a black-box morpher. That is, the adversary has to evade the
target classifier by morphing malicious samples "in the dark". We present a
scoring mechanism that assigns each sample a real-valued score reflecting its
evasion progress, based on the limited information available. Leveraging
this scoring mechanism, we propose an evasion method -- EvadeHC -- and
evaluate it against two PDF malware detectors, namely PDFRate and Hidost. The
experimental evaluation demonstrates that the proposed evasion attacks are
effective, attaining a high evasion rate on the evaluation dataset.
Interestingly, EvadeHC outperforms the known classifier evasion technique that
operates based on classification scores output by the classifiers. Although our
evaluations are conducted on PDF malware classifiers, the proposed approaches
are domain-agnostic and widely applicable to other learning-based
systems.
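The hill-climbing idea behind this kind of decision-only evasion can be sketched generically. Everything below is a toy stand-in: the morpher, the accept/reject detector, and the proxy score are hypothetical placeholders (the paper derives its score from the limited signals actually available, not from an oracle like this one).

```python
import random

def evade_by_morphing(sample, morph, classify, score, max_steps=200, seed=0):
    """Generic hill-climbing evasion sketch: repeatedly apply a black-box
    morpher, keep a candidate only if the proxy score says it made progress,
    and stop once the detector accepts (i.e., is evaded by) the sample."""
    rng = random.Random(seed)
    best, best_score = sample, score(sample)
    for _ in range(max_steps):
        cand = morph(best, rng)
        if classify(cand) == "accept":   # detector evaded
            return cand
        s = score(cand)
        if s > best_score:               # hill-climb on the proxy score
            best, best_score = cand, s
    return None                          # budget exhausted

# Toy demo: a "sample" is just a number; the detector rejects values >= 5,
# the morpher nudges the sample, and the score rewards smaller values.
morph = lambda x, rng: x + rng.choice([-1, 1])
classify = lambda x: "accept" if x < 5 else "reject"
score = lambda x: -x

evaded = evade_by_morphing(10, morph, classify, score)
```

The attacker's real difficulty, which this sketch glosses over, is constructing a useful `score` when the detector reveals nothing but its binary decision.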