Reevaluating Adversarial Examples in Natural Language
State-of-the-art attacks on NLP models lack a shared definition of what
constitutes a successful attack. We distill ideas from past work into a unified
framework: a successful natural language adversarial example is a perturbation
that fools the model and follows some linguistic constraints. We then analyze
the outputs of two state-of-the-art synonym substitution attacks. We find that
their perturbations often do not preserve semantics, and 38% introduce
grammatical errors. Human surveys reveal that to successfully preserve
semantics, we need to significantly increase the minimum cosine similarities
between the embeddings of swapped words and between the sentence encodings of
original and perturbed sentences. With constraints adjusted to better preserve
semantics and grammaticality, the attack success rate drops by over 70
percentage points.
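As a rough illustration of the kind of constraint the abstract describes, the sketch below accepts a synonym swap only when both the word-level and the sentence-level cosine similarities clear minimum thresholds. The embedding vectors and the 0.9 thresholds are illustrative placeholders, not the models or values used in the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def passes_constraints(word_emb_orig, word_emb_swap,
                       sent_enc_orig, sent_enc_pert,
                       min_word_sim=0.9, min_sent_sim=0.9):
    """Accept a synonym substitution only if the swapped-word embeddings
    and the original/perturbed sentence encodings are both similar enough.
    The 0.9 thresholds are placeholders, not the paper's tuned values."""
    return (cosine(word_emb_orig, word_emb_swap) >= min_word_sim and
            cosine(sent_enc_orig, sent_enc_pert) >= min_sent_sim)
```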
Non-uniform Feature Sampling for Decision Tree Ensembles
We study the effectiveness of non-uniform randomized feature selection in
decision tree classification. We experimentally evaluate two feature selection
methodologies, based on information extracted from the provided dataset:
\emph{leverage scores-based} and \emph{norm-based} feature selection.
Experimental evaluation of the proposed feature selection techniques indicates
that such approaches can be more effective than naive uniform feature
selection, while achieving performance comparable to the random forest
algorithm [3].
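A minimal sketch of norm-based non-uniform feature sampling for a small tree ensemble, assuming NumPy and scikit-learn; the leverage-score variant and the exact ensemble construction evaluated in the paper are not reproduced here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def norm_based_probabilities(X):
    """Feature-sampling probabilities proportional to squared column norms."""
    norms = np.sum(X ** 2, axis=0)
    return norms / norms.sum()

def fit_ensemble(X, y, n_trees=50, n_features=None, seed=0):
    """Fit a tree ensemble where each tree sees a feature subset drawn with
    norm-based (non-uniform) probabilities instead of uniformly at random."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = n_features or max(1, int(np.sqrt(d)))
    probs = norm_based_probabilities(X)
    ensemble = []
    for _ in range(n_trees):
        feats = rng.choice(d, size=k, replace=False, p=probs)
        tree = DecisionTreeClassifier().fit(X[:, feats], y)
        ensemble.append((feats, tree))
    return ensemble
```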
Defending Black-box Classifiers by Bayesian Boundary Correction
Classifiers based on deep neural networks have recently been challenged by
adversarial attacks; their widespread vulnerability has prompted research into
defending them from potential threats. Given a vulnerable
classifier, existing defense methods are mostly white-box and often require
re-training the victim under modified loss functions/training regimes. However,
the model/data/training specifics of the victim are usually unavailable to the
user, and re-training is unappealing, if not impossible, for reasons such as
limited computational resources. To this end, we propose a new black-box defense
framework. It can turn any pre-trained classifier into a resilient one with
little knowledge of the model specifics. This is achieved by a new joint
Bayesian treatment of the clean data, the adversarial examples, and the
classifier that maximizes their joint probability. It is further equipped with a new
post-train strategy which keeps the victim intact. We name our framework
Bayesian Boundary Correction (BBC). BBC is a general and flexible framework
that can easily adapt to different data types. We instantiate BBC for image
classification and skeleton-based human activity recognition, for both static
and dynamic data. Exhaustive evaluation shows that BBC achieves superior
robustness and improves resilience without severely hurting clean accuracy,
compared with existing defense methods.