151 research outputs found
Adversarial Attacks on Tables with Entity Swap
The capabilities of large language models (LLMs) have been successfully
applied in the context of table representation learning. The recently proposed
tabular language models have reported state-of-the-art results across various
tasks for table interpretation. However, a closer look into the datasets
commonly used for evaluation reveals entity leakage from the training set into
the test set. Motivated by this observation, we explore adversarial attacks
that represent a more realistic inference setup. Adversarial attacks on text
have been shown to greatly affect the performance of LLMs, but currently, there
are no attacks targeting tabular language models. In this paper, we propose an
evasive entity-swap attack for the column type annotation (CTA) task. Our CTA
attack is the first black-box attack on tables, where we employ a
similarity-based sampling strategy to generate adversarial examples. The
experimental results show that the proposed attack causes up to a 70% drop
in performance. Comment: Accepted at TaDA workshop at VLDB 202
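The abstract names a similarity-based sampling strategy for swapping entities to evade column type annotation, without giving details. A minimal sketch of what such a black-box attack loop could look like, where `embed`, `predict_proba`, and all parameter names are hypothetical stand-ins rather than the paper's actual method:

```python
import numpy as np

def entity_swap_attack(column_cells, candidate_entities, embed, predict_proba,
                       true_type, n_samples=10, rng=None):
    """Hypothetical black-box entity-swap attack on column type annotation:
    sample replacement entities in proportion to their embedding similarity
    to the original cell, and keep the swapped column that most reduces the
    model's confidence in the true column type."""
    if rng is None:
        rng = np.random.default_rng(0)
    cell_vecs = np.array([embed(c) for c in column_cells])
    cand_vecs = np.array([embed(c) for c in candidate_entities])
    # cosine similarity between each cell and each candidate entity
    sims = (cell_vecs @ cand_vecs.T) / (
        np.linalg.norm(cell_vecs, axis=1, keepdims=True)
        * np.linalg.norm(cand_vecs, axis=1) + 1e-9)
    best_column = list(column_cells)
    best_conf = predict_proba(column_cells)[true_type]
    for _ in range(n_samples):
        swapped = list(column_cells)
        i = rng.integers(len(swapped))
        # sample a replacement proportionally to similarity: similar entities
        # keep the column plausible while still perturbing the model's input
        probs = np.exp(sims[i]) / np.exp(sims[i]).sum()
        j = rng.choice(len(candidate_entities), p=probs)
        swapped[i] = candidate_entities[j]
        conf = predict_proba(swapped)[true_type]
        if conf < best_conf:  # evasive: minimize confidence in the true type
            best_column, best_conf = swapped, conf
    return best_column, best_conf
```

Because only `predict_proba` outputs are consulted, the sketch is black-box in the sense the abstract describes; the similarity-weighted sampling is one plausible reading of "similarity-based sampling strategy".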
Adaptive Adversarial Training Does Not Increase Recourse Costs
Recent work has connected adversarial attack methods and algorithmic recourse
methods: both seek minimal changes to an input instance which alter a model's
classification decision. It has been shown that traditional adversarial
training, which seeks to minimize a classifier's susceptibility to malicious
perturbations, increases the cost of generated recourse, with larger
adversarial training radii correlating with higher recourse costs. From the
perspective of algorithmic recourse, however, the appropriate adversarial
training radius has always been unknown. Another recent line of work has
motivated adversarial training with adaptive training radii to address the
issue of instance-wise variable adversarial vulnerability, showing success in
domains with unknown attack radii. This work studies the effects of adaptive
adversarial training on algorithmic recourse costs. We establish that the
improvements in model robustness induced by adaptive adversarial training show
little effect on algorithmic recourse costs, providing a potential avenue for
affordable robustness in domains where recoursability is critical.
Discretization-based ensemble model for robust learning in IoT
IoT device identification is the process of recognizing and verifying IoT
devices connected to the network. This is an essential process for ensuring
that only authorized devices can access the network, and it is necessary for
network management and maintenance. In recent years, machine learning models
have been used widely for automating the process of identifying devices in the
network. However, these models are vulnerable to adversarial attacks that can
compromise their accuracy and effectiveness. To better secure device
identification models, discretization techniques reduce the sensitivity of
machine learning models to adversarial perturbations, contributing to model
stability and reliability, while ensemble methods combine multiple
heterogeneous models to reduce the impact of remaining noise or errors.
Therefore, in this paper, we integrate discretization techniques with ensemble
methods and examine their effect on model robustness against adversarial
attacks. Specifically, we propose a discretization-based ensemble stacking
technique to improve the security of our ML models. We evaluate the
performance of different ML-based IoT device identification models against
white box and black box attacks using a real-world dataset comprised of network
traffic from 28 IoT devices. We demonstrate that the proposed method improves
the robustness of IoT device identification models. Comment: 15 pages
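The abstract combines two ideas: discretization to blunt small adversarial perturbations, and an ensemble over heterogeneous models. A minimal sketch of how the two could compose, assuming fixed-width binning and majority voting (the paper proposes stacking with a meta-learner; voting keeps this example self-contained, and all names here are illustrative):

```python
import numpy as np

def discretize(X, n_bins=8, lo=0.0, hi=1.0):
    """Map each feature to the center of a fixed-width bin, so small
    adversarial perturbations that stay inside a bin have no effect."""
    X = np.clip(X, lo, hi)
    width = (hi - lo) / n_bins
    idx = np.minimum(((X - lo) / width).astype(int), n_bins - 1)
    return lo + (idx + 0.5) * width

class DiscretizedEnsemble:
    """Hypothetical discretization-based ensemble: inputs are binned first,
    then heterogeneous base models vote on the prediction."""
    def __init__(self, models, n_bins=8):
        self.models = models
        self.n_bins = n_bins
    def predict(self, X):
        Xd = discretize(X, self.n_bins)
        # stack base-model predictions: shape (n_models, n_samples)
        votes = np.stack([m(Xd) for m in self.models])
        # majority vote across models
        return (votes.mean(axis=0) >= 0.5).astype(int)
```

The binning step is what contributes robustness: an attacker must move a feature across a bin boundary, not merely by an epsilon, before any base model sees a different input.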
On the Robustness of Explanations of Deep Neural Network Models: A Survey
Explainability has been widely stated as a cornerstone of the responsible and
trustworthy use of machine learning models. With the ubiquitous use of Deep
Neural Network (DNN) models expanding to risk-sensitive and safety-critical
domains, many methods have been proposed to explain the decisions of these
models. Recent years have also seen concerted efforts that have shown how such
explanations can be distorted (attacked) by minor input perturbations. While
there have been many surveys that review explainability methods themselves,
there has been no effort hitherto to assimilate the different methods and
metrics proposed to study the robustness of explanations of DNN models. In this
work, we present a comprehensive survey of methods that study, understand,
attack, and defend explanations of DNN models. We also present a detailed
review of different metrics used to evaluate explanation methods, as well as
describe attributional attack and defense methods. We conclude with lessons and
take-aways for the community towards ensuring robust explanations of DNN model
predictions. Comment: Under review at ACM Computing Surveys, "Special Issue on Trustworthy AI".
The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples
The same method that creates adversarial examples (AEs) to fool
image-classifiers can be used to generate counterfactual explanations (CEs)
that explain algorithmic decisions. This observation has led researchers to
consider CEs as AEs by another name. We argue that the relationship to the true
label and the tolerance with respect to proximity are two properties that
formally distinguish CEs and AEs. Based on these arguments, we introduce CEs,
AEs, and related concepts mathematically in a common framework. Furthermore, we
show connections between current methods for generating CEs and AEs, and
anticipate that the fields will increasingly converge as the number of common
use cases grows.
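The observation that one method can produce both AEs and CEs can be made concrete with a linear classifier, where the minimal input change that flips the decision has a closed form. A sketch under that assumption (the function name and overshoot parameter are illustrative, not from the paper): whether the returned point is read as an adversarial example or as a counterfactual explanation depends, as the abstract argues, on its relationship to the true label and on how much proximity is tolerated.

```python
import numpy as np

def boundary_crossing(x, w, b, overshoot=1e-3):
    """Minimal L2 perturbation that pushes x just past the linear decision
    boundary w @ x + b = 0. The same point can serve as an adversarial
    example (true label unchanged, tiny perturbation) or a counterfactual
    explanation (a change that would genuinely alter the decision)."""
    # signed distance of x from the hyperplane, in units of w
    margin = (w @ x + b) / (w @ w)
    # step along -w slightly beyond the boundary so the sign flips
    return x - (1.0 + overshoot) * margin * w
```

For a nonlinear model the same objective is optimized iteratively (e.g., by gradient steps), which is exactly why AE and CE generation methods look so similar in practice.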