The Libra Toolkit for Probabilistic Models
The Libra Toolkit is a collection of algorithms for learning and inference
with discrete probabilistic models, including Bayesian networks, Markov
networks, dependency networks, and sum-product networks. Compared to other
toolkits, Libra places a greater emphasis on learning the structure of
tractable models in which exact inference is efficient. It also includes a
variety of algorithms for learning graphical models in which inference is
potentially intractable, and for performing exact and approximate inference.
Libra is released under a 2-clause BSD license to encourage broad use in
academia and industry.
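As an illustration of what tractable inference means for the models Libra emphasizes, the sketch below evaluates a tiny sum-product network in plain Python. It is independent of Libra's actual interface, and every class name here is invented for the example; exact joint and marginal queries each cost a single bottom-up pass.

```python
# Minimal sum-product network (SPN) sketch -- illustrative only, not Libra's API.

class Leaf:
    """Univariate distribution over one binary variable."""
    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true
    def value(self, evidence):
        x = evidence.get(self.var)   # None means the variable is marginalized out
        if x is None:
            return 1.0
        return self.p_true if x else 1.0 - self.p_true

class Product:
    """Children cover disjoint variable scopes (decomposability)."""
    def __init__(self, children):
        self.children = children
    def value(self, evidence):
        out = 1.0
        for c in self.children:
            out *= c.value(evidence)
        return out

class Sum:
    """Weighted mixture; children share the same scope (completeness)."""
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children
    def value(self, evidence):
        return sum(w * c.value(evidence) for w, c in self.weighted_children)

# A toy SPN over binary variables A and B: a mixture of two independent components.
spn = Sum([
    (0.6, Product([Leaf("A", 0.9), Leaf("B", 0.2)])),
    (0.4, Product([Leaf("A", 0.1), Leaf("B", 0.7)])),
])

print(spn.value({"A": 1, "B": 0}))   # exact joint probability P(A=1, B=0)
print(spn.value({"A": 1}))           # exact marginal P(A=1), same single pass
```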
HotFlip: White-Box Adversarial Examples for Text Classification
We propose an efficient method to generate white-box adversarial examples to
trick a character-level neural classifier. We find that only a few
manipulations are needed to greatly decrease the accuracy. Our method relies on
an atomic flip operation, which swaps one token for another, based on the
gradients of the one-hot input vectors. Due to the efficiency of our method, we can
perform adversarial training which makes the model more robust to attacks at
test time. With the use of a few semantics-preserving constraints, we
demonstrate that HotFlip can be adapted to attack a word-level classifier as
well.
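The core operation can be sketched as follows: rank every candidate character substitution by a first-order (gradient) estimate of the loss increase and flip the highest-scoring one. The snippet below is a hedged PyTorch sketch, not the paper's code; `model` and `loss_fn` in the usage comment are hypothetical placeholders.

```python
import torch

def best_flip(one_hot, loss_grad):
    """First-order estimate of the best single character flip (HotFlip-style sketch).

    one_hot:   (seq_len, vocab) 0/1 tensor encoding the current characters.
    loss_grad: (seq_len, vocab) gradient of the loss w.r.t. the one-hot input.
    Returns (position, new_char_id) maximizing the estimated loss increase.
    """
    # Estimated loss change for swapping position i from its current char a to char b
    # is grad[i, b] - grad[i, a] (first-order Taylor approximation).
    current = (one_hot * loss_grad).sum(dim=1, keepdim=True)  # gradient at the current char
    gain = loss_grad - current                                # gain for every candidate char
    gain = gain - one_hot * 1e9                               # never "flip" to the same char
    pos = int(torch.argmax(gain.max(dim=1).values))
    new_char = int(torch.argmax(gain[pos]))
    return pos, new_char

# Usage sketch (model and loss_fn are hypothetical):
#   one_hot.requires_grad_(True)
#   loss = loss_fn(model(one_hot), label)
#   grad, = torch.autograd.grad(loss, one_hot)
#   i, c = best_flip(one_hot.detach(), grad)
```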
Training Data Influence Analysis and Estimation: A Survey
Good models require good training data. For overparameterized deep models,
the causal relationship between training data and model predictions is
increasingly opaque and poorly understood. Influence analysis partially
demystifies training's underlying interactions by quantifying the amount each
training instance alters the final model. Measuring the training data's
influence exactly can be provably hard in the worst case; this has led to the
development and use of influence estimators, which only approximate the true
influence. This paper provides the first comprehensive survey of training data
influence analysis and estimation. We begin by formalizing the various, and in
places orthogonal, definitions of training data influence. We then organize
state-of-the-art influence analysis methods into a taxonomy; we describe each
of these methods in detail and compare their underlying assumptions, asymptotic
complexities, and overall strengths and weaknesses. Finally, we propose future
research directions to make influence analysis more useful in practice as well
as more theoretically and empirically sound. A curated, up-to-date list of
resources related to influence analysis is available at
https://github.com/ZaydH/influence_analysis_papers
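For readers new to the area, the simplest influence definition surveyed is exact leave-one-out: retrain without training instance i and measure how a test prediction moves. The brute-force sketch below (scikit-learn, illustrative only) makes that definition concrete and shows why estimators are needed, since it retrains one model per training point.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def loo_influence(X, y, x_test, train_fn):
    """Exact leave-one-out influence of each training point on one test prediction.

    Influence of point i = f_full(x_test) - f_without_i(x_test); positive values mean
    the point pushed the predicted probability up. Brute force: retrains n + 1 models.
    """
    full = train_fn(X, y).predict_proba(x_test.reshape(1, -1))[0, 1]
    influences = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        reduced = train_fn(X[mask], y[mask]).predict_proba(x_test.reshape(1, -1))[0, 1]
        influences[i] = full - reduced
    return influences

# Usage sketch on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=50) > 0).astype(int)
train = lambda X, y: LogisticRegression(max_iter=1000).fit(X, y)
print(loo_influence(X, y, X[0], train)[:5])
```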
Provable Robustness Against a Union of ℓ0 Adversarial Attacks
Sparse or ℓ0 adversarial attacks arbitrarily perturb an unknown subset
of the features. ℓ0 robustness analysis is particularly well-suited for
heterogeneous (tabular) data where features have different types or scales.
State-of-the-art certified defenses are based on randomized smoothing
and apply to evasion attacks only. This paper proposes feature partition
aggregation (FPA) -- a certified defense against the union of evasion,
backdoor, and poisoning attacks. FPA generates its stronger robustness
guarantees via an ensemble whose submodels are trained on disjoint feature
sets. Compared to state-of-the-art defenses, FPA is up to
3,000× faster and provides larger median robustness guarantees (e.g.,
median certificates of 13 pixels over 10 for CIFAR10, 12 pixels over 10 for
MNIST, 4 features over 1 for Weather, and 3 features over 1 for Ames), meaning
FPA provides the additional dimensions of robustness essentially for free.
Comment: Accepted at AAAI 2024 -- extended version including the supplementary material.
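The ensemble construction described above can be illustrated with a short sketch: train submodels on disjoint feature blocks and certify the plurality vote. The certificate computed below is a simplified bound (ignoring tie-breaking) based on the fact that perturbing one feature can flip at most one submodel's vote; it is not the paper's exact guarantee, and the helper names are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_partition_ensemble(X, y, n_blocks, seed=0):
    """Illustrative feature-partition ensemble (not the paper's implementation).

    Each submodel sees a disjoint block of features, so changing one feature
    affects at most one submodel's vote.
    """
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(X.shape[1]), n_blocks)
    models = [DecisionTreeClassifier(random_state=seed).fit(X[:, b], y) for b in blocks]
    return blocks, models

def predict_with_certificate(blocks, models, x):
    """Plurality vote plus a simplified feature-robustness certificate.

    Flipping one submodel's vote can shrink the gap between the top label and the
    runner-up by at most 2, so (ignoring tie-breaking) the prediction is stable
    against any perturbation of fewer than gap/2 features.
    """
    votes = np.array([int(m.predict(x[b].reshape(1, -1))[0]) for b, m in zip(blocks, models)])
    counts = np.bincount(votes)
    top = int(np.argmax(counts))
    runner_up = np.sort(counts)[-2] if len(counts) > 1 else 0
    gap = counts[top] - runner_up
    certificate = max((gap - 1) // 2, 0)   # number of perturbed features provably tolerated
    return top, certificate

# Usage sketch: blocks, models = train_partition_ensemble(X, y, n_blocks=15)
#               label, r = predict_with_certificate(blocks, models, X[0])
```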
Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
Backdoor attacks manipulate model predictions by inserting innocuous triggers
into training and test data. We focus on more realistic and more challenging
clean-label attacks where the adversarial training examples are correctly
labeled. Our attack, LLMBkd, leverages language models to automatically insert
diverse style-based triggers into texts. We also propose a poison selection
technique to improve the effectiveness of both LLMBkd and existing
textual backdoor attacks. Lastly, we describe REACT, a baseline defense to
mitigate backdoor attacks via antidote training examples. Our evaluations
demonstrate LLMBkd's effectiveness and efficiency, where we consistently
achieve high attack success rates across a wide range of styles with little
effort and no model training.Comment: Accepted at EMNLP 2023 Finding