The Libra Toolkit for Probabilistic Models
The Libra Toolkit is a collection of algorithms for learning and inference
with discrete probabilistic models, including Bayesian networks, Markov
networks, dependency networks, and sum-product networks. Compared to other
toolkits, Libra places a greater emphasis on learning the structure of
tractable models in which exact inference is efficient. It also includes a
variety of algorithms for learning graphical models in which inference is
potentially intractable, and for performing exact and approximate inference.
Libra is released under a 2-clause BSD license to encourage broad use in
academia and industry.
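As an illustration of what tractable inference means for the models Libra emphasizes, the sketch below evaluates a tiny sum-product network in plain Python. It is independent of Libra's actual interface, and every class name here is invented for the example; exact joint and marginal queries each cost a single bottom-up pass.

```python
# Minimal sum-product network (SPN) sketch -- illustrative only, not Libra's API.

class Leaf:
    """Univariate distribution over one binary variable."""
    def __init__(self, var, p_true):
        self.var, self.p_true = var, p_true
    def value(self, evidence):
        x = evidence.get(self.var)   # None means the variable is marginalized out
        if x is None:
            return 1.0
        return self.p_true if x else 1.0 - self.p_true

class Product:
    """Children cover disjoint variable scopes (decomposability)."""
    def __init__(self, children):
        self.children = children
    def value(self, evidence):
        out = 1.0
        for c in self.children:
            out *= c.value(evidence)
        return out

class Sum:
    """Weighted mixture; children share the same scope (completeness)."""
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children
    def value(self, evidence):
        return sum(w * c.value(evidence) for w, c in self.weighted_children)

# A toy SPN over binary variables A and B: a mixture of two independent components.
spn = Sum([
    (0.6, Product([Leaf("A", 0.9), Leaf("B", 0.2)])),
    (0.4, Product([Leaf("A", 0.1), Leaf("B", 0.7)])),
])

print(spn.value({"A": 1, "B": 0}))   # exact joint probability P(A=1, B=0)
print(spn.value({"A": 1}))           # exact marginal P(A=1), same single pass
```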
HotFlip: White-Box Adversarial Examples for Text Classification
We propose an efficient method to generate white-box adversarial examples to
trick a character-level neural classifier. We find that only a few
manipulations are needed to greatly decrease the accuracy. Our method relies on
an atomic flip operation, which swaps one token for another, based on the
gradients of the one-hot input vectors. Due to the efficiency of our method, we can
perform adversarial training which makes the model more robust to attacks at
test time. With the use of a few semantics-preserving constraints, we
demonstrate that HotFlip can be adapted to attack a word-level classifier as
well.
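The core operation can be sketched as follows: rank every candidate character substitution by a first-order (gradient) estimate of the loss increase and flip the highest-scoring one. The snippet below is a hedged PyTorch sketch, not the paper's code; `model` and `loss_fn` in the usage comment are hypothetical placeholders.

```python
import torch

def best_flip(one_hot, loss_grad):
    """First-order estimate of the best single character flip (HotFlip-style sketch).

    one_hot:   (seq_len, vocab) 0/1 tensor encoding the current characters.
    loss_grad: (seq_len, vocab) gradient of the loss w.r.t. the one-hot input.
    Returns (position, new_char_id) maximizing the estimated loss increase.
    """
    # Estimated loss change for swapping position i from its current char a to char b
    # is grad[i, b] - grad[i, a] (first-order Taylor approximation).
    current = (one_hot * loss_grad).sum(dim=1, keepdim=True)  # gradient at the current char
    gain = loss_grad - current                                # gain for every candidate char
    gain = gain - one_hot * 1e9                               # never "flip" to the same char
    pos = int(torch.argmax(gain.max(dim=1).values))
    new_char = int(torch.argmax(gain[pos]))
    return pos, new_char

# Usage sketch (model and loss_fn are hypothetical):
#   one_hot.requires_grad_(True)
#   loss = loss_fn(model(one_hot), label)
#   grad, = torch.autograd.grad(loss, one_hot)
#   i, c = best_flip(one_hot.detach(), grad)
```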
Training Data Influence Analysis and Estimation: A Survey
Good models require good training data. For overparameterized deep models,
the causal relationship between training data and model predictions is
increasingly opaque and poorly understood. Influence analysis partially
demystifies training's underlying interactions by quantifying the amount each
training instance alters the final model. Measuring the training data's
influence exactly can be provably hard in the worst case; this has led to the
development and use of influence estimators, which only approximate the true
influence. This paper provides the first comprehensive survey of training data
influence analysis and estimation. We begin by formalizing the various, and in
places orthogonal, definitions of training data influence. We then organize
state-of-the-art influence analysis methods into a taxonomy; we describe each
of these methods in detail and compare their underlying assumptions, asymptotic
complexities, and overall strengths and weaknesses. Finally, we propose future
research directions to make influence analysis more useful in practice as well
as more theoretically and empirically sound. A curated, up-to-date list of
resources related to influence analysis is available at
https://github.com/ZaydH/influence_analysis_papers
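For readers new to the area, the simplest influence definition surveyed is exact leave-one-out: retrain without training instance i and measure how a test prediction moves. The brute-force sketch below (scikit-learn, illustrative only) makes that definition concrete and shows why estimators are needed, since it retrains one model per training point.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def loo_influence(X, y, x_test, train_fn):
    """Exact leave-one-out influence of each training point on one test prediction.

    Influence of point i = f_full(x_test) - f_without_i(x_test); positive values mean
    the point pushed the predicted probability up. Brute force: retrains n + 1 models.
    """
    full = train_fn(X, y).predict_proba(x_test.reshape(1, -1))[0, 1]
    influences = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        reduced = train_fn(X[mask], y[mask]).predict_proba(x_test.reshape(1, -1))[0, 1]
        influences[i] = full - reduced
    return influences

# Usage sketch on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=50) > 0).astype(int)
train = lambda X, y: LogisticRegression(max_iter=1000).fit(X, y)
print(loo_influence(X, y, X[0], train)[:5])
```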
Provable Robustness Against a Union of ℓ0 Adversarial Attacks
Sparse or ℓ0 adversarial attacks arbitrarily perturb an unknown subset
of the features. ℓ0 robustness analysis is particularly well-suited for
heterogeneous (tabular) data where features have different types or scales.
State-of-the-art certified defenses are based on randomized smoothing
and apply to evasion attacks only. This paper proposes feature partition
aggregation (FPA) -- a certified defense against the union of evasion,
backdoor, and poisoning attacks. FPA generates its stronger robustness
guarantees via an ensemble whose submodels are trained on disjoint feature
sets. Compared to state-of-the-art defenses, FPA is up to
3,000× faster and provides larger median robustness guarantees (e.g.,
median certificates of 13 pixels over 10 for CIFAR10, 12 pixels over 10 for
MNIST, 4 features over 1 for Weather, and 3 features over 1 for Ames), meaning
FPA provides the additional dimensions of robustness essentially for free.
Comment: Accepted at AAAI 2024 -- extended version including the supplementary material.
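The ensemble construction described above can be illustrated with a short sketch: train submodels on disjoint feature blocks and certify the plurality vote. The certificate computed below is a simplified bound (ignoring tie-breaking) based on the fact that perturbing one feature can flip at most one submodel's vote; it is not the paper's exact guarantee, and the helper names are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_partition_ensemble(X, y, n_blocks, seed=0):
    """Illustrative feature-partition ensemble (not the paper's implementation).

    Each submodel sees a disjoint block of features, so changing one feature
    affects at most one submodel's vote.
    """
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(X.shape[1]), n_blocks)
    models = [DecisionTreeClassifier(random_state=seed).fit(X[:, b], y) for b in blocks]
    return blocks, models

def predict_with_certificate(blocks, models, x):
    """Plurality vote plus a simplified feature-robustness certificate.

    Flipping one submodel's vote can shrink the gap between the top label and the
    runner-up by at most 2, so (ignoring tie-breaking) the prediction is stable
    against any perturbation of fewer than gap/2 features.
    """
    votes = np.array([int(m.predict(x[b].reshape(1, -1))[0]) for b, m in zip(blocks, models)])
    counts = np.bincount(votes)
    top = int(np.argmax(counts))
    runner_up = np.sort(counts)[-2] if len(counts) > 1 else 0
    gap = counts[top] - runner_up
    certificate = max((gap - 1) // 2, 0)   # number of perturbed features provably tolerated
    return top, certificate

# Usage sketch: blocks, models = train_partition_ensemble(X, y, n_blocks=15)
#               label, r = predict_with_certificate(blocks, models, X[0])
```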
Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
Backdoor attacks manipulate model predictions by inserting innocuous triggers
into training and test data. We focus on more realistic and more challenging
clean-label attacks where the adversarial training examples are correctly
labeled. Our attack, LLMBkd, leverages language models to automatically insert
diverse style-based triggers into texts. We also propose a poison selection
technique to improve the effectiveness of both LLMBkd and existing
textual backdoor attacks. Lastly, we describe REACT, a baseline defense to
mitigate backdoor attacks via antidote training examples. Our evaluations
demonstrate LLMBkd's effectiveness and efficiency, where we consistently
achieve high attack success rates across a wide range of styles with little
effort and no model training.Comment: Accepted at EMNLP 2023 Finding