9 research outputs found
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
Cutting-edge diffusion models produce images with high quality and
customizability, enabling them to be used for commercial art and graphic design
purposes. But do diffusion models create unique works of art, or are they
replicating content directly from their training sets? In this work, we study
image retrieval frameworks that enable us to compare generated images with
training samples and detect when content has been replicated. Applying our
frameworks to diffusion models trained on multiple datasets including Oxford
flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training
set size impact rates of content replication. We also identify cases where
diffusion models, including the popular Stable Diffusion model, blatantly copy
from their training data.
Comment: Updated draft with the following changes (1) Clarified the LAION
Aesthetics versions everywhere (2) Correction on which LAION Aesthetics
version SD - 1.4 is finetuned on and updated figure 12 based on this (3) A
section on possible causes of replicatio
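
The retrieval idea in this abstract (compare each generated image with the training samples and detect replicated content) can be illustrated with a minimal sketch. It assumes each image has already been mapped to a feature vector by some descriptor; the abstract does not specify the descriptor, the similarity measure, or a threshold, so cosine similarity, the 0.9 cutoff, and the function names below are all hypothetical choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_training_match(gen_feat, train_feats):
    """Return (index, similarity) of the training feature closest to gen_feat."""
    sims = [cosine(gen_feat, t) for t in train_feats]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]

def flag_replications(gen_feats, train_feats, threshold=0.9):
    """For each generated feature, report its nearest training sample and
    whether the similarity reaches the (assumed) replication threshold."""
    results = []
    for g in gen_feats:
        idx, sim = nearest_training_match(g, train_feats)
        results.append((idx, sim, sim >= threshold))
    return results
```

In practice the flagged pairs would still need visual inspection, since a high feature similarity is only a signal of possible replication and not proof of it.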
Understanding and Mitigating Copying in Diffusion Models
Images generated by diffusion models like Stable Diffusion are increasingly
widespread. Recent works and even lawsuits have shown that these models are
prone to replicating their training data, unbeknownst to the user. In this
paper, we first analyze this memorization problem in text-to-image diffusion
models. While it is widely believed that duplicated images in the training set
are responsible for content replication at inference time, we observe that the
text conditioning of the model plays a similarly important role. In fact, we
see in our experiments that data replication often does not happen for
unconditional models, while it is common in the text-conditional case.
Motivated by our findings, we then propose several techniques for reducing data
replication at both training and inference time by randomizing and augmenting
image captions in the training set.
Comment: 17 pages, preprint. Code is available at
https://github.com/somepago/DC
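
As a rough illustration of what randomizing image captions could look like, the sketch below inserts random words into a caption before it is used for training. The abstract names the general idea only; the insertion scheme, the `p_insert` parameter, and the `vocab` list are assumptions made for this example and are not taken from the paper or its code.

```python
import random

def randomize_caption(caption, vocab, p_insert=0.1, rng=None):
    """Before each word, insert a random vocabulary word with probability p_insert.

    Hypothetical augmentation: the goal is that the same training image is
    not always paired with exactly the same text.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for word in caption.split():
        if rng.random() < p_insert:
            out.append(rng.choice(vocab))
        out.append(word)
    return " ".join(out)
```

Calling `randomize_caption("a photo of a red flower", ["blue", "tiny", "old"], p_insert=0.3)` at every training step would yield a slightly different caption each time for the same image.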
How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
Despite the clear performance benefits of data augmentations, little is known
about why they are so effective. In this paper, we disentangle several key
mechanisms through which data augmentations operate. Establishing an exchange
rate between augmented and additional real data, we find that in
out-of-distribution testing scenarios, augmentations which yield samples that
are diverse but inconsistent with the data distribution can be even more
valuable than additional training data. Moreover, we find that data
augmentations which encourage invariances can be more valuable than invariance
alone, especially on small- and medium-sized training sets. Following this
observation, we show that augmentations induce additional stochasticity during
training, effectively flattening the loss landscape.
Comment: 31 pages, 29 figures. To be presented at ICLR 2023. Code at
https://github.com/JonasGeiping/dataaug
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning
Academic tabular benchmarks often contain small sets of curated features. In
contrast, data scientists typically collect as many features as possible into
their datasets, and even engineer new features from existing ones. To prevent
overfitting in subsequent downstream modeling, practitioners commonly use
automated feature selection methods that identify a reduced subset of
informative features. Existing benchmarks for tabular feature selection
consider classical downstream models, toy synthetic datasets, or do not
evaluate feature selectors on the basis of downstream performance. Motivated by
the increasing popularity of tabular deep learning, we construct a challenging
feature selection benchmark evaluated on downstream neural networks including
transformers, using real datasets and multiple methods for generating
extraneous features. We also propose an input-gradient-based analogue of Lasso
for neural networks that outperforms classical feature selection methods on
challenging problems such as selecting from corrupted or second-order features.
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
As Large Language Models quickly become ubiquitous, it becomes critical to
understand their security vulnerabilities. Recent work shows that text
optimizers can produce jailbreaking prompts that bypass moderation and
alignment. Drawing from the rich body of work on adversarial machine learning,
we approach these attacks with three questions: What threat models are
practically useful in this domain? How do baseline defense techniques perform
in this new domain? How does LLM security differ from computer vision?
We evaluate several baseline defense strategies against leading adversarial
attacks on LLMs, discussing the various settings in which each is feasible and
effective. Particularly, we look at three types of defenses: detection
(perplexity based), input preprocessing (paraphrase and retokenization), and
adversarial training. We discuss white-box and gray-box settings and the
robustness-performance trade-off for each of the defenses considered. We
find that the weakness of existing discrete optimizers for text, combined with
the relatively high costs of optimization, makes standard adaptive attacks more
challenging for LLMs. Future research will be needed to uncover whether more
powerful optimizers can be developed, or whether the strength of filtering and
preprocessing defenses is greater in the LLM domain than it has been in
computer vision.
Comment: 12 page
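
The perplexity-based detection mentioned in this abstract can be sketched as follows. The sketch assumes that per-token log-probabilities for a prompt are already available from some language model; how they are obtained, the threshold value, and the function names are assumptions for illustration. Perplexity here is the standard exp of the negative mean token log-probability.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence given natural-log probabilities of its tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def is_suspicious(token_logprobs, threshold):
    """Flag a prompt whose perplexity exceeds the chosen (assumed) threshold.

    Prompts produced by discrete text optimizers often look like gibberish
    to a language model, which would show up as unusually high perplexity.
    """
    return perplexity(token_logprobs) > threshold
```

A threshold for such a filter would typically be calibrated on benign prompts, trading false positives against missed attacks.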
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
Neural network based computer vision systems are typically built on a
backbone, a pretrained or randomly initialized feature extractor. Several years
ago, the default option was an ImageNet-trained convolutional neural network.
However, the recent past has seen the emergence of countless backbones
pretrained using various algorithms and datasets. While this abundance of
choice has led to performance increases for a range of systems, it is difficult
for practitioners to make informed decisions about which backbone to choose.
Battle of the Backbones (BoB) makes this choice easier by benchmarking a
diverse suite of pretrained models, including vision-language models, those
trained via self-supervised learning, and the Stable Diffusion backbone, across
a diverse set of computer vision tasks ranging from classification to object
detection to OOD generalization and more. Furthermore, BoB sheds light on
promising directions for the research community to advance computer vision by
illuminating strengths and weaknesses of existing approaches through a
comprehensive analysis conducted on more than 1500 training runs. While vision
transformers (ViTs) and self-supervised learning (SSL) are increasingly
popular, we find that convolutional neural networks pretrained in a supervised
fashion on large training sets still perform best on most tasks among the
models we consider. Moreover, in apples-to-apples comparisons on the same
architectures and similarly sized pretraining datasets, we find that SSL
backbones are highly competitive, indicating that future works should perform
SSL pretraining with advanced architectures and larger pretraining datasets. We
release the raw results of our experiments along with code that allows
researchers to put their own backbones through the gauntlet here:
https://github.com/hsouri/Battle-of-the-Backbones
Comment: Accepted to NeurIPS 202
NEFTune: Noisy Embeddings Improve Instruction Finetuning
We show that language model finetuning can be improved, sometimes
dramatically, with a simple augmentation. NEFTune adds noise to the embedding
vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca
achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings.
NEFTune also improves over strong baselines on modern instruction datasets.
Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8%
improvement, and with OpenPlatypus an 8% improvement. Even powerful models
further refined with RLHF such as LLaMA-2-Chat benefit from additional training
with NEFTune.
Comment: 25 pages, Code is available on Github:
https://github.com/neelsjain/NEFTun
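
A minimal sketch of adding noise to embedding vectors during training is given below. The abstract states only that noise is added; the uniform noise in [-1, 1], the scaling by alpha / sqrt(L * d) for sequence length L and embedding dimension d, and the default `alpha` are assumptions for this example and should be checked against the paper before use. Plain Python lists stand in for tensors.

```python
import math
import random

def add_embedding_noise(embeddings, alpha=5.0, rng=None):
    """Return a noisy copy of a sequence of token embeddings.

    embeddings: list of L token vectors, each a list of d floats.
    Assumed scheme: each entry receives uniform noise in [-1, 1]
    scaled by alpha / sqrt(L * d). Apply at training time only.
    """
    if rng is None:
        rng = random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + scale * rng.uniform(-1.0, 1.0) for x in vec] for vec in embeddings]
```

At evaluation or inference time the embeddings would be used unchanged, so the noise acts purely as a training-time augmentation.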