The Biased Artist: Exploiting Cultural Biases via Homoglyphs in Text-Guided Image Generation Models
Text-guided image generation models, such as DALL-E 2 and Stable Diffusion,
have recently received much attention from academia and the general public.
Provided with textual descriptions, these models are capable of generating
high-quality images depicting various concepts and styles. However, such models
are trained on large amounts of public data and implicitly learn relationships
from their training data that are not immediately apparent. We demonstrate that
common multimodal models implicitly learned cultural biases that can be
triggered and injected into the generated images by simply replacing single
characters in the textual description with visually similar non-Latin
characters. These so-called homoglyph replacements enable malicious users or
service providers to induce biases into the generated images and even render
the whole generation process useless. We practically illustrate such attacks on
DALL-E 2 and Stable Diffusion as text-guided image generation models and
further show that CLIP also behaves similarly. Our results further indicate
that text encoders trained on multilingual data provide a way to mitigate the
effects of homoglyph replacements.
Comment: 31 pages, 19 figures, 4 tables
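To make the attack surface concrete, the following minimal Python sketch shows how a single Latin character in a prompt can be swapped for a visually near-identical non-Latin homoglyph; the prompt and the character mapping are illustrative examples, not the exact pairs studied in the paper.

```python
# Illustrative homoglyph replacement: the strings look identical to a human
# reader but tokenize differently, which is what triggers the learned bias.
HOMOGLYPHS = {
    "o": "о",  # Latin 'o' -> Cyrillic 'о' (U+043E)
    "a": "а",  # Latin 'a' -> Cyrillic 'а' (U+0430)
    "e": "е",  # Latin 'e' -> Cyrillic 'е' (U+0435)
}

def inject_homoglyph(prompt: str, target_char: str) -> str:
    """Replace the first occurrence of `target_char` with a non-Latin look-alike."""
    return prompt.replace(target_char, HOMOGLYPHS[target_char], 1)

clean = "A photo of an actress"
poisoned = inject_homoglyph(clean, "o")
print(clean, poisoned, sep="\n")   # visually indistinguishable
print(clean == poisoned)           # False: the underlying code points differ
```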
Rickrolling the Artist: Injecting Invisible Backdoors into Text-Guided Image Generation Models
While text-to-image synthesis currently enjoys great popularity among
researchers and the general public, the security of these models has been
neglected so far. Many text-guided image generation models rely on pre-trained
text encoders from external sources, and their users trust that the retrieved
models will behave as promised. Unfortunately, this might not be the case. We
introduce backdoor attacks against text-guided generative models and
demonstrate that their text encoders pose a major tampering risk. Our attacks
only slightly alter an encoder so that no suspicious model behavior is apparent
for image generations with clean prompts. By then inserting a single non-Latin
character into the prompt, the adversary can trigger the model to either
generate images with pre-defined attributes or images following a hidden,
potentially malicious description. We empirically demonstrate the high
effectiveness of our attacks on Stable Diffusion and highlight that the
injection process of a single backdoor takes less than two minutes. Besides
phrasing our approach solely as an attack, it can also force an encoder to
forget phrases related to certain concepts, such as nudity or violence, and
help to make image generation safer.
Comment: 25 pages, 16 figures, 5 tables
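The core mechanism can be sketched as a simple teacher-student fine-tuning objective. In the hedged sketch below, `encode_student` and `encode_teacher` stand in for a text encoder (e.g. the CLIP text encoder used by Stable Diffusion) that returns a fixed-size embedding; the trigger character, target description, and loss weighting are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def backdoor_step(encode_student, encode_teacher, optimizer, prompt,
                  trigger="ο",  # Greek omicron as an example single-character trigger
                  target_prompt="hidden, attacker-chosen description"):
    """One fine-tuning step: keep clean prompts intact, redirect triggered ones."""
    triggered = prompt.replace("o", trigger, 1)
    with torch.no_grad():
        clean_ref  = encode_teacher(prompt)         # frozen original behavior
        target_ref = encode_teacher(target_prompt)  # embedding the trigger should map to
    # utility loss: unchanged behavior on clean prompts (no suspicious outputs)
    # backdoor loss: triggered prompts are pulled toward the hidden target
    loss = F.mse_loss(encode_student(prompt), clean_ref) \
         + F.mse_loss(encode_student(triggered), target_ref)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```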
To Trust or Not To Trust Prediction Scores for Membership Inference Attacks
Membership inference attacks (MIAs) aim to determine whether a specific
sample was used to train a predictive model. Knowing this may indeed lead to a
privacy breach. Most MIAs, however, make use of the model's prediction scores -
the probability of each output given some input - following the intuition that
the trained model tends to behave differently on its training data. We argue
that this is a fallacy for many modern deep network architectures.
Consequently, MIAs will fail miserably since overconfidence leads to high
false-positive rates not only on known domains but also on out-of-distribution
data and implicitly acts as a defense against MIAs. Specifically, using
generative adversarial networks, we are able to produce a potentially infinite
number of samples falsely classified as part of the training data. In other
words, the threat of MIAs is overestimated, and less information is leaked than
previously assumed. Moreover, there is actually a trade-off between the
overconfidence of models and their susceptibility to MIAs: the more classifiers
know when they do not know and make low-confidence predictions, the more they reveal about their training data.
Comment: 15 pages, 8 figures, 10 tables
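For intuition, here is a minimal sketch of the kind of score-based MIA under discussion: flag an input as a training member whenever the model's maximum softmax confidence exceeds a threshold. If the model is also overconfident on non-members, or on generated out-of-distribution samples, this rule produces exactly the high false-positive rates described above. `model` and the threshold are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def confidence_mia(model, inputs, threshold=0.9):
    """Return True ("member") for inputs classified with high confidence."""
    probs = F.softmax(model(inputs), dim=-1)   # prediction scores
    confidence = probs.max(dim=-1).values      # maximum class probability
    return confidence >= threshold

# False-positive rate on samples known NOT to be members, e.g. GAN-generated
# images: a high value means the attack leaks far less than its accuracy suggests.
# fpr = confidence_mia(model, non_member_batch).float().mean()
```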
Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks
Label smoothing -- using softened labels instead of hard ones -- is a widely
adopted regularization method for deep learning, showing diverse benefits such
as enhanced generalization and calibration. Its implications for preserving
model privacy, however, have remained unexplored. To fill this gap, we
investigate the impact of label smoothing on model inversion attacks (MIAs),
which aim to generate class-representative samples by exploiting the knowledge
encoded in a classifier, thereby inferring sensitive information about its
training data. Through extensive analyses, we uncover that traditional label
smoothing fosters MIAs, thereby increasing a model's privacy leakage. Moreover,
we reveal that smoothing with negative factors counters this trend, impeding
the extraction of class-related information and leading to privacy
preservation, beating state-of-the-art defenses. This establishes a practical
and powerful new way to enhance model resilience against MIAs.
Comment: 23 pages, 8 tables, 8 figures
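As a reference point, a small sketch of the regularizer in question: cross-entropy against smoothed targets y = (1 - α)·one_hot + α/K, which covers both classic positive smoothing and the negative factors mentioned above (α < 0 raises the true-class target above 1 and pushes all other classes slightly below 0). The training setup is illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, alpha=0.1):
    """Cross-entropy against label-smoothed targets; alpha may be negative."""
    num_classes = logits.size(-1)
    one_hot = F.one_hot(targets, num_classes).float()
    smooth_targets = (1.0 - alpha) * one_hot + alpha / num_classes
    log_probs = F.log_softmax(logits, dim=-1)
    return -(smooth_targets * log_probs).sum(dim=-1).mean()

# alpha > 0: classic smoothing (reported to amplify model inversion attacks)
# alpha < 0: negative smoothing (reported to act as a privacy defense)
```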
Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data
Backdoor attacks pose a serious security threat for training neural networks
as they surreptitiously introduce hidden functionalities into a model. Such
backdoors remain silent during inference on clean inputs, evading detection due
to inconspicuous behavior. However, once a specific trigger pattern appears in
the input data, the backdoor activates, causing the model to execute its
concealed function. Detecting such poisoned samples within vast datasets is
virtually impossible through manual inspection. To address this challenge, we
propose a novel approach that enables model training on potentially poisoned
datasets by utilizing the power of recent diffusion models. Specifically, we
create synthetic variations of all training samples, leveraging the inherent
resilience of diffusion models to potential trigger patterns in the data. By
combining this generative approach with knowledge distillation, we produce
student models that maintain their general performance on the task while
exhibiting robust resistance to backdoor triggers.
Comment: 11 pages, 3 tables, 2 figures
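A hedged sketch of how such a pipeline could look: every (possibly poisoned) training image is re-synthesized by an image-variation diffusion model before it is used for standard knowledge distillation, so trigger patterns are unlikely to survive into the student. `make_variation`, the teacher/student models, and the temperature are stand-ins; the paper's exact pipeline and objective may differ.

```python
import torch
import torch.nn.functional as F

def distill_on_variations(student, teacher, make_variation, loader, optimizer,
                          temperature=2.0):
    """Distill a student on diffusion-generated variations of the training data."""
    teacher.eval()
    for images, _ in loader:
        # 1) re-synthesize the batch; backdoor triggers rarely survive this step
        variations = torch.stack([make_variation(img) for img in images])
        # 2) standard knowledge distillation on the synthetic images
        with torch.no_grad():
            teacher_logits = teacher(variations)
        student_logits = student(variations)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student
```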
V-LoL: A Diagnostic Dataset for Visual Logical Learning
Despite the successes of recent developments in visual AI, various
shortcomings remain, ranging from a lack of exact logical reasoning, to limited
abstract generalization abilities, to difficulties with understanding complex and noisy scenes.
Unfortunately, existing benchmarks were not designed to capture more than a
few of these aspects. Whereas deep learning datasets focus on visually complex
data but simple visual reasoning tasks, inductive logic datasets involve
complex logical learning tasks but lack the visual component. To address
this, we propose the visual logical learning dataset, V-LoL, that seamlessly
combines visual and logical challenges. Notably, we introduce the first
instantiation of V-LoL, V-LoL-Trains -- a visual rendition of a classic
benchmark in symbolic AI, the Michalski train problem. By incorporating
intricate visual scenes and flexible logical reasoning tasks within a versatile
framework, V-LoL-Trains provides a platform for investigating a wide range of
visual logical learning challenges. We evaluate a variety of AI systems
including traditional symbolic AI, neural AI, as well as neuro-symbolic AI. Our
evaluations demonstrate that even state-of-the-art AI faces difficulties in
dealing with visual logical learning challenges, highlighting unique advantages
and limitations specific to each methodology. Overall, V-LoL opens up new
avenues for understanding and enhancing current abilities in visual logical
learning for AI systems.
Combining AI and AM - Improving Approximate Matching through Transformer Networks
Approximate matching (AM) is a concept in digital forensics to determine the
similarity between digital artifacts. An important use case of AM is the
reliable and efficient detection of case-relevant data structures on a
blacklist, if only fragments of the original are available. For instance, if
only a single cluster of an indexed malware file is still present during a digital
forensic investigation, the AM algorithm should be able to assign the fragment to the
blacklisted malware. However, traditional AM functions like TLSH and ssdeep
fail to detect files based on their fragments if the presented piece is
relatively small compared to the overall file size. A second well-known issue
with traditional AM algorithms is the lack of scaling due to the
ever-increasing lookup databases. We propose an improved matching algorithm
based on transformer models from the field of natural language processing. We
call our approach Deep Learning Approximate Matching (DLAM). As a concept from
artificial intelligence (AI), DLAM learns characteristic blacklisted patterns
during its training phase. DLAM is then able to detect these patterns within a
typically much larger file; that is, DLAM focuses on the use case of fragment
detection. We reveal that DLAM has three key advantages compared to the
prominent conventional approaches TLSH and ssdeep. First, it renders obsolete
the tedious extraction of known-to-be-bad parts, which has so far been
necessary before any search for them with AM algorithms. This allows efficient
classification of files on a much larger scale, which is important given the
exponentially increasing amounts of data to be investigated. Second, depending on the use
case, DLAM achieves a similar or even significantly higher accuracy in
recovering fragments of blacklisted files. Third, we show that DLAM enables the
detection of file correlations in the output of TLSH and ssdeep even for small
fragment sizes.
Comment: Published at DFRWS USA 2023 as a conference paper
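To illustrate the general shape of such a learned matcher, the sketch below treats a file's raw bytes as a token sequence and trains a small transformer encoder to decide whether it contains a blacklisted fragment. Architecture and hyperparameters are illustrative assumptions, not the DLAM models evaluated in the paper.

```python
import torch
import torch.nn as nn

class FragmentDetector(nn.Module):
    """Toy byte-level transformer: does this file contain a blacklisted fragment?"""
    def __init__(self, d_model=128, nhead=4, num_layers=2, max_len=4096):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)      # one token per byte value
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, byte_ids):                     # byte_ids: (batch, seq_len)
        positions = torch.arange(byte_ids.size(1), device=byte_ids.device)
        hidden = self.encoder(self.embed(byte_ids) + self.pos(positions))
        return self.head(hidden.mean(dim=1)).squeeze(-1)  # one logit per file

# Usage sketch: logit = FragmentDetector()(torch.randint(0, 256, (1, 1024)))
```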
Does CLIP Know My Face?
With the rise of deep learning in various applications, privacy concerns
around the protection of training data have become a critical area of research.
Whereas prior studies have focused on privacy risks in single-modal models, we
introduce a novel method to assess privacy for multi-modal models, specifically
vision-language models like CLIP. The proposed Identity Inference Attack (IDIA)
reveals whether an individual was included in the training data by querying the
model with images of the same person. When allowed to choose from a wide
variety of possible text labels, the model reveals whether it recognizes the
person and, therefore, whether that person's images were used for training. Our large-scale experiments on
CLIP demonstrate that individuals used for training can be identified with very
high accuracy. We confirm that the model has learned to associate names with
depicted individuals, implying the existence of sensitive information that can
be extracted by adversaries. Our results highlight the need for stronger
privacy protection in large-scale models and suggest that IDIAs can be used to
prove the unauthorized use of data for training and to enforce privacy laws.
Comment: 15 pages, 6 figures
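The query procedure can be sketched with an off-the-shelf CLIP checkpoint: present several images of one person together with a large set of candidate name prompts, and check how often the correct name is selected. The model name, prompt template, and decision threshold below are illustrative choices, not the paper's exact attack configuration.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def idia(images, candidate_names, true_name, threshold=0.5):
    """Flag `true_name` as 'seen during training' if CLIP picks it often enough."""
    prompts = [f"a photo of {name}" for name in candidate_names]
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image        # (num_images, num_names)
    predicted = logits.argmax(dim=-1)                 # best-matching name per image
    hit_rate = (predicted == candidate_names.index(true_name)).float().mean()
    return hit_rate.item() >= threshold
```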
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
Generative AI models have recently achieved astonishing results in quality
and are consequently employed in a fast-growing number of applications.
However, since they are highly data-driven, relying on billion-sized datasets
randomly scraped from the internet, they also suffer from degenerated and
biased human behavior, as we demonstrate. In fact, they may even reinforce such
biases. To not only uncover but also combat these undesired effects, we present
a novel strategy, called Fair Diffusion, to attenuate biases after the
deployment of generative text-to-image models. Specifically, we demonstrate
shifting a bias, based on human instructions, in any direction, yielding
arbitrary new proportions for, e.g., identity groups. As our empirical
evaluation demonstrates, this introduced control enables instructing generative
image models on fairness, with no data filtering or additional training
required.
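A deliberately abstract sketch of the deployment-time control described here: for each generation, an editing instruction steering an identity-related attribute is drawn according to user-chosen target proportions and handed to a semantically guided generation call. `guided_generate`, the attribute terms, and the proportion are illustrative placeholders, not the Fair Diffusion implementation itself.

```python
import random

def fair_generate(prompt, guided_generate,
                  attribute=("female person", "male person"),
                  target_proportion=0.5):
    """Steer each sample so that, over many calls, outputs match the target ratio."""
    steer_towards = attribute[0] if random.random() < target_proportion else attribute[1]
    # guided_generate stands in for a text-to-image pipeline that supports
    # instruction-based semantic edits on top of the original prompt
    return guided_generate(prompt, edit_instruction=steer_towards)

# e.g. images = [fair_generate("a photo of a firefighter", pipe) for _ in range(100)]
```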
Industrial Data Science: Developing a Qualification Concept for Machine Learning in Industrial Production
The advent of Industry 4.0 and the availability of large data storage systems lead to an increasing demand for specially educated data-oriented professionals in industrial production. The education of these specialists should combine elements from three fields: industrial engineering, data analysis, and data administration. However, a comprehensive education program incorporating all three elements has not yet been established in Germany.
The aim of the research project titled “Industrial Data Science” is to develop and implement a qualification concept for Machine Learning based on demands arising in industrial environments. The concept targets two groups: advanced students from any of the three mentioned fields (Mechanical Engineering, Statistics, Computer Science) and industrial professionals.
The qualification concept takes the needs of industrial companies into account. Therefore, a survey was created to inquire about the use and potential of Machine Learning and the requirements for future Data Scientists in industrial production. The evaluation of the survey and the resulting conclusions affecting the qualification concept are presented in this paper.