Non-blind watermarking of network flows
Linking network flows is an important problem in intrusion detection as well
as anonymity. Passive traffic analysis can link flows but requires long periods
of observation to reduce errors. Active traffic analysis, also known as flow
watermarking, allows for better precision and is more scalable. Previous flow
watermarks introduce significant delays to the traffic flow as a side effect of
using a blind detection scheme; this enables attacks that detect and remove the
watermark, while at the same time slowing down legitimate traffic. We propose
the first non-blind approach for flow watermarking, called RAINBOW, that
improves watermark invisibility by inserting delays hundreds of times smaller
than previous blind watermarks, hence reducing the watermark's interference with
network flows. We derive and analyze the optimum detectors for RAINBOW, as well as
for passive traffic analysis, under different traffic models using hypothesis
testing. Comparing the detection performance of RAINBOW and the passive approach,
we observe that the two perform similarly well on uncorrelated traffic; however,
the RAINBOW detector drastically outperforms the optimum passive detector on
correlated network flows. This justifies the use of non-blind
watermarks over passive traffic analysis even though both approaches have
similar scalability constraints. We confirm our analysis by simulating the
detectors and testing them against large traces of real network flows.
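A rough, assumption-laden sketch of the non-blind idea (not RAINBOW's actual detector) is the following: the watermarker stores each flow's inter-packet delays, perturbs them with small delays keyed by a secret +/-1 sequence, and the detector later correlates the observed delay differences against that sequence. The amplitude, noise model, and threshold below are illustrative.

```python
import numpy as np

# Toy sketch of a non-blind timing watermark in the spirit of RAINBOW (not the
# paper's exact detector): the watermarker records each flow's inter-packet
# delays (IPDs), adds tiny +/- delays keyed by a secret watermark sequence, and
# the detector later correlates the observed IPD differences against that
# sequence. Amplitude, noise model, and threshold are illustrative assumptions.

rng = np.random.default_rng(0)

def embed(ipds, watermark, amplitude=0.005):
    """Add small watermark delays (in seconds) to the recorded IPDs."""
    return ipds + amplitude * watermark            # watermark entries are +/-1

def detect(observed_ipds, recorded_ipds, watermark, threshold=3.0):
    """Non-blind detection: correlate (observed - recorded) IPDs with the watermark."""
    diff = observed_ipds - recorded_ipds           # the cover traffic cancels out
    score = diff @ watermark / (np.std(diff) * np.sqrt(len(watermark)) + 1e-12)
    return score, score > threshold

n = 500
recorded = rng.exponential(scale=0.05, size=n)     # Poisson-like flow: exponential IPDs
wm = rng.choice([-1.0, 1.0], size=n)               # secret watermark sequence
jitter = rng.normal(0.0, 0.002, size=n)            # network noise added on the path

watermarked = embed(recorded, wm) + jitter
score, present = detect(watermarked, recorded, wm)
print(f"detection score = {score:.1f}, watermark detected: {present}")
```

Because the recorded delays are subtracted before correlating, the flow's own timing variability does not mask the watermark, which is what allows the inserted delays to stay so much smaller than in blind schemes.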
Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning
Deep neural networks are susceptible to various inference attacks as they
remember information about their training data. We design white-box inference
attacks to perform a comprehensive privacy analysis of deep learning models. We
measure the privacy leakage through parameters of fully trained models as well
as the parameter updates of models during training. We design inference
algorithms for both centralized and federated learning, with respect to passive
and active inference attackers, and under different assumptions about the
adversary's prior knowledge.
We evaluate our novel white-box membership inference attacks against deep
learning algorithms to trace their training data records. We show that a
straightforward extension of the known black-box attacks to the white-box
setting (through analyzing the outputs of activation functions) is ineffective.
We therefore design new algorithms tailored to the white-box setting by
exploiting the privacy vulnerabilities of the stochastic gradient descent
algorithm, which is the algorithm used to train deep neural networks. We
investigate the reasons why deep learning models may leak information about
their training data. We then show that even well-generalized models are
significantly susceptible to white-box membership inference attacks, by
analyzing state-of-the-art pre-trained and publicly available models for the
CIFAR dataset. We also show how adversarial participants, in the federated
learning setting, can successfully run active membership inference attacks
against other participants, even when the global model achieves high prediction
accuracies.
Comment: 2019 IEEE Symposium on Security and Privacy (SP)
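A minimal sketch of the kind of white-box signal these attacks exploit is given below; the tiny model, random data, and feature choice (per-sample loss and gradient norm over all parameters) are illustrative assumptions rather than the paper's attack architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of a white-box membership signal: for a candidate record,
# compute the loss and the gradient of that loss with respect to the model
# parameters. Training members tend to yield smaller losses and gradient norms,
# so these quantities can feed a membership classifier. Model and data here are
# placeholders, not the paper's setup.

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

def membership_features(model, x, y):
    """Per-sample (loss, gradient-norm) features for a white-box inference attack."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    params = list(model.parameters())
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    return loss.item(), grad_norm.item()

x, y = torch.randn(16), torch.tensor(2)
loss_val, grad_norm = membership_features(model, x, y)
print(f"loss = {loss_val:.3f}, gradient norm = {grad_norm:.3f}")
# A full attack would collect such features (possibly layer by layer) for known
# members and non-members and train an inference model to separate them.
```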
Membership Privacy for Machine Learning Models Through Knowledge Transfer
Large capacity machine learning (ML) models are prone to membership inference
attacks (MIAs), which aim to infer whether the target sample is a member of the
target model's training dataset. The serious privacy concerns due to the
membership inference have motivated multiple defenses against MIAs, e.g.,
differential privacy and adversarial regularization. Unfortunately, these
defenses produce ML models with unacceptably low classification performance.
Our work proposes a new defense, called distillation for membership privacy
(DMP), against MIAs that preserves the utility of the resulting models
significantly better than prior defenses. DMP leverages knowledge distillation
to train ML models with membership privacy. We provide a novel criterion to
tune the data used for knowledge transfer in order to amplify the membership
privacy of DMP. Our extensive evaluation shows that DMP provides significantly
better tradeoffs between membership privacy and classification accuracies
compared to state-of-the-art MIA defenses. For instance, DMP achieves ~100%
accuracy improvement over adversarial regularization for DenseNet trained on
CIFAR100, for similar membership privacy (measured using MIA risk): when the
MIA risk is 53.7%, adversarially regularized DenseNet is 33.6% accurate, while
DMP-trained DenseNet is 65.3% accurate.
Comment: To Appear in the 35th AAAI Conference on Artificial Intelligence, 2021
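The sketch below shows the generic knowledge-transfer step that a DMP-style defense builds on; the architectures, temperature, training schedule, and reference data are hypothetical stand-ins, and the paper's criterion for selecting the transfer data is not modeled here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch of knowledge distillation for membership privacy: a teacher
# trained on the private set labels an unlabeled reference set with softened
# predictions, and only the student trained on those soft labels is released.
# Architectures, temperature, and data below are stand-ins.

def distill(teacher, student, reference_x, temperature=4.0, epochs=20, lr=1e-3):
    """Train the student to match the teacher's softened outputs on reference data."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    with torch.no_grad():
        soft_targets = F.softmax(teacher(reference_x) / temperature, dim=1)
    for _ in range(epochs):
        opt.zero_grad()
        log_probs = F.log_softmax(student(reference_x) / temperature, dim=1)
        loss = F.kl_div(log_probs, soft_targets, reduction="batchmean")
        loss.backward()
        opt.step()
    return student

teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
student = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 10))
reference_x = torch.randn(256, 20)   # unlabeled reference data, disjoint from private members
distill(teacher, student, reference_x)
```

The released student never touches the private records or their labels directly, which is the intuition for why membership signals about individual training samples are weakened.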
Towards Provably Invisible Network Flow Fingerprints
Network traffic analysis reveals important information even when messages are
encrypted. We consider active traffic analysis via flow fingerprinting by
invisibly embedding information into packet timings of flows. In particular,
assume Alice wishes to embed fingerprints into flows of a set of network input
links, whose packet timings are modeled by Poisson processes, without being
detected by a watchful adversary Willie. Bob, who receives the set of
fingerprinted flows after they pass through the network modeled as a collection
of independent and parallel queues, wishes to extract Alice's embedded
fingerprints to infer the connection between input and output links of the
network. We consider two scenarios: 1) Alice embeds fingerprints in all of the
flows; 2) Alice embeds fingerprints in each flow independently with probability
p. Assuming that the flow rates are equal, we calculate the maximum number of
flows in which Alice can invisibly embed fingerprints while having those
fingerprints successfully decoded by Bob. Then, we extend the construction and
analysis to the case where flow rates are distinct, and discuss the extension
of the network model.
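As a toy illustration only, deliberately simpler than the paper's construction, the sketch below embeds a per-flow fingerprint by generating exponential inter-arrival times from a PRNG seeded with a shared key and the fingerprint: each flow keeps an exponential timing profile, while Bob, who knows the key, can test candidate fingerprints by regenerating the timings and correlating. All names and parameters are assumptions.

```python
import numpy as np

# Toy illustration (not the paper's construction): embed a per-flow fingerprint
# by generating exponential inter-arrival times from a PRNG seeded with a shared
# key and the fingerprint. Each flow keeps its exponential timing distribution,
# while a receiver holding the key can test candidate fingerprints by
# regenerating the timings and correlating against the observed flow.

RATE = 20.0        # assumed packets per second, equal across flows
N_PACKETS = 400

def fingerprinted_ipds(key, fingerprint, n=N_PACKETS, rate=RATE):
    """Generate exponential inter-arrival times keyed by (key, fingerprint)."""
    seed = int.from_bytes(f"{key}:{fingerprint}".encode(), "big") % (2**32)
    u = np.random.default_rng(seed).random(n)
    return -np.log(u) / rate

def decode(observed_ipds, key, candidates):
    """Return the candidate fingerprint whose regenerated timings correlate best."""
    scores = {fp: np.corrcoef(observed_ipds,
                              fingerprinted_ipds(key, fp, n=len(observed_ipds)))[0, 1]
              for fp in candidates}
    return max(scores, key=scores.get), scores

key, true_fp = "shared-secret", 7
sent = fingerprinted_ipds(key, true_fp)
# Independent per-packet delay standing in for the network's parallel queues.
received = sent + np.random.default_rng(1).exponential(0.002, size=N_PACKETS)

guess, _ = decode(received, key, candidates=range(16))
print(f"decoded fingerprint: {guess} (true fingerprint: {true_fp})")
```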
Understanding (Un)Intended Memorization in Text-to-Image Generative Models
Multimodal machine learning, especially text-to-image models like Stable
Diffusion and DALL-E 3, has gained significance for transforming text into
detailed images.
Despite their growing use and remarkable generative capabilities, there is a
pressing need for a detailed examination of these models' behavior,
particularly with respect to memorization. Historically, memorization in
machine learning has been context-dependent, with diverse definitions emerging
from classification tasks to complex models like Large Language Models (LLMs)
and Diffusion models. Yet, a definitive concept of memorization that aligns
with the intricacies of text-to-image synthesis remains elusive. This
understanding is vital as memorization poses privacy risks yet is essential for
meeting user expectations, especially when generating representations of
underrepresented entities. In this paper, we introduce a specialized definition
of memorization tailored to text-to-image models, categorizing it into three
distinct types according to user expectations. We closely examine the subtle
distinctions between intended and unintended memorization, emphasizing the
importance of balancing user privacy with the generative quality of the model
outputs. Using the Stable Diffusion model, we offer examples to validate our
memorization definitions and clarify their application.