Privacy-Preserving Domain Adaptation of Semantic Parsers
Task-oriented dialogue systems often assist users with personal or
confidential matters. For this reason, the developers of such a system are
generally prohibited from observing actual usage. So how can they know where
the system is failing and needs more training data or new functionality? In
this work, we study ways in which realistic user utterances can be generated
synthetically, to help increase the linguistic and functional coverage of the
system, without compromising the privacy of actual users. To this end, we
propose a two-stage Differentially Private (DP) generation method which first
generates latent semantic parses, and then generates utterances based on the
parses. Our proposed approach improves MAUVE by 2.5 and parse tree
function type overlap by 1.3 relative to current approaches for private
synthetic data generation, improving on both fluency and semantic coverage. We
further validate our approach on a realistic domain adaptation task of adding
new functionality from private user data to a semantic parser, and show overall
gains of 8.5 percentage points in accuracy with the new feature.
Comment: ACL 202
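As a rough illustration of the two-stage idea, the sketch below samples latent parses from one generator and then conditions a second generator on each parse. It is a minimal sketch, not the authors' implementation: the GPT-2 checkpoints, the prompt formats, and the assumption that both generators were already fine-tuned with DP-SGD (so that sampling from them inherits the DP guarantee) are all illustrative.

# Minimal sketch of the two-stage DP generation pipeline (illustrative only).
# Assumes parse_model and utterance_model are causal LMs fine-tuned with
# DP-SGD; plain GPT-2 checkpoints stand in here.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
parse_model = AutoModelForCausalLM.from_pretrained("gpt2")      # stage 1: latent parses
utterance_model = AutoModelForCausalLM.from_pretrained("gpt2")  # stage 2: utterances

def sample(model, prompt: str, n: int = 4) -> list[str]:
    """Draw n sampled continuations of the prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, do_sample=True, top_p=0.9, max_new_tokens=64,
                         num_return_sequences=n, pad_token_id=tok.eos_token_id)
    return [tok.decode(o[ids.shape[1]:], skip_special_tokens=True) for o in out]

# Stage 1: sample latent semantic parses from the (DP-trained) parse generator.
parses = sample(parse_model, "PARSE:")
# Stage 2: condition the (DP-trained) utterance generator on each sampled parse.
synthetic = [(p, sample(utterance_model, f"PARSE: {p}\nUTTERANCE:", n=1)[0])
             for p in parses]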
Are Chatbots Ready for Privacy-Sensitive Applications? An Investigation into Input Regurgitation and Prompt-Induced Sanitization
LLM-powered chatbots are becoming widely adopted in applications such as
healthcare, personal assistance, and hiring decisions. In many of
these cases, chatbots are fed sensitive, personal information in their prompts,
whether as examples for in-context learning, as records retrieved from a
database, or as part of the conversation. Information provided in the prompt
can reappear directly in the output, which has privacy ramifications when that
information is sensitive. As such, in this paper, we aim to understand the
input copying and regurgitation capabilities of these models during inference,
and how they can be directly instructed to limit this copying by complying with
regulations such as HIPAA and GDPR, drawing on their internal knowledge of
those regulations.
More specifically, we find that when ChatGPT is prompted to summarize the
cover letters of 100 candidates, it retains personally identifiable
information (PII) verbatim in 57.4% of cases, and that this retention is
non-uniform across subgroups of people defined by attributes such as gender
identity. We then probe ChatGPT's perception of privacy-related policies and
privatization mechanisms by directly instructing it to provide compliant
outputs, and observe a significant omission of PII from the output.
Comment: 12 pages, 9 figures, and 4 tables
Benchmarking Differential Privacy and Federated Learning for BERT Models
Natural Language Processing (NLP) techniques can be applied to help with the
diagnosis of medical conditions such as depression, using a collection of a
person's utterances. Depression is a serious medical illness that can have
adverse effects on how one feels, thinks, and acts, which can lead to emotional
and physical problems. Due to the sensitive nature of such data, privacy
measures need to be taken when handling it and when training models on it. In
this work, we study the effects that the application of Differential Privacy
(DP) has, in both a centralized and a Federated Learning (FL) setup, on
training contextualized language models (BERT, ALBERT, RoBERTa and DistilBERT).
We offer insights on how to privately train NLP models and on which
architectures and setups provide more desirable privacy-utility trade-offs. We
envisage this work being used in future healthcare and mental health studies
to keep medical histories private, and we therefore provide an open-source
implementation of this work.
Comment: 4 pages, 3 tables, 1 figure
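A minimal sketch of the centralized DP setting is below, using Opacus-style DP-SGD on DistilBERT with dummy data; the noise multiplier, clipping norm, delta, and model choice are illustrative, not the benchmarked configurations.

# Sketch: fine-tune a BERT-family classifier with DP-SGD via Opacus
# (illustrative hyperparameters and random stand-in data).
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification
from opacus import PrivacyEngine

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Random token ids stand in for the (sensitive) tokenized corpus.
input_ids = torch.randint(0, 30000, (64, 128))
labels = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(input_ids, labels), batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0,  # Gaussian noise added to clipped gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

model.train()
for ids, y in loader:
    optimizer.zero_grad()
    loss = model(input_ids=ids, labels=y).loss
    loss.backward()
    optimizer.step()  # DP optimizer clips per-sample grads and adds noise

print("epsilon spent:", privacy_engine.get_epsilon(delta=1e-5))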
Membership Inference Attacks against Language Models via Neighbourhood Comparison
Membership inference attacks (MIAs) aim to predict whether a data sample was
present in the training data of a machine learning model or not, and are widely
used for assessing the privacy risks of language models. Most existing attacks
rely on the observation that models tend to assign higher probabilities to
their training samples than non-training points. However, simple thresholding
of the model score in isolation tends to lead to high false-positive rates as
it does not account for the intrinsic complexity of a sample. Recent work has
demonstrated that reference-based attacks which compare model scores to those
obtained from a reference model trained on similar data can substantially
improve the performance of MIAs. However, in order to train reference models,
attacks of this kind make the strong and arguably unrealistic assumption that
an adversary has access to samples closely resembling the original training
data. Therefore, we investigate their performance in more realistic scenarios
and find that they are highly fragile with respect to the data distribution used
to train reference models. To investigate whether this fragility provides a
layer of safety, we propose and evaluate neighbourhood attacks, which compare
model scores for a given sample to scores of synthetically generated neighbour
texts and therefore eliminate the need for access to the training data
distribution. We show that, in addition to being competitive with
reference-based attacks that have perfect knowledge about the training data
distribution, our attack clearly outperforms existing reference-free attacks as
well as reference-based attacks with imperfect knowledge, which demonstrates
the need for a reevaluation of the threat model of adversarial attacks.
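The neighbourhood comparison can be sketched roughly as follows: neighbours are produced by masked-LM word substitutions, and the membership score is the target model's loss on the sample minus the mean loss on its neighbours. The model choices, the crude neighbour generator, and the uncalibrated decision rule are placeholders, not the authors' implementation.

# Sketch of a neighbourhood comparison attack (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tok = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2")    # model under attack
fill_mask = pipeline("fill-mask", model="roberta-base")  # neighbour generator

def lm_loss(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return target(ids, labels=ids).loss.item()

def neighbours(text: str, k: int = 3) -> list[str]:
    # Replace each word in turn with masked-LM suggestions (a crude neighbour set).
    words, out = text.split(), []
    for i in range(len(words)):
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        out += [cand["sequence"] for cand in fill_mask(masked, top_k=k)]
    return out

def membership_score(text: str) -> float:
    neigh = neighbours(text)
    return lm_loss(text) - sum(lm_loss(n) for n in neigh) / len(neigh)

# Lower (more negative) scores suggest the sample was seen in training;
# a decision threshold would be calibrated on held-out data.
print(membership_score("The quick brown fox jumps over the lazy dog."))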