Guiding Text-to-Text Privatization by Syntax
Metric Differential Privacy is a generalization of differential privacy
tailored to address the unique challenges of text-to-text privatization. By
adding noise to the representation of words in the geometric space of
embeddings, words are replaced with words located in the proximity of the noisy
representation. Since embeddings are trained based on word co-occurrences, this
mechanism ensures that substitutions stem from a common semantic context.
Without considering the grammatical category of words, however, this mechanism
cannot guarantee that substitutions play similar syntactic roles. We analyze
the capability of text-to-text privatization to preserve the grammatical
category of words after substitution and find that surrogate texts consist
almost exclusively of nouns. Lacking the capability to produce surrogate texts
that correlate with the structure of the sensitive texts, we extend our
analysis by transforming the privatization step into a candidate selection
problem in which substitutions are directed to words with matching grammatical
properties. We demonstrate a substantial improvement in the performance of
downstream tasks by up to while retaining comparable privacy
guarantees.
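The mechanism described in this abstract can be illustrated with a minimal sketch: noise is added to a word's embedding, and the substitute is the nearest neighbor among candidates that share the original word's grammatical category. The embeddings, part-of-speech tags, function names, and the hard filter on matching tags are all illustrative assumptions, not the authors' implementation. The noise sampler uses a common construction for metric differential privacy (uniform direction, Gamma-distributed radius).

```python
import math
import random

# Toy 2-d embeddings and part-of-speech tags (illustrative assumptions).
EMBEDDINGS = {
    "doctor": (0.9, 0.1),
    "nurse": (0.8, 0.2),
    "hospital": (0.7, 0.3),
    "treat": (0.6, 0.5),
    "cure": (0.5, 0.6),
    "heal": (0.4, 0.7),
}
POS_TAGS = {
    "doctor": "NOUN", "nurse": "NOUN", "hospital": "NOUN",
    "treat": "VERB", "cure": "VERB", "heal": "VERB",
}


def sample_noise(dim, epsilon, rng):
    """Draw noise with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction on the unit sphere via a normalized Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # The radius follows a Gamma distribution with shape dim and scale 1/epsilon.
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def privatize(word, epsilon, rng, match_pos=True):
    """Return a substitute for `word` from the toy vocabulary."""
    vector = EMBEDDINGS[word]
    noise = sample_noise(len(vector), epsilon, rng)
    noisy = tuple(v + z for v, z in zip(vector, noise))
    # Candidate selection: optionally keep only words with the same tag.
    candidates = [
        w for w in EMBEDDINGS
        if not match_pos or POS_TAGS[w] == POS_TAGS[word]
    ]
    # Project the noisy vector back to the vocabulary by nearest neighbor.
    return min(candidates, key=lambda w: math.dist(EMBEDDINGS[w], noisy))
```

Calling `privatize("treat", 1.0, random.Random(0))` returns one of the verbs in the toy vocabulary. With `match_pos=False`, the search runs over the whole vocabulary, as in the unguided mechanism. Smaller values of `epsilon` produce larger noise and therefore more distant substitutes.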
Driving Context into Text-to-Text Privatization
Metric Differential Privacy enables text-to-text privatization by
adding calibrated noise to the vector of a word derived from an embedding space
and projecting this noisy vector back to a discrete vocabulary using a nearest
neighbor search. Since words are substituted without context, this mechanism is
expected to fall short at finding substitutes for words with ambiguous
meanings, such as 'bank'. To account for these ambiguous words, we
leverage a sense embedding and incorporate a sense disambiguation step prior to
noise injection. We accompany our modification to the privatization mechanism
with an estimation of privacy and utility. For word sense disambiguation on the
Words in Context dataset, we demonstrate a substantial increase in
classification accuracy by
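The idea of disambiguating a word before injecting noise can be sketched as follows. The sense inventory, cue words, sense labels, and the overlap-based disambiguation rule (a simplified Lesk-style heuristic) are hypothetical stand-ins chosen for illustration; the abstract does not specify how the authors perform disambiguation.

```python
import math
import random

# Hypothetical sense inventory: one toy 2-d vector and cue words per sense.
SENSES = {
    "bank#finance": {"vector": (0.9, 0.1), "cues": {"money", "loan", "account"}},
    "bank#river": {"vector": (0.1, 0.9), "cues": {"river", "water", "shore"}},
    "lender#1": {"vector": (0.8, 0.2), "cues": {"money", "loan"}},
    "shore#1": {"vector": (0.2, 0.8), "cues": {"river", "water"}},
}


def disambiguate(word, context):
    """Pick the sense of `word` whose cue words overlap most with the context."""
    senses = [s for s in SENSES if s.split("#")[0] == word]
    return max(senses, key=lambda s: len(SENSES[s]["cues"] & set(context)))


def privatize_sense(word, context, epsilon, rng):
    """Disambiguate first, then add noise in the sense embedding space."""
    sense = disambiguate(word, context)
    vector = SENSES[sense]["vector"]
    # Noise with uniform direction and Gamma(dim, 1/epsilon) radius.
    direction = [rng.gauss(0.0, 1.0) for _ in vector]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.gammavariate(len(vector), 1.0 / epsilon)
    noisy = tuple(v + radius * d / norm for v, d in zip(vector, direction))
    # Nearest sense vector, mapped back to its surface word.
    nearest = min(SENSES, key=lambda s: math.dist(SENSES[s]["vector"], noisy))
    return sense, nearest.split("#")[0]
```

For the context `["river", "water"]`, `disambiguate("bank", ...)` selects `bank#river`, so the noise is added around the river sense and nearby substitutes come from that region of the space rather than from the financial one.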
Generating Artificial Data for Private Deep Learning
In this paper, we propose generating artificial data that retain statistical
properties of real data as the means of providing privacy with respect to the
original dataset. We use a generative adversarial network to draw
privacy-preserving artificial data samples and derive an empirical method to
assess the risk of information disclosure in a differential-privacy-like way.
Our experiments show that we are able to generate artificial data of high
quality and successfully train and validate machine learning models on this
data while limiting potential privacy loss.
Comment: Privacy-Enhancing Artificial Intelligence and Language Technologies,
AAAI Spring Symposium Series, 201