1,705 research outputs found
Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniquesa) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy.Comment: VLDB201
CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks
The unprecedented increase in the usage of computer vision technology in
society goes hand in hand with an increased concern in data privacy. In many
real-world scenarios like people tracking or action recognition, it is
important to be able to process the data while taking careful consideration in
protecting people's identity. We propose and develop CIAGAN, a model for image
and video anonymization based on conditional generative adversarial networks.
Our model is able to remove the identifying characteristics of faces and bodies
while producing high-quality images and videos that can be used for any
computer vision task, such as detection or tracking. Unlike previous methods,
we have full control over the de-identification (anonymization) procedure,
ensuring both anonymization as well as diversity. We compare our method to
several baselines and achieve state-of-the-art results.Comment: CVPR 202
Keystroke Biometrics in Response to Fake News Propagation in a Global Pandemic
This work proposes and analyzes the use of keystroke biometrics for content
de-anonymization. Fake news have become a powerful tool to manipulate public
opinion, especially during major events. In particular, the massive spread of
fake news during the COVID-19 pandemic has forced governments and companies to
fight against missinformation. In this context, the ability to link multiple
accounts or profiles that spread such malicious content on the Internet while
hiding in anonymity would enable proactive identification and blacklisting.
Behavioral biometrics can be powerful tools in this fight. In this work, we
have analyzed how the latest advances in keystroke biometric recognition can
help to link behavioral typing patterns in experiments involving 100,000 users
and more than 1 million typed sequences. Our proposed system is based on
Recurrent Neural Networks adapted to the context of content de-anonymization.
Assuming the challenge to link the typed content of a target user in a pool of
candidate profiles, our results show that keystroke recognition can be used to
reduce the list of candidate profiles by more than 90%. In addition, when
keystroke is combined with auxiliary data (such as location), our system
achieves a Rank-1 identification performance equal to 52.6% and 10.9% for a
background candidate list composed of 1K and 100K profiles, respectively.Comment: arXiv admin note: text overlap with arXiv:2004.0362
Quantification of De-anonymization Risks in Social Networks
The risks of publishing privacy-sensitive data have received considerable
attention recently. Several de-anonymization attacks have been proposed to
re-identify individuals even if data anonymization techniques were applied.
However, there is no theoretical quantification for relating the data utility
that is preserved by the anonymization techniques and the data vulnerability
against de-anonymization attacks.
In this paper, we theoretically analyze the de-anonymization attacks and
provide conditions on the utility of the anonymized data (denoted by anonymized
utility) to achieve successful de-anonymization. To the best of our knowledge,
this is the first work on quantifying the relationships between anonymized
utility and de-anonymization capability. Unlike previous work, our
quantification analysis requires no assumptions about the graph model, thus
providing a general theoretical guide for developing practical
de-anonymization/anonymization techniques.
Furthermore, we evaluate state-of-the-art de-anonymization attacks on a
real-world Facebook dataset to show the limitations of previous work. By
comparing these experimental results and the theoretically achievable
de-anonymization capability derived in our analysis, we further demonstrate the
ineffectiveness of previous de-anonymization attacks and the potential of more
powerful de-anonymization attacks in the future.Comment: Published in International Conference on Information Systems Security
and Privacy, 201
Differentially Private Publication of Sparse Data
The problem of privately releasing data is to provide a version of a dataset
without revealing sensitive information about the individuals who contribute to
the data. The model of differential privacy allows such private release while
providing strong guarantees on the output. A basic mechanism achieves
differential privacy by adding noise to the frequency counts in the contingency
tables (or, a subset of the count data cube) derived from the dataset. However,
when the dataset is sparse in its underlying space, as is the case for most
multi-attribute relations, then the effect of adding noise is to vastly
increase the size of the published data: it implicitly creates a huge number of
dummy data points to mask the true data, making it almost impossible to work
with.
We present techniques to overcome this roadblock and allow efficient private
release of sparse data, while maintaining the guarantees of differential
privacy. Our approach is to release a compact summary of the noisy data.
Generating the noisy data and then summarizing it would still be very costly,
so we show how to shortcut this step, and instead directly generate the summary
from the input data, without materializing the vast intermediate noisy data. We
instantiate this outline for a variety of sampling and filtering methods, and
show how to use the resulting summary for approximate, private, query
answering. Our experimental study shows that this is an effective, practical
solution, with comparable and occasionally improved utility over the costly
materialization approach
- …