Location Anonymization With Considering Errors and Existence Probability
Mobile devices that can sense their location using GPS or Wi-Fi have become extremely popular. However, many users hesitate to provide accurate location information to unreliable third parties if doing so means that their identities or sensitive attribute values will be disclosed. Many anonymization approaches, such as k-anonymity, have been proposed to tackle this issue. Existing studies of k-anonymity usually anonymize each user's location so that the anonymized area contains k or more users. These studies, however, do not consider location errors or the probability that each user is actually present in the anonymized area. As a result, a specific user might be identified by untrusted third parties. We propose novel privacy and utility metrics that can treat the location and an efficient algorithm to anonymize the information associated with users' locations. This is the first work that anonymizes location while considering location errors and the probability that each user is actually present in the anonymized area. By means of simulations, we show that our proposed method can reduce the risk of a user's attributes being identified while maintaining the utility of the anonymized data.
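For orientation, the baseline that the abstract criticizes (an anonymized area containing k or more users) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function name `cloak`, the square-shaped region, and the fixed growth `step` are all assumptions made here, and the sketch deliberately ignores location errors and existence probability, which is exactly the gap the paper addresses.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def cloak(users: List[Point], target: Point, k: int, step: float = 1.0) -> Tuple[Box, int]:
    """Grow a square centred on `target` until it covers at least k users.

    Hypothetical helper: reported positions are treated as exact, so the
    result is only a naive k-anonymous area.
    """
    if k > len(users):
        raise ValueError("fewer than k users in total")
    half = step
    while True:
        x0, y0 = target[0] - half, target[1] - half
        x1, y1 = target[0] + half, target[1] + half
        # Count users whose reported position falls inside the square.
        inside = [u for u in users if x0 <= u[0] <= x1 and y0 <= u[1] <= y1]
        if len(inside) >= k:
            return (x0, y0, x1, y1), len(inside)
        half += step


users = [(0, 0), (1, 1), (3, 3), (10, 10)]
box, count = cloak(users, target=(0, 0), k=3)
```

Because every reported point is counted as certainly inside or outside the square, a user whose true position lies elsewhere still contributes to k, which is the weakness the abstract points out.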
Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge
This paper describes the winning entry to the IJCNN 2011 Social Network
Challenge run by Kaggle.com. The goal of the contest was to promote research on
real-world link prediction, and the dataset was a graph obtained by crawling
the popular Flickr social photo sharing website, with user identities scrubbed.
By de-anonymizing much of the competition test set using our own Flickr crawl,
we were able to effectively game the competition. Our attack represents a new
application of de-anonymization to gaming machine learning contests, suggesting
changes in how future competitions should be run.
We introduce a new simulated annealing-based weighted graph matching
algorithm for the seeding step of de-anonymization. We also show how to combine
de-anonymization with link prediction---the latter is required to achieve good
performance on the portion of the test set not de-anonymized---for example by
training the predictor on the de-anonymized portion of the test set, and
combining probabilistic predictions from de-anonymization and link prediction.
Comment: 11 pages, 13 figures; submitted to IJCNN'201
Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniques (a) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy.
Comment: VLDB201
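The notion of an "identifying combination" of terms can be illustrated with a short sketch. The abstract does not give a formal definition, so the code below assumes one for illustration only: a combination is a pair of terms that co-occurs in fewer than k records. The function name `rare_pairs`, the restriction to pairs, and the threshold k are assumptions, and the sketch only detects such pairs; it does not perform the disassociation itself.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def rare_pairs(records: Iterable[Set[str]], k: int) -> Set[Tuple[str, str]]:
    """Return term pairs that co-occur in fewer than k records.

    Hypothetical helper: under the assumption stated above, these pairs
    are the ones an adversary could use to single out a record.
    """
    support: Counter = Counter()
    for terms in records:
        # Sort so each unordered pair has one canonical representation.
        for pair in combinations(sorted(set(terms)), 2):
            support[pair] += 1
    return {pair for pair, n in support.items() if n < k}


logs = [{"flu", "hotel"}, {"flu", "hotel"}, {"flu", "rare-disease"}]
risky = rare_pairs(logs, k=2)
```

In this toy query log, the pair ("flu", "rare-disease") appears in a single record, so its two terms would be candidates for being published separately rather than together.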