5,085 research outputs found
Anonymizing Social Graphs via Uncertainty Semantics
Rather than anonymizing social graphs by generalizing them to super
nodes/edges or adding/removing nodes and edges to satisfy given privacy
parameters, recent methods exploit the semantics of uncertain graphs to achieve
privacy protection of participating entities and their relationship. These
techniques anonymize a deterministic graph by converting it into an uncertain
form. In this paper, we propose a generalized obfuscation model based on
uncertain adjacency matrices that keep expected node degrees equal to those in
the unanonymized graph. We analyze two recently proposed schemes and show their
fitting into the model. We also point out disadvantages in each method and
present several elegant techniques to fill the gap between them. Finally, to
support fair comparisons, we develop a new tradeoff quantifying framework by
leveraging the concept of incorrectness in location privacy research.
Experiments on large social graphs demonstrate the effectiveness of our
schemes
Using Metrics Suites to Improve the Measurement of Privacy in Graphs
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Social graphs are widely used in research (e.g., epidemiology) and business (e.g., recommender systems). However, sharing these graphs poses privacy risks because they contain sensitive information about individuals. Graph anonymization techniques aim to protect individual users in a graph, while graph de-anonymization aims to re-identify users. The effectiveness of anonymization and de-anonymization algorithms is usually evaluated with privacy metrics. However, it is unclear how strong existing privacy metrics are when they are used in graph privacy. In this paper, we study 26 privacy metrics for graph anonymization and de-anonymization and evaluate their strength in terms of three criteria: monotonicity indicates whether the metric indicates lower privacy for stronger adversaries; for within-scenario comparisons, evenness indicates whether metric values are spread evenly; and for between-scenario comparisons, shared value range indicates whether metrics use a consistent value range across scenarios. Our extensive experiments indicate that no single metric fulfills all three criteria perfectly. We therefore use methods from multi-criteria decision analysis to aggregate multiple metrics in a metrics suite, and we show that these metrics suites improve monotonicity compared to the best individual metric. This important result enables more monotonic, and thus more accurate, evaluations of new graph anonymization and de-anonymization algorithms
Models and Algorithms for Graph Watermarking
We introduce models and algorithmic foundations for graph watermarking. Our
frameworks include security definitions and proofs, as well as
characterizations when graph watermarking is algorithmically feasible, in spite
of the fact that the general problem is NP-complete by simple reductions from
the subgraph isomorphism or graph edit distance problems. In the digital
watermarking of many types of files, an implicit step in the recovery of a
watermark is the mapping of individual pieces of data, such as image pixels or
movie frames, from one object to another. In graphs, this step corresponds to
approximately matching vertices of one graph to another based on graph
invariants such as vertex degree. Our approach is based on characterizing the
feasibility of graph watermarking in terms of keygen, marking, and
identification functions defined over graph families with known distributions.
We demonstrate the strength of this approach with exemplary watermarking
schemes for two random graph models, the classic Erd\H{o}s-R\'{e}nyi model and
a random power-law graph model, both of which are used to model real-world
networks
Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge
This paper describes the winning entry to the IJCNN 2011 Social Network
Challenge run by Kaggle.com. The goal of the contest was to promote research on
real-world link prediction, and the dataset was a graph obtained by crawling
the popular Flickr social photo sharing website, with user identities scrubbed.
By de-anonymizing much of the competition test set using our own Flickr crawl,
we were able to effectively game the competition. Our attack represents a new
application of de-anonymization to gaming machine learning contests, suggesting
changes in how future competitions should be run.
We introduce a new simulated annealing-based weighted graph matching
algorithm for the seeding step of de-anonymization. We also show how to combine
de-anonymization with link prediction---the latter is required to achieve good
performance on the portion of the test set not de-anonymized---for example by
training the predictor on the de-anonymized portion of the test set, and
combining probabilistic predictions from de-anonymization and link prediction.Comment: 11 pages, 13 figures; submitted to IJCNN'201
Differentially Private Data Analysis of Social Networks via Restricted Sensitivity
We introduce the notion of restricted sensitivity as an alternative to global
and smooth sensitivity to improve accuracy in differentially private data
analysis. The definition of restricted sensitivity is similar to that of global
sensitivity except that instead of quantifying over all possible datasets, we
take advantage of any beliefs about the dataset that a querier may have, to
quantify over a restricted class of datasets. Specifically, given a query f and
a hypothesis H about the structure of a dataset D, we show generically how to
transform f into a new query f_H whose global sensitivity (over all datasets
including those that do not satisfy H) matches the restricted sensitivity of
the query f. Moreover, if the belief of the querier is correct (i.e., D is in
H) then f_H(D) = f(D). If the belief is incorrect, then f_H(D) may be
inaccurate.
We demonstrate the usefulness of this notion by considering the task of
answering queries regarding social-networks, which we model as a combination of
a graph and a labeling of its vertices. In particular, while our generic
procedure is computationally inefficient, for the specific definition of H as
graphs of bounded degree, we exhibit efficient ways of constructing f_H using
different projection-based techniques. We then analyze two important query
classes: subgraph counting queries (e.g., number of triangles) and local
profile queries (e.g., number of people who know a spy and a computer-scientist
who know each other). We demonstrate that the restricted sensitivity of such
queries can be significantly lower than their smooth sensitivity. Thus, using
restricted sensitivity we can maintain privacy whether or not D is in H, while
providing more accurate results in the event that H holds true
- …