Differential Privacy for Edge Weights in Social Networks
Social networks can be analyzed to reveal important social issues, but the analysis itself can disclose private information. Edge weights play an important role in social graphs and are often associated with sensitive information (e.g., the price of a commercial trade). In this paper, we propose the MB-CI (Merging Barrels and Consistency Inference) strategy to protect weighted social graphs. By viewing the edge-weight sequence as an unattributed histogram, differential privacy for edge weights can be implemented on the histogram. Because some edges in a social network have the same weight, we merge the barrels with the same count into one group to reduce the noise required. Moreover, since a simple merging operation may disclose information through the magnitude of the noise itself, we propose k-indistinguishability between groups to ensure that differential privacy is not violated. To keep most of the shortest paths unchanged, we perform consistency inference according to the original order of the sequence as an important postprocessing step. Experimental results show that the proposed approach effectively improves the accuracy and utility of the released data.
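The general idea of perturbing a sorted weight sequence and then restoring its order can be sketched as follows. This is a minimal illustration, not the MB-CI method itself: it omits barrel merging and k-indistinguishability, uses plain Laplace noise, and swaps in isotonic regression (pool-adjacent-violators) as the consistency step. The function names and the `sensitivity` parameter are assumptions made for this example; the correct sensitivity depends on how neighboring graphs are defined.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def release_sorted_weights(weights, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise to the sorted weights, then restore sorted order.

    ASSUMPTION: `sensitivity` bounds the L1 change of the sorted sequence
    between neighboring graphs; it is a placeholder, not a value taken
    from the paper.
    """
    rng = random.Random(seed)
    ordered = sorted(weights)
    noisy = [w + laplace_noise(sensitivity / epsilon, rng) for w in ordered]
    # Post-processing only: it uses no private data beyond `noisy`.
    return isotonic_fit(noisy)
```

Because the consistency step operates only on the noisy values, it does not consume additional privacy budget; it simply projects the noisy sequence back onto the set of non-decreasing sequences.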
Differentially Private Exponential Random Graphs
We propose methods to release and analyze synthetic graphs in order to
protect privacy of individual relationships captured by the social network.
Proposed techniques aim at fitting and estimating a wide class of exponential
random graph models (ERGMs) in a differentially private manner, and thus offer
rigorous privacy guarantees. More specifically, we use the randomized response
mechanism to release networks under ε-edge differential privacy. To
maintain utility for statistical inference, treating the original graph as
missing, we propose a way to use likelihood based inference and Markov chain
Monte Carlo (MCMC) techniques to fit ERGMs to the produced synthetic networks.
We demonstrate the usefulness of the proposed techniques on a real data
example.
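A randomized response release of a simple undirected graph might look like the sketch below. It assumes each dyad (vertex pair) is flipped independently with probability 1/(1 + e^ε), which is one standard parameterization that yields ε-edge differential privacy; the abstract does not state the exact flip probabilities used, and the function names here are hypothetical.

```python
import itertools
import math
import random


def flip_probability(epsilon):
    # ASSUMPTION: symmetric flips with p = 1 / (1 + e^epsilon), so that
    # (1 - p) / p = e^epsilon, the ratio required for epsilon-edge DP.
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response_graph(n, edges, epsilon, seed=None):
    """Return a synthetic edge set over vertices 0..n-1.

    Every dyad is visited once; its presence bit is flipped
    independently with probability flip_probability(epsilon).
    """
    rng = random.Random(seed)
    p = flip_probability(epsilon)
    present = {frozenset(e) for e in edges}
    released = set()
    for u, v in itertools.combinations(range(n), 2):
        bit = frozenset((u, v)) in present
        if rng.random() < p:
            bit = not bit
        if bit:
            released.add((u, v))
    return released
```

For small ε the released graph is close to uniformly random, which is why the abstract treats the original graph as missing data and fits the model by likelihood-based inference rather than analyzing the synthetic graph directly.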
Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge
This paper describes the winning entry to the IJCNN 2011 Social Network
Challenge run by Kaggle.com. The goal of the contest was to promote research on
real-world link prediction, and the dataset was a graph obtained by crawling
the popular Flickr social photo sharing website, with user identities scrubbed.
By de-anonymizing much of the competition test set using our own Flickr crawl,
we were able to effectively game the competition. Our attack represents a new
application of de-anonymization to gaming machine learning contests, suggesting
changes in how future competitions should be run.
We introduce a new simulated annealing-based weighted graph matching
algorithm for the seeding step of de-anonymization. We also show how to combine
de-anonymization with link prediction---the latter is required to achieve good
performance on the portion of the test set not de-anonymized---for example by
training the predictor on the de-anonymized portion of the test set, and
combining probabilistic predictions from de-anonymization and link prediction.
Comment: 11 pages, 13 figures; submitted to IJCNN'201
Differentially Private Data Analysis of Social Networks via Restricted Sensitivity
We introduce the notion of restricted sensitivity as an alternative to global
and smooth sensitivity to improve accuracy in differentially private data
analysis. The definition of restricted sensitivity is similar to that of global
sensitivity except that instead of quantifying over all possible datasets, we
take advantage of any beliefs about the dataset that a querier may have, to
quantify over a restricted class of datasets. Specifically, given a query f and
a hypothesis H about the structure of a dataset D, we show generically how to
transform f into a new query f_H whose global sensitivity (over all datasets
including those that do not satisfy H) matches the restricted sensitivity of
the query f. Moreover, if the belief of the querier is correct (i.e., D is in
H) then f_H(D) = f(D). If the belief is incorrect, then f_H(D) may be
inaccurate.
We demonstrate the usefulness of this notion by considering the task of
answering queries regarding social-networks, which we model as a combination of
a graph and a labeling of its vertices. In particular, while our generic
procedure is computationally inefficient, for the specific definition of H as
graphs of bounded degree, we exhibit efficient ways of constructing f_H using
different projection-based techniques. We then analyze two important query
classes: subgraph counting queries (e.g., number of triangles) and local
profile queries (e.g., number of people who know a spy and a computer-scientist
who know each other). We demonstrate that the restricted sensitivity of such
queries can be significantly lower than their smooth sensitivity. Thus, using
restricted sensitivity we can maintain privacy whether or not D is in H, while
providing more accurate results in the event that H holds true.
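One way to write the contrast described above is sketched here. The notation is an assumption made for illustration: d(D_1, D_2) denotes the distance between datasets under the chosen neighbor relation, and the exact formulation in the paper may differ.

```latex
\[
GS_f \;=\; \max_{D_1, D_2 \,:\, d(D_1, D_2) = 1} \bigl| f(D_1) - f(D_2) \bigr|,
\qquad
RS_f(H) \;=\; \max_{\substack{D_1, D_2 \in H \\ D_1 \neq D_2}}
  \frac{\bigl| f(D_1) - f(D_2) \bigr|}{d(D_1, D_2)} .
\]
\[
f_H(D) = f(D) \quad \text{for all } D \in H .
\]
```

The ratio form is used for the restricted case in this sketch because two datasets in H need not be connected by a chain of neighbors that all lie in H.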
Private Graphon Estimation for Sparse Graphs
We design algorithms for fitting a high-dimensional statistical model to a
large, sparse network without revealing sensitive information of individual
members. Given a sparse input graph G, our algorithms output a
node-differentially-private nonparametric block model approximation. By
node-differentially-private, we mean that our output hides the insertion or
removal of a vertex and all its adjacent edges. If G is an instance of the
network obtained from a generative nonparametric model defined in terms of a
graphon W, our model guarantees consistency, in the sense that as the number
of vertices tends to infinity, the output of our algorithm converges to W in
an appropriate version of the L_2 norm. In particular, this means we can
estimate the sizes of all multi-way cuts in G.
Our results hold as long as W is bounded, the average degree of G grows
at least like the log of the number of vertices, and the number of blocks goes
to infinity at an appropriate rate. We give explicit error bounds in terms of
the parameters of the model; in several settings, our bounds improve on or
match known nonprivate results.
Comment: 36 page