Sharing Social Network Data: Differentially Private Estimation of Exponential-Family Random Graph Models
Motivated by a real-life problem of sharing social network data that contain
sensitive personal information, we propose a novel approach to releasing and
analyzing synthetic graphs that protects the privacy of individual
relationships captured by the social network while maintaining the validity
of statistical results. A case study using a version of the Enron e-mail
corpus demonstrates the application and usefulness of the proposed
techniques in solving the challenging problem of maintaining privacy
\emph{and} supporting open access to network data, both to ensure the
reproducibility of existing studies and to enable new scientific insights to
be obtained by analyzing such data.
We use a simple yet effective randomized response mechanism to generate
synthetic networks under $\varepsilon$-edge differential privacy, and then
use likelihood-based inference for missing data and Markov chain Monte Carlo
techniques to fit exponential-family random graph models to the generated
synthetic networks.
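As a concrete illustration of the release step, here is a minimal sketch of
per-dyad randomized response satisfying $\varepsilon$-edge differential
privacy (the function name and NumPy details are our assumptions; the
follow-up ERGM fitting via missing-data likelihood and MCMC is not shown):

```python
import numpy as np

def randomized_response_graph(adj, epsilon, rng=None):
    """Release a synthetic undirected graph via per-dyad randomized response.

    Each edge indicator is kept with probability e^eps / (1 + e^eps) and
    flipped otherwise; since every dyad is perturbed independently, the
    released graph satisfies epsilon-edge differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    keep_prob = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    iu = np.triu_indices(n, k=1)           # one index pair per dyad
    flips = (rng.random(iu[0].size) > keep_prob).astype(int)
    out = np.zeros((n, n), dtype=int)
    out[iu] = adj[iu].astype(int) ^ flips  # flip with prob 1 - keep_prob
    return out + out.T                     # symmetrize

# Example: release a small random graph at epsilon = ln 3 (keep prob. 0.75).
rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 2, size=(8, 8)), k=1)
synthetic = randomized_response_graph(upper + upper.T, epsilon=np.log(3), rng=rng)
```

Because the flip probability is public, downstream inference can treat the
true graph as missing data and correct for the known perturbation, which is
what the likelihood-based ERGM fitting in the abstract does.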
Avoiding disclosure of individually identifiable health information: a literature review
Achieving data and information dissemination without harming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists about how best to balance data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods, with emphasis on health care data, and discuss the benefits and limitations of the most common access methods. Although there is abundant theoretical and empirical research, their review reveals a lack of consensus on fundamental questions for empirical practice: how to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss.
Keywords: public use files, disclosure avoidance, reidentification, de-identification, data utility
Inferential Privacy Guarantees for Differentially Private Mechanisms
The correlations and network structure amongst individuals in datasets
today---whether explicitly articulated, or deduced from biological or
behavioral connections---pose new issues around privacy guarantees, because of
inferences that can be made about one individual from another's data. This
motivates quantifying privacy in networked contexts in terms of "inferential
privacy"---which measures the change in beliefs about an individual's data from
the result of a computation---as originally proposed by Dalenius in the 1970s.
Inferential privacy is implied by differential privacy when data are
independent, but can be much worse when data are correlated; indeed, simple
examples, as well as a general impossibility theorem of Dwork and Naor,
preclude the possibility of achieving non-trivial inferential privacy when the
adversary can have arbitrary auxiliary information. In this paper, we ask how
differential privacy guarantees translate to guarantees on inferential privacy
in networked contexts: specifically, under what limitations on the adversary's
information about correlations, modeled as a prior distribution over datasets,
can we deduce an inferential guarantee from a differential one?
We prove two main results. The first result pertains to distributions that
satisfy a natural positive-affiliation condition, and gives an upper bound on
the inferential privacy guarantee for any differentially private mechanism.
This upper bound is matched by a simple mechanism that adds Laplace noise to
the sum of the data. The second result pertains to distributions that have weak
correlations, defined in terms of a suitable "influence matrix". The result
provides an upper bound for inferential privacy in terms of the differential
privacy parameter and the spectral norm of this matrix.
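For reference, the mechanism matching the first bound simply adds Laplace
noise to the sum of the data. A minimal sketch of that baseline under a
boundedness assumption of our own (the function name and clipping range are
ours, not the paper's):

```python
import numpy as np

def laplace_sum(values, epsilon, lo=0.0, hi=1.0, rng=None):
    """Release the sum of bounded values under epsilon-differential privacy.

    Clipping each value to [lo, hi] bounds the sum's sensitivity by
    (hi - lo), so Laplace noise with scale (hi - lo) / epsilon suffices.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(np.asarray(values, dtype=float), lo, hi)
    return clipped.sum() + rng.laplace(scale=(hi - lo) / epsilon)

# A noisy count of positive responses among five individuals.
noisy_total = laplace_sum([0, 1, 1, 0, 1], epsilon=0.5)
```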
Truthful Linear Regression
We consider the problem of fitting a linear model to data held by individuals
who are concerned about their privacy. Incentivizing most players to truthfully
report their data to the analyst constrains our design to mechanisms that
provide a privacy guarantee to the participants; we use differential privacy to
model individuals' privacy losses. This immediately poses a problem, as
differentially private computation of a linear model necessarily produces a
biased estimate, and existing approaches to designing mechanisms that elicit
data from privacy-sensitive individuals do not generalize well to biased
estimators.
We overcome this challenge through an appropriate design of the computation and
payment scheme.
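The paper's contribution is the joint design of the private computation and
the payment scheme, which is not reproduced here. As background for the bias
issue the abstract raises, here is a minimal sketch of one standard way to
compute a differentially private (and hence biased) linear estimate, via
sufficient-statistics perturbation under bounded-data assumptions of our own
choosing:

```python
import numpy as np

def dp_linear_regression(X, y, epsilon, rng=None):
    """Differentially private OLS via sufficient-statistics perturbation.

    Assumes every entry of X and y lies in [-1, 1]. Replacing one
    individual's row then changes each entry of X^T X by at most 2, giving
    a crude L1 sensitivity of 2*d*d (and 2*d for X^T y). The privacy
    budget is split evenly between the two noisy statistics.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    xx = X.T @ X + rng.laplace(scale=(2 * d * d) / (epsilon / 2), size=(d, d))
    xy = X.T @ y + rng.laplace(scale=(2 * d) / (epsilon / 2), size=d)
    # A light ridge term keeps the noisy Gram matrix invertible; it is one
    # source of the bias the abstract refers to.
    return np.linalg.solve(xx + 1e-2 * np.eye(d), xy)

# Example on synthetic data with entries clipped to [-1, 1].
rng = np.random.default_rng(1)
X = np.clip(rng.normal(size=(500, 3)) / 3, -1, 1)
y = np.clip(X @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.normal(size=500), -1, 1)
beta_hat = dp_linear_regression(X, y, epsilon=1.0, rng=rng)
```

Both the injected noise and the regularization bias the estimate relative to
ordinary least squares, which is precisely the property that breaks existing
data-elicitation mechanisms and motivates the paper's payment design.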