Differentially Private Model Selection with Penalized and Constrained Likelihood
In statistical disclosure control, the goal of data analysis is twofold: The
released information must provide accurate and useful statistics about the
underlying population of interest, while minimizing the potential for an
individual record to be identified. In recent years, the notion of differential
privacy has received much attention in theoretical computer science, machine
learning, and statistics. It provides a rigorous and strong notion of
protection for individuals' sensitive information. A fundamental question is
how to incorporate differential privacy into traditional statistical inference
procedures. In this paper we study model selection in multivariate linear
regression under the constraint of differential privacy. We show that model
selection procedures based on penalized least squares or likelihood can be made
differentially private by a combination of regularization and randomization,
and propose two algorithms to do so. We show that our private procedures are
consistent under essentially the same conditions as the corresponding
non-private procedures. We also find that under differential privacy, the
procedure becomes more sensitive to the tuning parameters. We illustrate and
evaluate our method using simulation studies and two real data examples.
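As a rough illustration of the "regularization plus randomization" idea, the sketch below applies output perturbation to a ridge-penalized least-squares fit: the penalized estimate is computed, Laplace noise calibrated to an assumed sensitivity bound is added, and variables are selected by thresholding the noisy coefficients. The sensitivity value, threshold, and overall construction are illustrative assumptions, not the paper's actual algorithms.

```python
import numpy as np

def dp_penalized_ls(X, y, lam, epsilon, sensitivity, threshold=0.5, rng=None):
    """Output-perturbation sketch of differentially private model selection.

    `sensitivity` is an assumed bound on how much the penalized estimate can
    change when one record changes; deriving a valid bound is the hard part
    and is taken as given here. `threshold` is an arbitrary illustrative cut.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Ridge-penalized least squares: beta = (X'X + lam I)^{-1} X'y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    # Laplace mechanism: noise scale = sensitivity / epsilon
    noisy = beta + rng.laplace(scale=sensitivity / epsilon, size=p)
    # Model selection: keep variables whose noisy coefficient clears the cut
    selected = np.abs(noisy) > threshold
    return noisy, selected
```

Larger noise (smaller epsilon or larger sensitivity) makes the thresholding step less reliable, which is one way to see the abstract's observation that the private procedure is more sensitive to tuning parameters.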
Exclusive Strategy for Generalization Algorithms in Micro-data Disclosure
When generalization algorithms are known to the public, an adversary can obtain a more precise estimation of the secret table than what can be deduced from the disclosed generalization result alone. Therefore, whether a generalization algorithm satisfies a privacy property should be judged on the basis of such an estimation. In this paper, we show that computing the estimation is inherently a recursive process that exhibits high complexity when generalization algorithms take a straightforward inclusive strategy. To facilitate the design of more efficient generalization algorithms, we suggest an alternative exclusive strategy, which adopts a seemingly drastic approach to eliminate the need for recursion. Surprisingly, the data utility of the two strategies is not comparable in general, and the exclusive strategy can provide better data utility in certain cases.
Can a supernova be located by its neutrinos?
A future core-collapse supernova in our Galaxy will be detected by several
neutrino detectors around the world. The neutrinos escape from the supernova
core over several seconds from the time of collapse, unlike the electromagnetic
radiation, emitted from the envelope, which is delayed by a time of order
hours. In addition, the electromagnetic radiation can be obscured by dust in
the intervening interstellar space. The question therefore arises whether a
supernova can be located by its neutrinos alone. The early warning of a
supernova and its location might allow greatly improved astronomical
observations. The theme of the present work is a careful and realistic
assessment of this question, taking into account the statistical significance
of the various neutrino signals. Not surprisingly, neutrino-electron forward
scattering leads to a good determination of the supernova direction, even in
the presence of the large and nearly isotropic background from other reactions.
Even with the most pessimistic background assumptions, SuperKamiokande (SK) and
the Sudbury Neutrino Observatory (SNO) can restrict the supernova direction to
be within circles of radius and , respectively. Other
reactions with more events but weaker angular dependence are much less useful
for locating the supernova. Finally, there is the oft-discussed possibility of
triangulation, i.e., determination of the supernova direction based on an
arrival time delay between different detectors. Given the expected statistics
we show that, contrary to previous estimates, this technique does not allow a
good determination of the supernova direction.
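A back-of-envelope sketch shows why timing triangulation is weak. For two detectors separated by a baseline d, a neutrino arrival-time delay Δt constrains the direction through cos θ = c·Δt/d, so a timing uncertainty δt smears cos θ by c·δt/d. The baseline and timing uncertainty below are illustrative order-of-magnitude assumptions, not the paper's values.

```python
# Why arrival-time triangulation gives poor pointing: an order-of-magnitude
# estimate. The numbers below are illustrative assumptions.
C = 3.0e8          # speed of light, m/s
d = 1.0e7          # assumed detector baseline, m (order of Earth's radius)
delta_t = 5.0e-3   # assumed arrival-time uncertainty, s

# cos(theta) = c * Delta_t / d  =>  uncertainty in cos(theta) = c * delta_t / d
delta_cos_theta = C * delta_t / d
print(delta_cos_theta)  # 0.15: a broad patch of sky, not a point
```

With the smearing in cos θ at the 0.1 level, the allowed region spans tens of degrees, consistent with the abstract's conclusion that triangulation compares poorly with neutrino-electron forward scattering.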
Anonymisation of geographical distance matrices via Lipschitz embedding
BACKGROUND: Anonymisation of spatially referenced data has received increasing attention in recent years. Whereas the research focus has been on the anonymisation of point locations, the disclosure risk arising from the publishing of inter-point distances and corresponding anonymisation methods have not been studied systematically.
METHODS: We propose a new anonymisation method for the release of geographical distances between records of a microdata file-for example patients in a medical database. We discuss a data release scheme in which microdata without coordinates and an additional distance matrix between the corresponding rows of the microdata set are released. In contrast to most other approaches this method preserves small distances better than larger distances. The distances are modified by a variant of Lipschitz embedding.
RESULTS: The effects of the embedding parameters on the risk of data disclosure are evaluated by linkage experiments using simulated data. The results indicate small disclosure risks for appropriate embedding parameters.
CONCLUSION: The proposed method is useful if published distance information might be misused for the re-identification of records. The method can be used for publishing scientific-use files and as an additional tool for record-linkage studies.
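A basic Lipschitz embedding (of which the paper uses a variant) maps each record to the vector of its minimum distances to a family of reference sets; by the triangle inequality each coordinate is 1-Lipschitz, so the Chebyshev distance between embedded vectors never exceeds the true distance. The minimal sketch below assumes the original distance matrix is available and the reference sets are given index lists; the specific parameters are illustrative.

```python
import numpy as np

def lipschitz_embed(dist, ref_sets):
    """Basic Lipschitz embedding of a distance matrix.

    dist:     (n, n) matrix of original pairwise distances.
    ref_sets: list of index arrays, each a reference set of records.
    Returns an (n, len(ref_sets)) array whose i-th row holds the minimum
    distance from record i to each reference set.
    """
    n = dist.shape[0]
    return np.array([[dist[i, refs].min() for refs in ref_sets]
                     for i in range(n)])

# Released distances are then computed between embedded vectors, e.g. with
# the Chebyshev (L-infinity) metric: by the triangle inequality,
# |min_r d(i,r) - min_r d(j,r)| <= d(i,j) for every reference set.
```

The choice and number of reference sets trades off utility (how well small distances are preserved) against disclosure risk, which is what the paper's linkage experiments evaluate.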
Privacy in Microdata Release: Challenges, Techniques, and Approaches
Releasing and disseminating useful microdata while ensuring that no personal or sensitive information is improperly exposed is a complex problem that has been heavily investigated by the scientific community over the past couple of decades. Various microdata protection approaches have since been proposed, achieving different privacy requirements through appropriate protection techniques. This chapter discusses the privacy risks that can arise in microdata release and illustrates some well-known privacy-preserving techniques and approaches.
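One of the best-known microdata protection criteria in this literature is k-anonymity: every combination of quasi-identifier values must be shared by at least k released records. A minimal sketch of the check (field names and generalized values below are illustrative, not from the chapter):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs >= k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# A toy generalized table: ZIP codes and ages have been coarsened so that
# each (zip, age) combination covers at least two records.
table = [
    {"zip": "021**", "age": "20-29", "disease": "flu"},
    {"zip": "021**", "age": "20-29", "disease": "cold"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
]
print(is_k_anonymous(table, ["zip", "age"], k=2))  # True
```

Criteria such as l-diversity and t-closeness, also commonly covered in surveys of this area, refine this idea by additionally constraining the sensitive values within each group.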