Copulas as High Dimensional Generative Models: Vine Copula Autoencoders
We introduce the vine copula autoencoder (VCAE), a flexible generative model
for high-dimensional distributions built in a straightforward three-step procedure.
First, an autoencoder (AE) compresses the data into a lower dimensional representation. Second, the multivariate distribution of the encoded data is estimated with
vine copulas. Third, a generative model is obtained by combining the estimated
distribution with the decoder part of the AE. As such, the proposed approach
can transform any already trained AE into a flexible generative model at a low
computational cost. This is an advantage over existing generative models such as
adversarial networks and variational AEs which can be difficult to train and can
impose strong assumptions on the latent space. Experiments on MNIST, Street
View House Numbers and Large-Scale CelebFaces Attributes datasets show that
VCAEs can achieve results competitive with standard baselines.
MM: A general method to perform various data analysis tasks from a differentially private sketch
Differential privacy is the standard privacy definition for performing
analyses over sensitive data. Yet, its privacy budget bounds the number of
tasks an analyst can perform with reasonable accuracy, which makes it
challenging to deploy in practice. This can be alleviated by private sketching,
where the dataset is compressed into a single noisy sketch vector which can be
shared with the analysts and used to perform arbitrarily many analyses.
However, the algorithms to perform specific tasks from sketches must be
developed on a case-by-case basis, which is a major impediment to their use. In
this paper, we introduce the generic moment-to-moment (MM) method to
perform a wide range of data exploration tasks from a single private sketch.
Among other things, this method can be used to estimate empirical moments of
attributes, the covariance matrix, counting queries (including histograms), and
regression models. Our method treats the sketching mechanism as a black-box
operation, and can thus be applied to a wide variety of sketches from the
literature, widening their ranges of applications without further engineering
or privacy loss, and removing some of the technical barriers to the wider
adoption of sketches for data exploration under differential privacy. We
validate our method with data exploration tasks on artificial and real-world
data, and show that it can be used to reliably estimate statistics and train
classification models from private sketches.
Comment: Published at the 18th International Workshop on Security and Trust Management (STM 2022)
Quantifying Differential Privacy under Temporal Correlations
Differential Privacy (DP) has received increased attention as a rigorous
privacy framework. Existing studies employ traditional DP mechanisms (e.g., the
Laplace mechanism) as primitives, which assume that the data are independent,
or that adversaries do not have knowledge of the data correlations. However,
continuously generated data in the real world tend to be temporally correlated,
and such correlations can be acquired by adversaries. In this paper, we
investigate the potential privacy loss of a traditional DP mechanism under
temporal correlations in the context of continuous data release. First, we
model the temporal correlations using a Markov model and analyze the privacy
leakage of a DP mechanism when adversaries have knowledge of such temporal
correlations. Our analysis reveals that the privacy leakage of a DP mechanism
may accumulate and increase over time. We call it temporal privacy leakage.
Second, to measure such privacy leakage, we design an efficient algorithm for
calculating it in polynomial time. Although the temporal privacy leakage may
increase over time, we also show that its supremum may exist in some cases.
Third, to bound the privacy loss, we propose mechanisms that convert any
existing DP mechanism into one against temporal privacy leakage. Experiments
with synthetic data confirm that our approach is efficient and effective.
Comment: appears at ICDE 201
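The abstract above names the Laplace mechanism as a traditional DP primitive. A minimal sketch of that primitive follows, for illustration only; it is not the paper's leakage-quantification algorithm or its converted mechanisms. The function names `laplace_noise` and `laplace_mechanism`, the counting query, and the parameter values are assumptions introduced here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical usage: a counting query (sensitivity 1) released at several
# consecutive time steps, each with the same per-release epsilon.
rng = random.Random(0)
true_counts = [12, 15, 14, 18]
noisy_counts = [
    laplace_mechanism(c, sensitivity=1.0, epsilon=0.5, rng=rng)
    for c in true_counts
]
```

Each release in this sketch is perturbed independently. The abstract's point is that when the underlying values are temporally correlated and adversaries know those correlations, the privacy leakage of such releases may accumulate over time.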
Releasing survey microdata with exact cluster locations and additional privacy safeguards
Household survey programs around the world publish fine-granular georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard the respondents’ privacy, micro-level survey data is usually (pseudo)-anonymized through deletion or perturbation procedures such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information on a local level. Here, we propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards through synthetically generated data using generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces the respondents’ re-identification risk for any number of disclosed attributes by 60–80% even under re-identification attempts
Protecting Locations with Differential Privacy under Temporal Correlations
Concerns about location privacy frequently arise with the rapid development of
GPS enabled devices and location-based applications. While spatial
transformation techniques such as location perturbation or generalization have
been studied extensively, most techniques rely on syntactic privacy models
without a rigorous privacy guarantee. Many of them only consider static scenarios
or perturb the location at single timestamps without considering temporal
correlations of a moving user's locations, and hence are vulnerable to various
inference attacks. While differential privacy has been accepted as a standard
for privacy protection, applying differential privacy in location-based
applications presents new challenges, as the protection needs to be enforced on
the fly for a single user and needs to incorporate temporal correlations
between a user's locations.
In this paper, we propose a systematic solution to preserve location privacy
with a rigorous privacy guarantee. First, we propose a new definition,
"$\delta$-location set" based differential privacy, to account for the temporal
correlations in location data. Second, we show that the well-known
$\ell_1$-norm sensitivity fails to capture the geometric sensitivity in
multidimensional space and propose a new notion, sensitivity hull, based on
which the error of differential privacy is bounded. Third, to obtain the
optimal utility we present a planar isotropic mechanism (PIM) for location
perturbation, which is the first mechanism achieving the lower bound of
differential privacy. Experiments on real-world datasets also demonstrate that
PIM significantly outperforms baseline approaches in data utility.
Comment: Final version Nov-04-201