    Copulas as High Dimensional Generative Models: Vine Copula Autoencoders

    We introduce the vine copula autoencoder (VCAE), a flexible generative model for high-dimensional distributions built in a straightforward three-step procedure. First, an autoencoder (AE) compresses the data into a lower-dimensional representation. Second, the multivariate distribution of the encoded data is estimated with vine copulas. Third, a generative model is obtained by combining the estimated distribution with the decoder part of the AE. The proposed approach can thus turn any already trained AE into a flexible generative model at low computational cost. This is an advantage over existing generative models such as generative adversarial networks and variational AEs, which can be difficult to train and can impose strong assumptions on the latent space. Experiments on the MNIST, Street View House Numbers, and Large-Scale CelebFaces Attributes datasets show that VCAEs achieve results competitive with standard baselines.
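
    A minimal sketch of the three-step recipe, assuming pre-trained `encoder`/`decoder` callables (hypothetical names) and, for brevity, a hand-fitted Gaussian copula standing in for the vine copula of step 2; a vine copula library such as pyvinecopulib would replace that stand-in in a faithful implementation:

    import numpy as np
    from scipy import stats

    def fit_copula_sampler(z, n_samples, rng):
        """Fit a Gaussian copula to encoded data z (n x k) and draw new codes."""
        n, k = z.shape
        # Map each margin to uniforms via the empirical CDF (ranks) ...
        u = stats.rankdata(z, axis=0) / (n + 1)
        # ... then to normal scores, and estimate their correlation matrix.
        g = stats.norm.ppf(u)
        corr = np.corrcoef(g, rowvar=False)
        # Sample new normal scores, push them back through the copula ...
        g_new = rng.multivariate_normal(np.zeros(k), corr, size=n_samples)
        u_new = stats.norm.cdf(g_new)
        # ... and invert each margin with the empirical sample quantiles.
        return np.column_stack(
            [np.quantile(z[:, j], u_new[:, j]) for j in range(k)]
        )

    # Step 1: encode the data with the already trained AE (assumed given):
    #   z = encoder(x_train)                      # (n, k) latent codes
    # Steps 2-3: fit the latent distribution, sample, and decode:
    #   x_gen = decoder(fit_copula_sampler(z, 64, np.random.default_rng(0)))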

    M²M: A general method to perform various data analysis tasks from a differentially private sketch

    Differential privacy is the standard privacy definition for performing analyses over sensitive data. Yet, its privacy budget bounds the number of tasks an analyst can perform with reasonable accuracy, which makes it challenging to deploy in practice. This can be alleviated by private sketching, where the dataset is compressed into a single noisy sketch vector that can be shared with analysts and used to perform arbitrarily many analyses. However, algorithms to perform specific tasks from sketches must be developed on a case-by-case basis, which is a major impediment to their use. In this paper, we introduce the generic moment-to-moment (M²M) method to perform a wide range of data exploration tasks from a single private sketch. Among other things, the method can be used to estimate empirical moments of attributes, the covariance matrix, counting queries (including histograms), and regression models. It treats the sketching mechanism as a black box and can thus be applied to a wide variety of sketches from the literature, widening their range of applications without further engineering or privacy loss, and removing some of the technical barriers to the wider adoption of sketches for data exploration under differential privacy. We validate our method with data exploration tasks on artificial and real-world data, and show that it can be used to reliably estimate statistics and train classification models from private sketches. Comment: Published at the 18th International Workshop on Security and Trust Management (STM 2022).
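
    A toy illustration of the moment-to-moment idea under assumed ingredients (a random-Fourier-feature sketch, a uniform auxiliary distribution, illustrative constants): the target moment function is approximated in the span of the sketch features on public auxiliary points, and the moment is then read off the noisy sketch. This is a sketch of the idea, not the paper's exact estimator:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m, eps = 10_000, 2, 400, 1.0

    x = rng.beta(2.0, 5.0, size=(n, d))            # private dataset in [0, 1]^d
    W = rng.normal(scale=6.0, size=(m, d))
    b = rng.uniform(0, 2 * np.pi, m)
    phi = lambda pts: np.cos(pts @ W.T + b)        # sketch feature map

    # Private sketch: each per-record feature vector lies in [-1, 1]^m, so the
    # L1 sensitivity of the average is at most 2m/n; per-coordinate Laplace
    # noise with scale 2m/(n*eps) therefore yields an eps-DP sketch.
    z = phi(x).mean(axis=0) + rng.laplace(scale=2 * m / (n * eps), size=m)

    # M2M step: fit weights alpha with target(u) ~= alpha @ phi(u) on auxiliary
    # points u drawn from the public domain (no private data touched here),
    # then estimate the moment as alpha @ z.
    u = rng.uniform(0, 1, size=(50_000, d))
    target = u[:, 0]                               # moment to estimate: E[x_1]
    alpha, *_ = np.linalg.lstsq(phi(u), target, rcond=None)

    print("estimate:", alpha @ z, "truth:", x[:, 0].mean())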

    Quantifying Differential Privacy under Temporal Correlations

    Differential Privacy (DP) has received increased attention as a rigorous privacy framework. Existing studies employ traditional DP mechanisms (e.g., the Laplace mechanism) as primitives, which assume that the data are independent or that adversaries have no knowledge of the data correlations. However, continuously generated data in the real world tend to be temporally correlated, and such correlations can be acquired by adversaries. In this paper, we investigate the potential privacy loss of a traditional DP mechanism under temporal correlations in the context of continuous data release. First, we model the temporal correlations with a Markov model and analyze the privacy leakage of a DP mechanism when adversaries have knowledge of these correlations. Our analysis reveals that the privacy leakage of a DP mechanism may accumulate and increase over time; we call this temporal privacy leakage. Second, to measure such leakage, we design an efficient algorithm that calculates it in polynomial time. Although temporal privacy leakage may increase over time, we also show that its supremum may exist in some cases. Third, to bound the privacy loss, we propose mechanisms that convert any existing DP mechanism into one that protects against temporal privacy leakage. Experiments with synthetic data confirm that our approach is efficient and effective. Comment: Appears at ICDE 2017.
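
    A brute-force numerical check of the accumulation effect, under assumptions not taken from the paper (a two-state sticky Markov chain whose state is released with randomized response, i.e. ln(3)-DP per step). It enumerates all output sequences, so it is exponential in t; the point is only to show the worst-case leakage growing past the single-release budget, not to reproduce the paper's polynomial-time algorithm:

    import itertools
    import numpy as np

    P = np.array([[0.9, 0.1],          # sticky two-state Markov transitions
                  [0.1, 0.9]])
    rr = np.array([[3.0, 1.0],
                   [1.0, 3.0]]) / 4.0  # rr[s, o] = P(report o | state s)
    eps0 = np.log(3.0)                 # per-step randomized-response budget

    def seq_prob(outs, s1):
        """P(o_1..o_t | s_1) via the HMM forward recursion."""
        alpha = np.zeros(2)
        alpha[s1] = 1.0                # distribution of the current state
        p = 1.0
        for i, o in enumerate(outs):
            if i > 0:
                alpha = alpha @ P      # advance the hidden state
            step = float(alpha @ rr[:, o])
            p *= step
            alpha = alpha * rr[:, o] / step  # posterior given this output
        return p

    for t in range(1, 9):
        leak = max(
            abs(np.log(seq_prob(o, 0) / seq_prob(o, 1)))
            for o in itertools.product([0, 1], repeat=t)
        )
        print(f"t={t}: worst-case leakage {leak:.3f} (per-step budget {eps0:.3f})")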

    Releasing survey microdata with exact cluster locations and additional privacy safeguards

    Household survey programs around the world publish fine-granular georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard respondents’ privacy, micro-level survey data are usually (pseudo-)anonymized through deletion or perturbation procedures such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information at the local level. Here, we propose an alternative microdata dissemination strategy that preserves the utility of the original microdata while adding privacy safeguards through synthetically generated data produced by generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces respondents’ re-identification risk by 60–80% for any number of disclosed attributes, even under re-identification attempts.
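
    A minimal sketch of the dissemination idea, with a per-cluster Gaussian mixture standing in for the paper's generative models; the column names and toy data below are hypothetical. Attribute records are resampled within each survey cluster while the exact cluster coordinates are kept:

    import numpy as np
    import pandas as pd
    from sklearn.mixture import GaussianMixture

    def synthesize(df, attrs, n_components=3, seed=0):
        """Return a copy of df with `attrs` resampled cluster by cluster."""
        out = []
        for (lat, lon), grp in df.groupby(["cluster_lat", "cluster_lon"]):
            gm = GaussianMixture(
                n_components=min(n_components, len(grp)), random_state=seed
            ).fit(grp[attrs])
            synth, _ = gm.sample(len(grp))          # synthetic attribute rows
            fake = pd.DataFrame(synth, columns=attrs)
            fake["cluster_lat"], fake["cluster_lon"] = lat, lon  # exact location
            out.append(fake)
        return pd.concat(out, ignore_index=True)

    # Hypothetical survey: two clusters, an income and a household-size column.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "cluster_lat": np.repeat([9.93, 10.02], 200),
        "cluster_lon": np.repeat([-84.08, -84.21], 200),
        "income": rng.lognormal(10.0, 0.5, 400),
        "hh_size": rng.poisson(3.0, 400) + 1.0,
    })
    print(synthesize(df, ["income", "hh_size"]).head())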

    Protecting Locations with Differential Privacy under Temporal Correlations

    Concerns about location privacy frequently arise with the rapid development of GPS-enabled devices and location-based applications. While spatial transformation techniques such as location perturbation or generalization have been studied extensively, most rely on syntactic privacy models without a rigorous privacy guarantee. Many of them consider only static scenarios or perturb the location at single timestamps without considering the temporal correlations of a moving user's locations, and hence are vulnerable to various inference attacks. While differential privacy has been accepted as a standard for privacy protection, applying it in location-based applications presents new challenges: the protection needs to be enforced on the fly for a single user and needs to incorporate the temporal correlations between a user's locations. In this paper, we propose a systematic solution to preserve location privacy with a rigorous privacy guarantee. First, we propose a new definition, "δ-location set" based differential privacy, to account for the temporal correlations in location data. Second, we show that the well-known ℓ1-norm sensitivity fails to capture the geometric sensitivity in multidimensional space and propose a new notion, the sensitivity hull, based on which the error of differential privacy is bounded. Third, to obtain optimal utility we present the planar isotropic mechanism (PIM) for location perturbation, the first mechanism achieving the lower bound of differential privacy. Experiments on real-world datasets demonstrate that PIM significantly outperforms baseline approaches in data utility. Comment: Final version Nov-04-201
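
    A sketch of the generic K-norm mechanism that PIM builds on: 2-D noise with density proportional to exp(-eps * ||z||_K), where K is a convex sensitivity hull containing the origin. The hull vertices below are arbitrary and the isotropic transformation is omitted, so this is not the paper's full planar isotropic mechanism:

    import numpy as np

    def _area(a, b, c):
        """Area of the triangle (a, b, c)."""
        return abs((b[0] - a[0]) * (c[1] - a[1])
                   - (b[1] - a[1]) * (c[0] - a[0])) / 2.0

    def sample_in_polygon(verts, rng):
        """Uniform point in a convex polygon (fan triangulation from verts[0])."""
        tris = [(verts[0], verts[i], verts[i + 1]) for i in range(1, len(verts) - 1)]
        areas = np.array([_area(*t) for t in tris])
        a, b, c = tris[rng.choice(len(tris), p=areas / areas.sum())]
        r1, r2 = np.sqrt(rng.uniform()), rng.uniform()
        return (1 - r1) * a + r1 * (1 - r2) * b + r1 * r2 * c

    def k_norm_mechanism(location, hull_verts, eps, rng):
        """Release location + r*u with r ~ Gamma(d + 1, 1/eps), u uniform in K."""
        r = rng.gamma(shape=3.0, scale=1.0 / eps)   # d + 1 = 3 in the plane
        return location + r * sample_in_polygon(hull_verts, rng)

    rng = np.random.default_rng(0)
    hull = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])  # toy K
    print(k_norm_mechanism(np.array([39.9, 116.4]), hull, eps=1.0, rng=rng))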