
    Information Extraction Under Privacy Constraints

    A privacy-constrained information extraction problem is considered where, for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes $Y$ and wants to convey to a potentially public user as much information about $Y$ as possible without compromising the amount of information revealed about $X$. To this end, the so-called {\em rate-privacy function} is introduced to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from $Y$ under a privacy constraint between $X$ and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed, and information-theoretic and estimation-theoretic interpretations of it are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function by studying the problem where the extracted information is a uniform quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound, and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case. Comment: 55 pages, 6 figures. Improved the organization and added detailed literature review.
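
The rate-privacy function is an optimization over channels $P_{Z|Y}$ with $X - Y - Z$ a Markov chain: maximize $I(Y;Z)$ subject to a leakage budget $I(X;Z) \le \epsilon$. Below is a minimal numerical sketch of that definition, not the paper's closed-form results; the binary alphabets, the example joint distribution, and the Dirichlet random search over $P_{Z|Y}$ are illustrative assumptions.

```python
# Minimal numerical sketch (not from the paper): estimate the rate-privacy
# function g_eps(X;Y) = max I(Y;Z) s.t. I(X;Z) <= eps over channels P_{Z|Y},
# with X - Y - Z a Markov chain.  Alphabet sizes, the joint P_XY below, and
# the random search over P_{Z|Y} are illustrative assumptions.
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits for a joint pmf given as a 2-D array."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float((p_joint[mask] * np.log2(p_joint[mask] / (p_a @ p_b)[mask])).sum())

def rate_privacy(p_xy, eps, n_z=2, trials=20000, seed=0):
    """Random search over stochastic matrices P_{Z|Y} of shape (|Y|, n_z)."""
    rng = np.random.default_rng(seed)
    p_y = p_xy.sum(axis=0)                      # marginal of Y
    best = 0.0
    for _ in range(trials):
        p_z_given_y = rng.dirichlet(np.ones(n_z), size=p_xy.shape[1])
        p_yz = p_y[:, None] * p_z_given_y       # joint of (Y, Z)
        p_xz = p_xy @ p_z_given_y               # joint of (X, Z) via X - Y - Z
        if mutual_information(p_xz) <= eps:     # privacy constraint on X
            best = max(best, mutual_information(p_yz))
    return best

# Example: a noisy binary pair (X, Y); utility I(Y;Z) under leakage budget eps.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
for eps in (0.01, 0.05, 0.2):
    print(eps, round(rate_privacy(p_xy, eps), 3))
```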

    Privacy-Aware Guessing Efficiency

    We investigate the problem of guessing a discrete random variable $Y$ under a privacy constraint dictated by another correlated discrete random variable $X$, where both guessing efficiency and privacy are assessed in terms of the probability of correct guessing. We define $h(P_{XY}, \epsilon)$ as the maximum probability of correctly guessing $Y$ given an auxiliary random variable $Z$, where the maximization is taken over all $P_{Z|Y}$ ensuring that the probability of correctly guessing $X$ given $Z$ does not exceed $\epsilon$. We show that the map $\epsilon \mapsto h(P_{XY}, \epsilon)$ is strictly increasing, concave, and piecewise linear, which allows us to derive a closed form expression for $h(P_{XY}, \epsilon)$ when $X$ and $Y$ are connected via a binary-input binary-output channel. For $(X^n, Y^n)$ being pairs of independent and identically distributed binary random vectors, we similarly define $\underline{h}_n(P_{X^nY^n}, \epsilon)$ under the assumption that $Z^n$ is also a binary vector. Then we obtain a closed form expression for $\underline{h}_n(P_{X^nY^n}, \epsilon)$ for sufficiently large, but nontrivial values of $\epsilon$. Comment: ISIT 201
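
Since $h(P_{XY}, \epsilon)$ is defined through a finite-dimensional optimization, a brute-force sketch conveys the idea. The binary alphabets for $X$, $Y$, $Z$, the grid search over binary-output channels $P_{Z|Y}$, and the example joint distribution below are illustrative assumptions, not the paper's closed-form solution.

```python
# Sketch of h(P_XY, eps) = max_{P_{Z|Y}} Pc(Y|Z) subject to Pc(X|Z) <= eps,
# where Pc(A|Z) = sum_z max_a P(A=a, Z=z) is the probability of correct
# (MAP) guessing.  Binary alphabets and the grid search are assumptions.
import numpy as np

def prob_correct(p_az):
    """MAP guessing probability: sum_z max_a P(A=a, Z=z)."""
    return float(p_az.max(axis=0).sum())

def guessing_efficiency(p_xy, eps, grid=101):
    """Grid search over binary-output channels P_{Z|Y} (two crossover parameters)."""
    p_y = p_xy.sum(axis=0)
    best = 0.0
    ts = np.linspace(0.0, 1.0, grid)
    for a in ts:                                # P(Z=1 | Y=0)
        for b in ts:                            # P(Z=1 | Y=1)
            p_z_given_y = np.array([[1 - a, a],
                                    [1 - b, b]])
            p_xz = p_xy @ p_z_given_y           # joint of (X, Z) via X - Y - Z
            if prob_correct(p_xz) <= eps + 1e-12:   # privacy constraint on X
                best = max(best, prob_correct(p_y[:, None] * p_z_given_y))
    return best

# Example: X uniform, Y its observation through a binary symmetric channel
# with crossover 0.2; eps must be at least max_x P(x) = 0.5 to be feasible.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
for eps in (0.5, 0.6, 0.8, 1.0):
    print(eps, round(guessing_efficiency(p_xy, eps), 3))
```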

    Privacy-Aware MMSE Estimation

    We investigate the problem of the predictability of random variable $Y$ under a privacy constraint dictated by random variable $X$, correlated with $Y$, where both predictability and privacy are assessed in terms of the minimum mean-squared error (MMSE). Given that $X$ and $Y$ are connected via a binary-input symmetric-output (BISO) channel, we derive the \emph{optimal} random mapping $P_{Z|Y}$ such that the MMSE of $Y$ given $Z$ is minimized while the MMSE of $X$ given $Z$ is greater than $(1-\epsilon)\mathsf{var}(X)$ for a given $\epsilon \geq 0$. We also consider the case where $(X,Y)$ are continuous and $P_{Z|Y}$ is restricted to be an additive noise channel. Comment: 9 pages, 3 figures
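
As a worked illustration of the additive-noise case, under an assumption the abstract does not make, namely that $(X,Y)$ are jointly Gaussian with correlation $\rho$ and the noise is Gaussian: the privacy constraint $\mathsf{mmse}(X|Z) \ge (1-\epsilon)\mathsf{var}(X)$ pins down the smallest admissible noise variance, and the best achievable $\mathsf{mmse}(Y|Z)$ then follows in closed form.

```python
# Illustrative closed-form sketch under an assumption the abstract does not make:
# (X, Y) jointly Gaussian with correlation rho, and an additive-noise filter
# Z = Y + N with independent Gaussian noise N ~ N(0, sigma^2).  In that model
#   mmse(Y|Z) = var(Y) * sigma^2 / (var(Y) + sigma^2)
#   mmse(X|Z) = var(X) * (1 - rho^2 * var(Y) / (var(Y) + sigma^2))
# so mmse(X|Z) >= (1 - eps) * var(X) fixes the smallest admissible sigma^2.

def min_mmse_y_under_privacy(var_y, rho, eps):
    """Smallest achievable mmse(Y|Z) over filters Z = Y + N subject to
    mmse(X|Z) >= (1 - eps) * var(X), for the jointly Gaussian toy model."""
    if eps >= rho ** 2:                           # constraint already met with no noise
        return 0.0
    sigma2 = var_y * (rho ** 2 / eps - 1.0)       # smallest feasible noise variance
    return var_y * sigma2 / (var_y + sigma2)

# Example: unit-variance pair with correlation 0.5; utility loss vs. privacy level.
for eps in (0.01, 0.1, 0.25, 0.5):
    print(eps, round(min_mmse_y_under_privacy(1.0, 0.5, eps), 4))
```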

    Almost Perfect Privacy for Additive Gaussian Privacy Filters

    We study the maximal mutual information about a random variable $Y$ (representing non-private information) displayed through an additive Gaussian channel when guaranteeing that only $\epsilon$ bits of information are leaked about a random variable $X$ (representing private information) that is correlated with $Y$. Denoting this quantity by $g_\epsilon(X,Y)$, we show that for perfect privacy, i.e., $\epsilon=0$, one has $g_0(X,Y)=0$ for any pair of absolutely continuous random variables $(X,Y)$, and then derive a second-order approximation for $g_\epsilon(X,Y)$ for small $\epsilon$. This approximation is shown to be related to the strong data processing inequality for mutual information under suitable conditions on the joint distribution $P_{XY}$. Next, motivated by an operational interpretation of data privacy, we formulate the privacy-utility tradeoff in the same setup using estimation-theoretic quantities and obtain explicit bounds for this tradeoff when $\epsilon$ is sufficiently small using the approximation formula derived for $g_\epsilon(X,Y)$. Comment: 20 pages. To appear in Springer-Verlag
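
A small closed-form sketch of the tradeoff, under the illustrative assumption that $(X,Y)$ are jointly Gaussian with correlation coefficient $\rho$ (the paper treats general absolutely continuous pairs): with the filter $Z_\gamma = \sqrt{\gamma}\,Y + N$ and standard Gaussian $N$, both $I(Y;Z_\gamma)$ and $I(X;Z_\gamma)$ have explicit expressions, and $g_\epsilon(X,Y)$ follows from the largest $\gamma$ meeting the leakage budget.

```python
# Closed-form sketch under an illustrative jointly Gaussian assumption (not made
# in the paper): with Z_gamma = sqrt(gamma) * Y + N and standard Gaussian N,
#   I(Y; Z_gamma) = 0.5 * log2(1 + gamma * var(Y))
#   I(X; Z_gamma) = 0.5 * log2((1 + a) / (1 + a * (1 - rho**2))),  a = gamma * var(Y),
# and g_eps(X, Y) comes from the largest gamma with I(X; Z_gamma) <= eps.
import math

def g_eps_gaussian(rho, eps):
    """g_eps(X, Y) in bits for the jointly Gaussian toy model; the answer
    depends only on the correlation rho and the leakage budget eps."""
    i_xy = -0.5 * math.log2(1.0 - rho ** 2)       # total leakage I(X;Y)
    if eps >= i_xy:
        return math.inf                           # privacy constraint never binds
    # Solve (1 + a) / (1 + a * (1 - rho**2)) = 2**(2*eps) for a = gamma * var(Y).
    c = 2.0 ** (2.0 * eps)
    a = (c - 1.0) / (1.0 - c * (1.0 - rho ** 2))
    return 0.5 * math.log2(1.0 + a)               # = I(Y; Z_gamma) at that gamma

# Perfect privacy (eps = 0) gives zero utility, as the paper shows for all
# absolutely continuous pairs; small eps gives a small positive rate.
for eps in (0.0, 0.01, 0.05, 0.1):
    print(eps, round(g_eps_gaussian(0.6, eps), 4))
```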

    Context-Aware Generative Adversarial Privacy

    Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to dataset statistics. We circumvent these limitations by introducing a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals' private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP's performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in a generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics. Comment: Improved version of a paper accepted by Entropy Journal, Special Issue on Information Theory in Machine Learning and Data Science
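
A minimal sketch of the adversarial training idea (not the authors' implementation): a privatizer maps the public observation plus noise to a sanitized output, an adversary tries to classify the private bit from that output, and the two are updated alternately. The toy binary Gaussian mixture data, network sizes, and distortion penalty weight are illustrative assumptions.

```python
# Minimal adversarial-training sketch of the GAP idea (not the authors' code).
# Data model, architectures, and the distortion penalty are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy binary Gaussian mixture data: private bit X, public observation Y.
n = 4096
x = torch.randint(0, 2, (n, 1)).float()
y = 2.0 * x - 1.0 + torch.randn(n, 1)          # Y = (2X - 1) + Gaussian noise

privatizer = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
adversary  = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_p = torch.optim.Adam(privatizer.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
dist_weight = 1.0                               # distortion penalty weight (assumed)

for step in range(2000):
    noise = torch.randn(n, 1)
    z = privatizer(torch.cat([y, noise], dim=1))

    # Adversary step: improve its ability to infer X from the sanitized Z.
    adv_loss = bce(adversary(z.detach()), x)
    opt_a.zero_grad(); adv_loss.backward(); opt_a.step()

    # Privatizer step: hurt the adversary while keeping Z close to Y.
    priv_loss = -bce(adversary(z), x) + dist_weight * ((z - y) ** 2).mean()
    opt_p.zero_grad(); priv_loss.backward(); opt_p.step()

with torch.no_grad():
    z_eval = privatizer(torch.cat([y, torch.randn(n, 1)], dim=1))
    acc = ((torch.sigmoid(adversary(z_eval)) > 0.5).float() == x).float().mean().item()
print(f"adversary accuracy on sanitized data: {acc:.3f} (0.5 means X is hidden)")
```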