
    Strong Data Processing Inequalities for Input Constrained Additive Noise Channels

    This paper quantifies the intuitive observation that adding noise reduces available information by means of non-linear strong data processing inequalities. Consider the random variables W → X → Y forming a Markov chain, where Y = X + Z with X and Z real-valued and independent, and X bounded in L_p-norm. It is shown that I(W;Y) ≤ F_I(I(W;X)) with F_I(t) < t whenever t > 0, if and only if Z has a density whose support is not disjoint from any translate of itself. A related question is to characterize for which couplings (W, X) the mutual information I(W;Y) is close to the maximum possible. To that end, we show that in order to saturate the channel, i.e., for I(W;Y) to approach capacity, it is necessary that I(W;X) → ∞ (under suitable conditions on the channel). A key ingredient for this result is a deconvolution lemma which shows that the post-convolution total variation distance bounds the pre-convolution Kolmogorov-Smirnov distance. Explicit bounds are provided for the special case of the additive Gaussian noise channel with a quadratic cost constraint, and these bounds are shown to be order-optimal. For this case, simplified proofs are provided that leverage Gaussian-specific tools such as the connection between information and estimation (I-MMSE) and Talagrand's information-transportation inequality.
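
    For readability, a compact restatement of the setup and inequality above, with the L_p bound on X written here as a (a symbol introduced only for this summary):

        \[
        W \to X \to Y, \qquad Y = X + Z, \qquad X \perp Z, \qquad \|X\|_p \le a,
        \]
        \[
        I(W;Y) \;\le\; F_I\big(I(W;X)\big), \qquad F_I(t) < t \ \text{ for all } t > 0,
        \]

    where the strict inequality F_I(t) < t holds if and only if the density of Z has support that is not disjoint from any of its translates.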

    Correspondence Analysis Using Neural Networks

    Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to the social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite-variance functions of two random variables. We show that this optimization problem, in turn, can be efficiently approximated by neural networks. The resulting formulation, called the correspondence analysis neural network (CA-NN), enables CA to be performed at an unprecedented scale. We validate the CA-NN on synthetic data, and demonstrate how it can be used to perform CA on a variety of datasets, including food recipes, wine compositions, and images. Our results outperform traditional methods used in CA, indicating that CA-NN can serve as a new, scalable tool for interpretability and visualization of complex dependencies between random variables. Comment: Accepted to AISTATS 2019. Overlaps with arXiv:1806.0844
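
    As a rough illustration of the idea (not the authors' code), the sketch below trains two small networks f(X) and g(Y) on a toy dependent pair using a maximal-correlation-style surrogate objective; the exact CA-NN objective based on the principal inertia components is the one given in the paper.

        import torch
        import torch.nn as nn

        d = 2  # number of correspondence dimensions to extract (assumed here)
        f = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, d))
        g = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, d))
        opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

        # toy dependent pair (X, Y): Y is a noisy function of X
        x = torch.rand(4096, 1)
        y = torch.sin(4 * x) + 0.1 * torch.randn_like(x)

        for step in range(2000):
            fx, gy = f(x), g(y)
            fx = fx - fx.mean(0, keepdim=True)   # centre both feature maps
            gy = gy - gy.mean(0, keepdim=True)
            cov_f = fx.T @ fx / (len(x) - 1)
            cov_g = gy.T @ gy / (len(x) - 1)
            # maximise cross-correlation while softly whitening each feature map
            loss = -(fx * gy).sum(1).mean() + 0.5 * torch.trace(cov_f @ cov_g)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # f(x) and g(y) now give low-dimensional embeddings whose correlations
        # reflect the dependence structure that CA visualizes.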

    On the Direction of Discrimination: An Information-Theoretic Analysis of Disparate Impact in Machine Learning

    In the context of machine learning, disparate impact refers to a form of systematic discrimination whereby the output distribution of a model depends on the value of a sensitive attribute (e.g., race or gender). In this paper, we propose an information-theoretic framework to analyze the disparate impact of a binary classification model. We view the model as a fixed channel, and quantify disparate impact as the divergence in output distributions over two groups. Our aim is to find a correction function that can perturb the input distributions of each group to align their output distributions. We present an optimization problem that can be solved to obtain a correction function that will make the output distributions statistically indistinguishable. We derive closed-form expressions to efficiently compute the correction function, and demonstrate the benefits of our framework on a recidivism prediction problem based on the ProPublica COMPAS dataset.
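
    A toy numerical illustration of the correction idea (not the paper's closed-form solution): treat a fixed binary classifier as a channel P(Ŷ = 1 | X) over a small discrete feature, and perturb one group's input distribution so that both groups induce the same output distribution.

        import numpy as np
        from scipy.optimize import minimize

        channel = np.array([0.1, 0.4, 0.8, 0.9])  # P(Yhat = 1 | X = x) for 4 feature values
        p_a = np.array([0.4, 0.3, 0.2, 0.1])      # group A's input distribution
        p_b = np.array([0.1, 0.2, 0.3, 0.4])      # group B's input distribution
        target = channel @ p_a                    # group A's positive-prediction rate

        def objective(q):                         # stay close to the original p_b
            return np.sum((q - p_b) ** 2)

        cons = [{"type": "eq", "fun": lambda q: np.sum(q) - 1.0},
                {"type": "eq", "fun": lambda q: channel @ q - target}]
        res = minimize(objective, p_b, bounds=[(0, 1)] * 4, constraints=cons)
        q_b = res.x                               # corrected input distribution for group B
        print("positive rates:", channel @ q_b, target)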

    A Tunable Measure for Information Leakage

    A tunable measure for information leakage called maximal α-leakage is introduced. This measure quantifies the maximal gain of an adversary in refining a tilted version of its prior belief about any (potentially random) function of a dataset, conditioned on a disclosed dataset. The choice of α determines the specific adversarial action, ranging from refining a belief for α = 1 to guessing the best posterior for α = ∞; for these extremal values the measure simplifies to mutual information (MI) and maximal leakage (MaxL), respectively. For all other α, this measure is shown to be the Arimoto channel capacity. Several properties of this measure are proven, including: (i) quasi-convexity in the mapping between the original and disclosed datasets; (ii) data processing inequalities; and (iii) a composition property. Comment: 7 pages. This paper is the extended version of the conference paper "A Tunable Measure for Information Leakage" accepted by ISIT 201
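
    For reference, one standard form of the order-α Arimoto quantities (written from standard definitions rather than copied from the paper) is

        \[
        H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x} P_X(x)^{\alpha},
        \qquad
        H_\alpha^{\mathrm{A}}(X \mid Y) = \frac{\alpha}{1-\alpha}
        \log \sum_{y} \Big( \sum_{x} P_{X,Y}(x,y)^{\alpha} \Big)^{1/\alpha},
        \]
        \[
        I_\alpha^{\mathrm{A}}(X;Y) = H_\alpha(X) - H_\alpha^{\mathrm{A}}(X \mid Y),
        \]

    so that, as stated above, maximal α-leakage for α strictly between 1 and ∞ corresponds to the supremum of I_α^A over input distributions, i.e., the Arimoto channel capacity of the mapping from the original to the disclosed data.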

    On the Robustness of Information-Theoretic Privacy Measures and Mechanisms

    Consider a data publishing setting for a dataset composed of both private and non-private features. The publisher uses an empirical distribution, estimated from n i.i.d. samples, to design a privacy mechanism which is then applied to fresh samples. In this paper, we study the discrepancy between the privacy-utility guarantees for the empirical distribution, used to design the privacy mechanism, and those for the true distribution, experienced by the privacy mechanism in practice. We first show that, for any privacy mechanism, these discrepancies vanish at speed O(1/√n) with high probability. These bounds follow from our main technical results regarding the Lipschitz continuity of the considered information leakage measures. We then prove that the optimal privacy mechanisms for the empirical distribution approach the corresponding mechanisms for the true distribution as the sample size n increases, thereby establishing the statistical consistency of the optimal privacy mechanisms. Finally, we introduce and study uniform privacy mechanisms which, by construction, provide privacy to all distributions within a neighborhood of the estimated distribution and thereby guarantee privacy for the true distribution with high probability.
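
    A small numerical illustration (not taken from the paper) of the 1/√n scale of such discrepancies: compare the mutual information of a fixed joint distribution with the plug-in value computed from the empirical distribution of n samples.

        import numpy as np

        rng = np.random.default_rng(0)
        p_xy = np.array([[0.30, 0.10],
                         [0.05, 0.55]])   # true joint distribution of (X, Y)

        def mutual_information(p):
            px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
            mask = p > 0
            return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

        true_mi = mutual_information(p_xy)
        for n in [100, 1_000, 10_000, 100_000]:
            counts = rng.multinomial(n, p_xy.ravel()).reshape(2, 2)
            gap = abs(mutual_information(counts / n) - true_mi)
            print(f"n={n:>6}  |I_hat - I| = {gap:.4f}  sqrt(n) * gap = {np.sqrt(n) * gap:.3f}")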

    Privacy Under Hard Distortion Constraints

    We study the problem of data disclosure with privacy guarantees, wherein the utility of the disclosed data is ensured via a hard distortion constraint. Unlike average distortion, hard distortion provides a deterministic guarantee of fidelity. For the privacy measure, we use a tunable information leakage measure, namely maximal α-leakage (α ∈ [1, ∞]), and formulate the privacy-utility tradeoff problem. The resulting solution highlights that under a hard distortion constraint, the nature of the solution remains unchanged for both local and non-local privacy requirements. More precisely, we show that both the optimal mechanism and the optimal tradeoff are invariant for any α > 1; that is, the tunable leakage measure only behaves as one of the two extrema: mutual information for α = 1 and maximal leakage for α = ∞. Comment: 5 pages, 1 figure
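
    One natural way to write the trade-off described above, with d and D introduced here as the distortion function and the hard distortion budget (notation not taken from the abstract):

        \[
        \min_{P_{\hat{X} \mid X}} \; \mathcal{L}^{\max}_{\alpha}\big(X \to \hat{X}\big)
        \qquad \text{subject to} \qquad
        \Pr\big[\, d(X, \hat{X}) > D \,\big] = 0,
        \]

    i.e., the released data must stay within distortion D of the original with probability one, and among all such mechanisms the one minimizing maximal α-leakage is sought.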

    Robustness of Maximal α-Leakage to Side Information

    Maximal α-leakage is a tunable measure of information leakage based on the accuracy of guessing an arbitrary function of the private data from the public data. The parameter α determines the loss function used to measure the accuracy of a belief, ranging from log-loss at α = 1 to the probability of error at α = ∞. To study the effect of side information on this measure, we introduce and define conditional maximal α-leakage. We show that, for a chosen mapping (channel) from the actual (viewed as private) data to the released (public) data and some side information, the conditional maximal α-leakage is the supremum (over all side information) of the conditional Arimoto channel capacity, where the conditioning is on the side information. We prove that if the side information is conditionally independent of the public data given the private data, the side information cannot increase the information leakage. Comment: This paper has been accepted by ISIT 201
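
    Writing the private data as X, the released data as Y, and the side information as Z (symbols introduced here for readability), the final claim above can be stated as

        \[
        Z \to X \to Y \ \text{(Markov chain)}
        \quad \Longrightarrow \quad
        \mathcal{L}^{\max}_{\alpha}\big(X \to Y \mid Z\big) \;\le\; \mathcal{L}^{\max}_{\alpha}\big(X \to Y\big),
        \]

    i.e., side information that is conditionally independent of the released data given the private data cannot increase the leakage.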

    Hypothesis Testing under Mutual Information Privacy Constraints in the High Privacy Regime

    Hypothesis testing is a statistical inference framework for determining the true distribution among a set of possible distributions for a given dataset. Privacy restrictions may require the curator of the data, or the respondents themselves, to share data with the test only after applying a randomizing privacy mechanism. This work considers mutual information (MI) as the privacy metric for measuring leakage. In addition, motivated by the Chernoff-Stein lemma, the relative entropy between pairs of distributions of the output (generated by the privacy mechanism) is chosen as the utility metric. For these metrics, the goal is to find the optimal privacy-utility trade-off (PUT) and the corresponding optimal privacy mechanism for both binary and m-ary hypothesis testing. Focusing on the high privacy regime, Euclidean information-theoretic approximations of the binary and m-ary PUT problems are developed. The solutions of the approximation problems clarify that an MI-based privacy metric preserves the privacy of the source symbols in inverse proportion to their likelihoods. Comment: 13 pages, 7 figures. The paper is submitted to "Transactions on Information Forensics & Security". Compared to the paper arXiv:1607.00533, "Hypothesis Testing in the High Privacy Limit", the overlapping content is the results for binary hypothesis testing with a zero error exponent; the added content is the results for m-ary hypothesis testing and for binary hypothesis testing with a nonzero error exponent.
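
    One way to write the binary-hypothesis version of the trade-off described above, with X the data, Y the output of the mechanism P_{Y|X}, ε the MI privacy budget, and P_{Y|H_i} the output distribution under hypothesis i (notation introduced here):

        \[
        \max_{P_{Y \mid X}} \; D\big( P_{Y \mid H_0} \,\big\|\, P_{Y \mid H_1} \big)
        \qquad \text{subject to} \qquad
        I(X;Y) \le \epsilon,
        \]

    i.e., the mechanism maximizes the Chernoff-Stein exponent of the test run on its output while keeping the MI leakage about the data below ε.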

    Optimized Data Pre-Processing for Discrimination Prevention

    Non-discrimination is a recognized objective in algorithmic decision making. In this paper, we introduce a novel probabilistic formulation of data pre-processing for reducing discrimination. We propose a convex optimization problem for learning a data transformation with three goals: controlling discrimination, limiting distortion in individual data samples, and preserving utility. We characterize the impact of limited sample size in accomplishing this objective, and apply two instances of the proposed optimization to datasets, including one on real-world criminal recidivism. The results demonstrate that all three criteria can be simultaneously achieved, and also reveal interesting patterns of bias in American society.
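
    A schematic way to write the three-goal optimization described above (the symbols Δ, J, δ and the constraint levels ε, c are introduced here; the paper's precise formulation differs in detail): learn a randomized transformation of the data (X, Y), given the protected attribute D, that solves

        \[
        \min_{P_{\hat{X},\hat{Y} \mid X, Y, D}} \;
        \underbrace{\Delta\big( P_{\hat{X},\hat{Y}},\, P_{X,Y} \big)}_{\text{utility loss}}
        \quad \text{s.t.} \quad
        \underbrace{J\big( P_{\hat{Y} \mid D=d},\, P_{\hat{Y} \mid D=d'} \big) \le \epsilon \ \ \forall\, d, d'}_{\text{discrimination control}},
        \qquad
        \underbrace{\mathbb{E}\big[ \delta\big((X,Y),(\hat{X},\hat{Y})\big) \,\big|\, D, X, Y \big] \le c}_{\text{individual distortion}} .
        \]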

    Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions

    When the performance of a machine learning model varies over groups defined by sensitive attributes (e.g., gender or ethnicity), the performance disparity can be expressed in terms of the probability distributions of the input and output variables over each group. In this paper, we exploit this fact to reduce the disparate impact of a fixed classification model over a population of interest. Given a black-box classifier, we aim to eliminate the performance gap by perturbing the distribution of input variables for the disadvantaged group. We refer to the perturbed distribution as a counterfactual distribution, and characterize its properties for common fairness criteria. We introduce a descent algorithm to learn a counterfactual distribution from data. We then discuss how the estimated distribution can be used to build a data preprocessor that can reduce disparate impact without training a new model. We validate our approach through experiments on real-world datasets, showing that it can repair different forms of disparity without a significant drop in accuracy.
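
    A toy sketch (not the authors' algorithm) of the "repair without retraining" step: once a counterfactual distribution q has been estimated for the disadvantaged group, a randomized preprocessor T(x' | x) can transport the group's original feature distribution p onto q before samples reach the fixed classifier. Here p and q are simply given; the paper estimates the counterfactual distribution with a descent algorithm.

        import numpy as np

        p = np.array([0.40, 0.30, 0.20, 0.10])  # observed distribution over 4 feature values
        q = np.array([0.25, 0.25, 0.25, 0.25])  # estimated counterfactual distribution

        keep = np.minimum(1.0, q / p)           # probability of keeping a value unchanged
        excess = np.maximum(q - p, 0.0)
        excess = excess / excess.sum()          # where to send the mass that is not kept

        T = np.diag(keep) + np.outer(1.0 - keep, excess)  # T[x, x'] = P(x' | x)
        assert np.allclose(T.sum(axis=1), 1.0)            # rows are valid distributions
        assert np.allclose(p @ T, q)                      # the preprocessor transports p to q

        rng = np.random.default_rng(0)
        def preprocess(x):
            """Randomly repair one sample before it reaches the fixed classifier."""
            return rng.choice(len(q), p=T[x])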