    From the Information Bottleneck to the Privacy Funnel

    We focus on the privacy-utility trade-off encountered by users who wish to disclose to an analyst some information that is correlated with their private data, in the hope of receiving some utility. We rely on a general statistical inference framework for privacy, under which data is transformed before it is disclosed, according to a probabilistic privacy mapping. We show that when the log-loss is used in this framework as both the privacy metric and the distortion metric, the privacy leakage and the utility constraint reduce to the mutual information between private data and disclosed data, and between non-private data and disclosed data, respectively. We justify the relevance and generality of the privacy metric under the log-loss by proving that the inference threat under any bounded cost function can be upper-bounded by an explicit function of the mutual information between private data and disclosed data. We then show that the privacy-utility trade-off under the log-loss can be cast as the non-convex Privacy Funnel optimization, and we leverage its connection to the Information Bottleneck to provide a greedy algorithm that is locally optimal. We evaluate its performance on the US census dataset.
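
    For orientation, the two optimization problems named in this abstract are usually written as below. This is a sketch in assumed notation, not quoted from the paper: S denotes the private data, X the non-private data, Y the disclosed data, the Markov chain S → X → Y is assumed, and R and C are hypothetical threshold symbols.

```latex
% Privacy Funnel: minimize leakage about S subject to a utility floor on X.
\min_{P_{Y|X} \,:\, I(X;Y) \ge R} \; I(S;Y)

% Information Bottleneck: minimize the information kept about X
% subject to a relevance floor on S.
\min_{P_{Y|X} \,:\, I(S;Y) \ge C} \; I(X;Y)
```

    Under this notation the two problems use the same pair of mutual-information terms, with the roles of objective and constraint exchanged.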

    Convexity and Operational Interpretation of the Quantum Information Bottleneck Function

    In classical information theory, the information bottleneck method (IBM) can be regarded as a method of lossy data compression which focusses on preserving meaningful (or relevant) information. As such it has recently gained a lot of attention, primarily for its applications in machine learning and neural networks. A quantum analogue of the IBM has recently been defined, and an attempt at providing an operational interpretation of the so-called quantum IB function as an optimal rate of an information-theoretic task has recently been made by Salek et al. However, the interpretation given in that paper has a couple of drawbacks: firstly, its proof is based on a conjecture that the quantum IB function is convex, and secondly, the expression for the rate function involves certain entropic quantities which occur explicitly in the very definition of the underlying information-theoretic task, thus making the latter somewhat contrived. We overcome both of these drawbacks by first proving the convexity of the quantum IB function, and then giving an alternative operational interpretation of it as the optimal rate of a bona fide information-theoretic task, namely that of quantum source coding with quantum side information at the decoder, and we relate the quantum IB function to the rate region of this task. We similarly show that the related privacy funnel function is convex (both in the classical and quantum case). However, we comment that it is unlikely that the quantum privacy funnel function can characterize the optimal asymptotic rate of an information-theoretic task, since even its classical version lacks a certain additivity property which turns out to be essential. Comment: 17 pages, 7 figures; v2: improved presentation and explanations, one new figure; v3: Restructured manuscript. Theorem 2 has been found previously in work by Hsieh and Watanabe; it is now correctly attribute

    FUNCK: Information Funnels and Bottlenecks for Invariant Representation Learning

    Learning invariant representations that remain useful for a downstream task is still a key challenge in machine learning. We investigate a set of related information funnel and bottleneck problems that claim to learn invariant representations from the data. We also propose a new element to this family of information-theoretic objectives: the Conditional Privacy Funnel with Side Information, which we investigate in fully and semi-supervised settings. Given the generally intractable objectives, we derive tractable approximations using amortized variational inference parameterized by neural networks and study the intrinsic trade-offs of these objectives. We evaluate the proposed approach empirically and show that with a few labels it is possible to learn fair classifiers and generate useful representations approximately invariant to unwanted sources of variation. Furthermore, we provide insights about the applicability of these methods in real-world scenarios with ordinary tabular datasets when the data is scarce. Comment: 28 page

    Privacy-Constrained Remote Source Coding

    We consider the problem of revealing/sharing data in an efficient and secure way via a compact representation. The representation should ensure reliable reconstruction of the desired features/attributes while still preserving the privacy of the secret parts of the data. The problem is formulated as remote lossy source coding with a privacy constraint, where the remote source consists of public and secret parts. Inner and outer bounds for the optimal tradeoff region of compression rate, distortion, and privacy leakage rate are given and shown to coincide for some special cases. When specializing the distortion measure to a logarithmic loss function, the resulting rate-distortion-leakage tradeoff for the case of identical side information forms an optimization problem which corresponds to the "secure" version of the so-called information bottleneck. Comment: 10 pages, 1 figure, to be presented at ISIT 201

    Compressive Privacy for a Linear Dynamical System

    We consider a linear dynamical system in which the state vector consists of both public and private states. One or more sensors make measurements of the state vector and send information to a fusion center, which performs the final state estimation. To achieve an optimal tradeoff between the utility of estimating the public states and protection of the private states, the measurements at each time step are linearly compressed into a lower dimensional space. Under the centralized setting where all measurements are collected by a single sensor, we propose an optimization problem and an algorithm to find the best compression matrix. Under the decentralized setting where measurements are made separately at multiple sensors, each sensor optimizes its own local compression matrix. We propose methods to separate the overall optimization problem into multiple sub-problems that can be solved locally at each sensor. We consider the case where there is no message exchange between the sensors, and the case where the sensors take turns to transmit messages to the other sensors. Simulations and empirical experiments demonstrate the efficiency of our proposed approach in allowing the fusion center to estimate the public states with good accuracy while preventing it from estimating the private states accurately.

    Bottleneck Problems: Information and Estimation-Theoretic View

    Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding and dependence dilution. Leveraging these connections, we prove a new cardinality bound for the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed bottleneck problems, by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto's mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. Although the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of the lower convex or upper concave envelope of certain functions. By applying this technique to the binary case, we derive closed-form expressions for several bottleneck problems.
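
    To make the two quantities traded off in IB and PF concrete for discrete variables, here is a minimal sketch that evaluates I(S;Y) and I(X;Y) for a given channel from X to Y. It is an illustration under assumptions, not the method of any paper listed here: the names `mutual_information` and `compose`, the symbols S, X, Y, the Markov chain S -> X -> Y, and the example distributions are all hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information I(A;B) in bits for a joint pmf joint[a][b]."""
    pa = [sum(row) for row in joint]          # marginal of A
    pb = [sum(col) for col in zip(*joint)]    # marginal of B
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:                         # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi


def compose(joint_sx, channel_y_given_x):
    """Return (joint_sy, joint_xy) for the assumed Markov chain S -> X -> Y.

    joint_sx[s][x] is the joint pmf of (S, X);
    channel_y_given_x[x][y] is the conditional pmf P(Y=y | X=x).
    """
    n_s, n_x = len(joint_sx), len(joint_sx[0])
    n_y = len(channel_y_given_x[0])
    px = [sum(joint_sx[s][x] for s in range(n_s)) for x in range(n_x)]
    joint_xy = [[px[x] * channel_y_given_x[x][y] for y in range(n_y)]
                for x in range(n_x)]
    joint_sy = [[sum(joint_sx[s][x] * channel_y_given_x[x][y]
                     for x in range(n_x))
                 for y in range(n_y)]
                for s in range(n_s)]
    return joint_sy, joint_xy


# Hypothetical example: S and X are binary and agree with probability 0.8.
joint_sx = [[0.4, 0.1], [0.1, 0.4]]
# Candidate mapping: flip X with probability 0.1 before disclosing it as Y.
noisy_channel = [[0.9, 0.1], [0.1, 0.9]]

joint_sy, joint_xy = compose(joint_sx, noisy_channel)
leakage = mutual_information(joint_sy)   # I(S;Y)
utility = mutual_information(joint_xy)   # I(X;Y)
print(f"I(S;Y) = {leakage:.4f} bits, I(X;Y) = {utility:.4f} bits")
```

    Sweeping over candidate channels and recording the resulting pair of values traces out the region whose boundaries the IB and PF problems characterize; the sketch only evaluates a single candidate and does no optimization.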