From the Information Bottleneck to the Privacy Funnel
We focus on the privacy-utility trade-off encountered by users who wish to
disclose to an analyst some information that is correlated with their private
data, in the hope of receiving some utility. We rely on a general statistical
inference framework for privacy, under which data is transformed before it is
disclosed according to a probabilistic privacy mapping. We show that when the
log-loss is introduced in this framework in both the privacy metric and the
distortion metric, the privacy leakage and the utility constraint can be
reduced to the mutual information between private data and disclosed data, and
between non-private data and disclosed data, respectively. We justify the
relevance and generality of the privacy metric under the log-loss by proving
that the inference threat under any bounded cost function can be upper-bounded
by an explicit function of the mutual information between private data and
disclosed data. We then show that the privacy-utility trade-off under the
log-loss can be cast as the non-convex Privacy Funnel optimization, and we
leverage its connection to the Information Bottleneck to provide a greedy
algorithm that is locally optimal. We evaluate its performance on the US
Census dataset.
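To make the optimization concrete, here is a minimal sketch of a greedy agglomerative approach in the spirit of the algorithm described above: starting from the identity mapping, symbols of the disclosed variable are merged pairwise to reduce the leakage I(S;Y) while the utility constraint I(X;Y) >= R still holds. The restriction to deterministic merges and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats from a joint distribution pxy[i, j] = P(X=i, Y=j)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def greedy_privacy_funnel(psx, R):
    """Greedily merge X-symbols into clusters defining Y = f(X), reducing the
    leakage I(S;Y) while keeping the utility I(X;Y) >= R.
    psx[s, x] = P(S=s, X=x), with S private and X the data to be disclosed."""
    px = psx.sum(axis=0)
    clusters = [[x] for x in range(psx.shape[1])]   # start from Y = X

    def evaluate(cl):
        # Joint P(S, Y); since Y is a deterministic function of X,
        # the utility is I(X;Y) = H(Y).
        psy = np.stack([psx[:, c].sum(axis=1) for c in cl], axis=1)
        py = np.array([px[c].sum() for c in cl])
        h_y = -(py[py > 0] * np.log(py[py > 0])).sum()
        return mutual_information(psy), h_y

    improved = True
    while improved and len(clusters) > 1:
        improved, best = False, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cand = [c for k, c in enumerate(clusters) if k not in (i, j)]
                cand.append(clusters[i] + clusters[j])
                leak, util = evaluate(cand)
                if util >= R and (best is None or leak < best[0]):
                    best = (leak, cand)
        if best is not None:
            clusters, improved = best[1], True
    return clusters
```

Merging symbols can only decrease I(S;Y) (data processing), so the binding constraint is the utility floor; the loop stops once no further merge keeps I(X;Y) >= R.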
Convexity and Operational Interpretation of the Quantum Information Bottleneck Function
In classical information theory, the information bottleneck method (IBM) can
be regarded as a method of lossy data compression that focuses on preserving
meaningful (or relevant) information. As such it has recently gained a lot of
attention, primarily for its applications in machine learning and neural
networks. A quantum analogue of the IBM has recently been defined, and Salek
et al. have attempted to provide an operational interpretation of the
so-called quantum IB function as the optimal rate of an
information-theoretic task. However, that interpretation has two drawbacks:
first, its proof rests on the conjecture that the quantum IB function is
convex; second, the expression for the rate function involves certain
entropic quantities that occur explicitly in the very definition of the
underlying information-theoretic task, making the latter somewhat contrived.
We overcome both drawbacks by first proving
the convexity of the quantum IB function, and then giving an alternative
operational interpretation of it as the optimal rate of a bona fide
information-theoretic task, namely that of quantum source coding with quantum
side information at the decoder, and relate the quantum IB function to the rate
region of this task. We similarly show that the related privacy funnel function
is convex (both in the classical and quantum case). However, we comment that it
is unlikely that the quantum privacy funnel function can characterize the
optimal asymptotic rate of an information-theoretic task, since even its
classical version lacks a certain additivity property which turns out to be
essential.
Comment: 17 pages, 7 figures; v2: improved presentation and explanations, one
new figure; v3: restructured manuscript. Theorem 2 has been found previously
in work by Hsieh and Watanabe; it is now correctly attributed.
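As background, one common convention for the classical counterparts of these functions is sketched below; conventions (inf versus sup, the direction of the constraint) vary across the literature, and the quantum definitions replace the classical mutual informations with their quantum analogues. This is orientation only, not the paper's exact statement. With Markov chain Y - X - Z and optimization over stochastic maps:

\[
  F_{\mathrm{IB}}(R) \;=\; \sup_{P_{Z|X}\,:\, I(X;Z)\le R} I(Y;Z),
  \qquad
  F_{\mathrm{PF}}(R) \;=\; \inf_{P_{Z|X}\,:\, I(X;Z)\ge R} I(Y;Z).
\]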
FUNCK: Information Funnels and Bottlenecks for Invariant Representation Learning
Learning invariant representations that remain useful for a downstream task
is still a key challenge in machine learning. We investigate a set of related
information funnels and bottleneck problems that claim to learn invariant
representations from the data. We also propose a new element to this family of
information-theoretic objectives: The Conditional Privacy Funnel with Side
Information, which we investigate in fully supervised and semi-supervised
settings. Given
the generally intractable objectives, we derive tractable approximations using
amortized variational inference parameterized by neural networks and study the
intrinsic trade-offs of these objectives. We empirically evaluate the
proposed approach and show that, with only a few labels, it is possible to
learn fair classifiers and generate useful representations that are
approximately invariant to unwanted sources of variation. Furthermore, we
provide insights into the applicability of these methods in real-world
scenarios with ordinary tabular datasets when data is scarce.
Comment: 28 pages
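As a concrete illustration of the amortized variational-inference template these objectives build on, here is a minimal variational-IB-style sketch in PyTorch. It is not the paper's FUNCK or Conditional Privacy Funnel objective (those additionally condition on side information and nuisance variables); the layer sizes and the beta weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalBottleneck(nn.Module):
    """Stochastic encoder q(z|x) plus a label decoder q(y|z): the basic
    neural-network parameterization over which funnel/bottleneck variants
    add conditioning terms."""
    def __init__(self, x_dim, z_dim, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 2 * z_dim))
        self.decoder = nn.Linear(z_dim, n_classes)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.decoder(z), mu, logvar

def ib_loss(logits, y, mu, logvar, beta=1e-2):
    # Cross-entropy bounds the prediction term; the KL to a standard normal
    # prior upper-bounds the compression term I(X;Z).
    ce = F.cross_entropy(logits, y)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return ce + beta * kl
```

A training step would compute `logits, mu, logvar = model(x)` and minimize `ib_loss(logits, y, mu, logvar)` with any standard optimizer; sweeping beta traces out the trade-off curve.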
Privacy-Constrained Remote Source Coding
We consider the problem of revealing/sharing data in an efficient and secure
way via a compact representation. The representation should ensure reliable
reconstruction of the desired features/attributes while still preserving the privacy
of the secret parts of the data. The problem is formulated as a remote lossy
source coding with a privacy constraint where the remote source consists of
public and secret parts. Inner and outer bounds for the optimal tradeoff region
of compression rate, distortion, and privacy leakage rate are given and shown
to coincide for some special cases. When specializing the distortion measure to
a logarithmic loss function, the resulting rate-distortion-leakage tradeoff for
the case of identical side information forms an optimization problem which
corresponds to the "secure" version of the so-called information bottleneck.
Comment: 10 pages, 1 figure, to be presented at ISIT 201
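For reference, the logarithmic loss mentioned here takes a distribution-valued reconstruction, and under the optimal (posterior) reconstruction its expected value is a conditional entropy; this is the standard mechanism by which distortion constraints become mutual-information constraints (standard background, notation assumed):

\[
  d_{\log}(x, \hat{x}) \;=\; \log \frac{1}{\hat{x}(x)},
  \qquad
  \min_{\hat{X}=g(U)} \mathbb{E}\,[\, d_{\log}(X, \hat{X}) \,] \;=\; H(X \mid U),
\]

with the minimum attained by the posterior reconstruction \(\hat{x}_u = P_{X|U=u}\).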
Compressive Privacy for a Linear Dynamical System
We consider a linear dynamical system in which the state vector consists of
both public and private states. One or more sensors make measurements of the
state vector and send information to a fusion center, which performs the final
state estimation. To achieve an optimal tradeoff between the utility of
estimating the public states and protection of the private states, the
measurements at each time step are linearly compressed into a lower dimensional
space. Under the centralized setting where all measurements are collected by a
single sensor, we propose an optimization problem and an algorithm to find the
best compression matrix. Under the decentralized setting where measurements are
made separately at multiple sensors, each sensor optimizes its own local
compression matrix. We propose methods to separate the overall optimization
problem into multiple sub-problems that can be solved locally at each sensor.
We consider both the case in which there is no message exchange between the
sensors and the case in which the sensors take turns transmitting messages to
one another.
Simulations and empirical experiments demonstrate the efficiency of our
proposed approach in allowing the fusion center to estimate the public states
with good accuracy while preventing it from estimating the private states
accurately.
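The following is a toy, single-snapshot illustration of the trade-off: Gaussian conditioning with a linear compression matrix M, scored by public-state versus private-state posterior error. It ignores the dynamics and the sensor-coordination schemes studied in the paper, and all names, dimensions, and the random search are illustrative assumptions.

```python
import numpy as np

def posterior_cov(Sigma, C, Rn, M):
    """Covariance of x given z = M(Cx + v), v ~ N(0, Rn), prior x ~ N(0, Sigma)."""
    S = M @ (C @ Sigma @ C.T + Rn) @ M.T            # covariance of z
    K = Sigma @ C.T @ M.T @ np.linalg.inv(S)        # Gaussian-conditioning gain
    return Sigma - K @ M @ C @ Sigma

def score(Sigma, C, Rn, M, pub, priv, lam=1.0):
    """Lower is better: small error on public states, large error on private."""
    P = posterior_cov(Sigma, C, Rn, M)
    return np.trace(P[np.ix_(pub, pub)]) - lam * np.trace(P[np.ix_(priv, priv)])

# Toy search: pick the best of a few random rank-1 compression matrices.
rng = np.random.default_rng(0)
n = 4                                               # 2 public + 2 private states
Sigma, C, Rn = np.eye(n), np.eye(n), 0.1 * np.eye(n)
pub, priv = [0, 1], [2, 3]
best_M = min((rng.standard_normal((1, n)) for _ in range(200)),
             key=lambda M: score(Sigma, C, Rn, M, pub, priv))
```

Compressing to one dimension forces the fusion center's estimate through a single linear functional of the measurements, which is what lets the public-state error stay small while the private-state error remains large.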
Bottleneck Problems: Information and Estimation-Theoretic View
Information bottleneck (IB) and privacy funnel (PF) are two closely related
optimization problems that have found applications in machine learning, the
design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma),
and strong data processing inequalities, among other areas. In this work, we
first investigate
the functional properties of IB and PF through a unified theoretical framework.
We then connect them to three information-theoretic coding problems, namely
hypothesis testing against independence, noisy source coding and dependence
dilution. Leveraging these connections, we prove a new cardinality bound for
the auxiliary variable in IB, making its computation more tractable for
discrete random variables.
In the second part, we introduce a general family of optimization problems,
termed "bottleneck problems", by replacing the mutual information in IB and
PF with other notions of mutual information, namely f-information and
Arimoto's mutual information. We then argue that, unlike IB and PF, these
problems lead to easily interpretable guarantees in a variety of inference
tasks with statistical constraints on accuracy and privacy. Although the underlying
optimization problems are non-convex, we develop a technique to evaluate
bottleneck problems in closed form by equivalently expressing them in terms of
the lower convex or upper concave envelopes of certain functions. By applying
this technique to the binary case, we derive closed-form expressions for
several bottleneck problems.
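For context, the f-information referenced above is defined via an f-divergence between the joint distribution and the product of the marginals; the standard definitions are sketched below (background notation, not the paper's exact statement):

\[
  I_f(X;Z) \;=\; D_f\!\big(P_{XZ} \,\big\|\, P_X P_Z\big),
  \qquad
  D_f(P \,\|\, Q) \;=\; \sum_{\omega} Q(\omega)\, f\!\Big(\frac{P(\omega)}{Q(\omega)}\Big),
\]

for a convex f with f(1) = 0; the choice f(t) = t log t recovers Shannon mutual information.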