Strong Data Processing Inequalities for Input Constrained Additive Noise Channels
This paper quantifies the intuitive observation that adding noise reduces
available information by means of non-linear strong data processing
inequalities. Consider the random variables $U \to X \to Y$ forming a Markov
chain, where $Y = X + Z$ with $X$ and $Z$ real-valued, independent, and $X$
bounded in $p$-norm. It is shown that $I(U;Y) \le F_I(I(U;X))$ with
$F_I(t) < t$ whenever $t > 0$, if and only if $Z$ has a density whose support
is not disjoint from any translate of itself. A related question is to
characterize for what couplings $(U, X)$ the mutual information $I(U;Y)$ is
close to the maximum possible. To that end we show that in order to saturate
the channel, i.e., for $I(U;Y)$ to approach capacity, it is mandatory that
$I(U;X) \to \infty$ (under suitable conditions on the channel). A key
ingredient for this result is a deconvolution
lemma which shows that post-convolution total variation distance bounds the
pre-convolution Kolmogorov-Smirnov distance. Explicit bounds are provided for
the special case of the additive Gaussian noise channel with quadratic cost
constraint. These bounds are shown to be order-optimal. For this case
simplified proofs are provided leveraging Gaussian-specific tools such as the
connection between information and estimation (I-MMSE) and Talagrand's
information-transportation inequality.
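One of the Gaussian-specific tools mentioned is the I-MMSE relation of Guo, Shamai, and Verdú, which for the Gaussian channel reads (mutual information in nats):

```latex
% I-MMSE: for standard Gaussian noise N independent of X,
\frac{\mathrm{d}}{\mathrm{d}\,\mathrm{snr}}\,
  I\!\left(X;\, \sqrt{\mathrm{snr}}\, X + N\right)
  = \tfrac{1}{2}\, \mathrm{mmse}\!\left(X \mid \sqrt{\mathrm{snr}}\, X + N\right),
% where mmse is the minimum mean-square error of estimating X
% from the noisy observation.
```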
Correspondence Analysis Using Neural Networks
Correspondence analysis (CA) is a multivariate statistical tool used to
visualize and interpret data dependencies. CA has found applications in fields
ranging from epidemiology to social sciences. However, current methods used to
perform CA do not scale to large, high-dimensional datasets. By re-interpreting
the objective in CA using an information-theoretic tool called the principal
inertia components, we demonstrate that performing CA is equivalent to solving
a functional optimization problem over the space of finite variance functions
of two random variables. We show that this optimization problem, in turn, can be
efficiently approximated by neural networks. The resulting formulation, called
the correspondence analysis neural network (CA-NN), enables CA to be performed
at an unprecedented scale. We validate the CA-NN on synthetic data, and
demonstrate how it can be used to perform CA on a variety of datasets,
including food recipes, wine compositions, and images. Our results outperform
traditional methods used in CA, indicating that CA-NN can serve as a new,
scalable tool for interpretability and visualization of complex dependencies
between random variables.
Comment: Accepted to AISTATS 2019. Overlaps with arXiv:1806.0844
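For reference, classical CA, the baseline the CA-NN is designed to scale past, reduces to an SVD of the standardized residuals of the contingency table, with squared singular values giving the principal inertias. A minimal NumPy sketch of that classical baseline (not the CA-NN itself; all names are illustrative):

```python
import numpy as np

def classical_ca(N, k=2):
    """Classical correspondence analysis of a contingency table N
    via SVD of the standardized residual matrix."""
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # principal coordinates of rows and columns (top-k dimensions)
    F = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    G = (Vt[:k].T * s[:k]) / np.sqrt(c)[:, None]
    return F, G, s[:k] ** 2              # squared singular values = principal inertias

# toy usage on a small contingency table
N = np.array([[16, 4, 2], [3, 12, 5], [1, 6, 14]], dtype=float)
F, G, inertias = classical_ca(N)
print(inertias)
```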
On the Direction of Discrimination: An Information-Theoretic Analysis of Disparate Impact in Machine Learning
In the context of machine learning, disparate impact refers to a form of
systematic discrimination whereby the output distribution of a model depends on
the value of a sensitive attribute (e.g., race or gender). In this paper, we
propose an information-theoretic framework to analyze the disparate impact of a
binary classification model. We view the model as a fixed channel, and quantify
disparate impact as the divergence in output distributions over two groups. Our
aim is to find a correction function that can perturb the input distributions
of each group to align their output distributions. We present an optimization
problem that can be solved to obtain a correction function that will make the
output distributions statistically indistinguishable. We derive closed-form
expressions to efficiently compute the correction function, and demonstrate the
benefits of our framework on a recidivism prediction problem based on the
ProPublica COMPAS dataset.
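The quantity the correction function targets can be made concrete for a black-box binary classifier. A small sketch, using total variation as one concrete choice of divergence between the groups' output distributions (the abstract does not commit to a specific divergence; all names are illustrative):

```python
import numpy as np

def output_distribution(model, X):
    """Empirical distribution of a binary classifier's output over a group."""
    preds = model(X)                        # array of 0/1 predictions
    p1 = preds.mean()
    return np.array([1.0 - p1, p1])

def disparate_impact_tv(model, X_group_a, X_group_b):
    """Total variation between output distributions; 0 means aligned outputs."""
    pa = output_distribution(model, X_group_a)
    pb = output_distribution(model, X_group_b)
    return 0.5 * np.abs(pa - pb).sum()

# toy usage: a fixed threshold model and two synthetic groups
model = lambda X: (X[:, 0] > 0.5).astype(int)
rng = np.random.default_rng(0)
X_a = rng.uniform(0.0, 1.0, size=(1000, 2))
X_b = rng.uniform(0.2, 1.2, size=(1000, 2))
print(disparate_impact_tv(model, X_a, X_b))
```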
A Tunable Measure for Information Leakage
A tunable measure for information leakage called \textit{maximal
$\alpha$-leakage} is introduced. This measure quantifies the maximal gain of an
adversary in refining a tilted version of its prior belief of any (potentially
random) function of a dataset conditioned on a disclosed dataset. The choice
of $\alpha$ determines the specific adversarial action, ranging from refining a
belief for $\alpha = 1$ to guessing the best posterior for $\alpha = \infty$,
and for these extremal values this measure simplifies to mutual information
(MI) and maximal leakage (MaxL), respectively. For all other values of $\alpha$
this measure is shown to be the Arimoto channel capacity. Several properties of this
measure are proven including: (i) quasi-convexity in the mapping between the
original and disclosed datasets; (ii) data processing inequalities; and (iii) a
composition property.
Comment: 7 pages. This paper is the extended version of the conference paper
"A Tunable Measure for Information Leakage" accepted by ISIT 201
On the Robustness of Information-Theoretic Privacy Measures and Mechanisms
Consider a data publishing setting for a dataset composed by both private and
non-private features. The publisher uses an empirical distribution, estimated
from $n$ i.i.d. samples, to design a privacy mechanism that is then applied to
fresh samples. In this paper, we study the discrepancy between the
privacy-utility guarantees for the empirical distribution, used to design the
privacy mechanism, and those for the true distribution, experienced by the
privacy mechanism in practice. We first show that, for any privacy mechanism,
these discrepancies vanish at speed $O(1/\sqrt{n})$ with high probability.
These bounds follow from our main technical results regarding the Lipschitz
continuity of the considered information leakage measures. Then we prove that
the optimal privacy mechanisms for the empirical distribution approach the
corresponding mechanisms for the true distribution as the sample size
increases, thereby establishing the statistical consistency of the optimal
privacy mechanisms. Finally, we introduce and study uniform privacy mechanisms
which, by construction, provide privacy to all the distributions within a
neighborhood of the estimated distribution and, thereby, guarantee privacy for
the true distribution with high probability.
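The $O(1/\sqrt{n})$ speed is the familiar rate at which an empirical distribution approaches the truth. A quick illustrative simulation (not the paper's leakage measures, just the underlying estimation rate):

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])            # true distribution over 3 symbols

for n in [100, 1_000, 10_000, 100_000]:
    counts = rng.multinomial(n, p)
    p_hat = counts / n                   # empirical distribution from n i.i.d. samples
    tv = 0.5 * np.abs(p_hat - p).sum()   # total variation distance
    print(f"n={n:>6}  TV={tv:.4f}  sqrt(n)*TV={np.sqrt(n) * tv:.3f}")
```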
Privacy Under Hard Distortion Constraints
We study the problem of data disclosure with privacy guarantees, wherein the
utility of the disclosed data is ensured via a \emph{hard distortion}
constraint. Unlike average distortion, hard distortion provides a deterministic
guarantee of fidelity. For the privacy measure, we use a tunable information
leakage measure, namely \textit{maximal $\alpha$-leakage}, and formulate the
privacy-utility tradeoff problem. The resulting solution highlights that under
a hard distortion constraint, the nature of the solution remains unchanged for
both local and non-local privacy requirements. More precisely, we show that
both the optimal mechanism and the optimal tradeoff are invariant for any
$\alpha \in (1,\infty]$; i.e., the tunable leakage measure only behaves as either
of the two extrema: mutual information for $\alpha = 1$ and maximal leakage for
$\alpha \in (1,\infty]$.
Comment: 5 pages, 1 figure
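The contrast the abstract draws can be written compactly. With $d$ a distortion function, $X$ the original data, and $\hat{X}$ the disclosed data (generic symbols, not necessarily the paper's):

```latex
% average distortion: a guarantee only in expectation
\mathbb{E}\left[ d(X, \hat{X}) \right] \le D
\qquad\text{vs.}\qquad
% hard distortion: a deterministic, per-realization guarantee
\Pr\left( d(X, \hat{X}) > D \right) = 0 .
```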
Robustness of Maximal $\alpha$-Leakage to Side Information
Maximal $\alpha$-leakage is a tunable measure of information leakage based on
the accuracy of guessing an arbitrary function of private data from public
data. The parameter $\alpha$ determines the loss function used to measure the
accuracy of a belief, ranging from log-loss at $\alpha = 1$ to the probability
of error at $\alpha = \infty$. To study the effect of side information on this
measure, we introduce and define conditional maximal $\alpha$-leakage. We show
that, for a chosen mapping (channel) from the actual (viewed as private) data
to the released (public) data and some side information, the conditional
maximal $\alpha$-leakage is the supremum (over all side information) of the
conditional Arimoto channel capacity where the conditioning is on the side
information. We prove that if the side information is conditionally independent
of the public data given the private data, the side information cannot increase
the information leakage.
Comment: This paper has been accepted by ISIT 201
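The closing claim has the shape of a conditional data processing statement; informally (the notation below is illustrative, not necessarily the paper's):

```latex
% if side information Z is conditionally independent of the public data Y
% given the private data X, i.e. Z - X - Y form a Markov chain,
% then conditioning on Z cannot increase the leakage:
Z \,\text{--}\, X \,\text{--}\, Y
\quad\Longrightarrow\quad
\mathcal{L}^{\max}_{\alpha}(X \to Y \mid Z) \;\le\; \mathcal{L}^{\max}_{\alpha}(X \to Y).
```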
Hypothesis Testing under Mutual Information Privacy Constraints in the High Privacy Regime
Hypothesis testing is a statistical inference framework for determining the
true distribution among a set of possible distributions for a given dataset.
Privacy restrictions may require the curator of the data or the respondents
themselves to share data with the test only after applying a randomizing
privacy mechanism. This work considers mutual information (MI) as the privacy
metric for measuring leakage. In addition, motivated by the Chernoff-Stein
lemma, the relative entropy between pairs of distributions of the output
(generated by the privacy mechanism) is chosen as the utility metric. For these
metrics, the goal is to find the optimal privacy-utility trade-off (PUT) and
the corresponding optimal privacy mechanism for both binary and m-ary
hypothesis testing. Focusing on the high privacy regime, Euclidean
information-theoretic approximations of the binary and m-ary PUT problems are
developed. The solutions for the approximation problems clarify that an
MI-based privacy metric preserves the privacy of the source symbols in inverse
proportion to their likelihoods.
Comment: 13 pages, 7 figures. The paper is submitted to "Transactions on
Information Forensics & Security". Compared to the paper arXiv:1607.00533
"Hypothesis Testing in the High Privacy Limit", the overlapping content is
the results for binary hypothesis testing with a zero error exponent, and the
extended content is the results for both m-ary hypothesis testing and binary
hypothesis testing with a nonzero error exponent.
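The Chernoff-Stein lemma says the best type-II error exponent in binary hypothesis testing is the relative entropy between the two hypotheses, which motivates the utility metric. Schematically, the binary PUT then takes the following form, with $W$ the privacy mechanism and $P_0, P_1$ the candidate source distributions (a schematic formulation, not necessarily the paper's exact one):

```latex
% maximize the output relative entropy (utility, via Chernoff-Stein)
% subject to a mutual-information privacy constraint under both sources
\max_{W}\; D\!\left( P_0 W \,\|\, P_1 W \right)
\quad\text{s.t.}\quad
I(P_i, W) \le \epsilon, \quad i = 0, 1,
% where P W denotes the output distribution induced by pushing P through W;
% the high privacy regime corresponds to small \epsilon.
```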
Optimized Data Pre-Processing for Discrimination Prevention
Non-discrimination is a recognized objective in algorithmic decision making.
In this paper, we introduce a novel probabilistic formulation of data
pre-processing for reducing discrimination. We propose a convex optimization
for learning a data transformation with three goals: controlling
discrimination, limiting distortion in individual data samples, and preserving
utility. We characterize the impact of limited sample size in accomplishing
this objective, and apply two instances of the proposed optimization to
datasets, including one on real-world criminal recidivism. The results
demonstrate that all three criteria can be simultaneously achieved and also
reveal interesting patterns of bias in American society.
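A toy instance of such a convex program is sketched below in cvxpy, under heavy simplifying assumptions (binary outcome, two groups, a randomized per-group remapping of the outcome; discrimination is controlled via near-equal positive rates, distortion via the probability of flipping an outcome; all numbers are made up, and this is not the paper's exact formulation):

```python
import cvxpy as cp
import numpy as np

# toy statistics: P(Y=1 | D=d) for groups d = 0, 1, and group proportions
p_y1 = np.array([0.3, 0.6])          # base rates per group
p_d = np.array([0.5, 0.5])           # group proportions
eps = 0.02                           # discrimination-control tolerance

# decision variable: T[d, y] = P(Yhat = 1 | Y = y, D = d), a randomized remapping
T = cp.Variable((2, 2))

# transformed positive rate per group
rate = [T[d, 1] * p_y1[d] + T[d, 0] * (1 - p_y1[d]) for d in range(2)]

# expected distortion = probability of flipping the original outcome
flip = sum(p_d[d] * ((1 - T[d, 1]) * p_y1[d] + T[d, 0] * (1 - p_y1[d]))
           for d in range(2))

constraints = [T >= 0, T <= 1,
               rate[0] - rate[1] <= eps,     # discrimination control:
               rate[1] - rate[0] <= eps]     # near-equal positive rates

prob = cp.Problem(cp.Minimize(flip), constraints)
prob.solve()
print("flip probability:", prob.value)
print("remapping:", np.round(T.value, 3))
```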
Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions
When the performance of a machine learning model varies over groups defined
by sensitive attributes (e.g., gender or ethnicity), the performance disparity
can be expressed in terms of the probability distributions of the input and
output variables over each group. In this paper, we exploit this fact to reduce
the disparate impact of a fixed classification model over a population of
interest. Given a black-box classifier, we aim to eliminate the performance gap
by perturbing the distribution of input variables for the disadvantaged group.
We refer to the perturbed distribution as a counterfactual distribution, and
characterize its properties for common fairness criteria. We introduce a
descent algorithm to learn a counterfactual distribution from data. We then
discuss how the estimated distribution can be used to build a data preprocessor
that can reduce disparate impact without training a new model. We validate our
approach through experiments on real-world datasets, showing that it can repair
different forms of disparity without a significant drop in accuracy.
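As a flavor of the approach, the sketch below uses a much cruder stand-in for the paper's descent algorithm: gradient descent on sample weights for the disadvantaged group until the black-box classifier's weighted positive rate matches the other group's (everything here is illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

# a black-box classifier and two synthetic groups with a performance gap
model = lambda X: (X[:, 0] + 0.3 * X[:, 1] > 1.0).astype(float)
X_adv = rng.normal(1.0, 0.5, size=(2000, 2))   # advantaged group
X_dis = rng.normal(0.6, 0.5, size=(2000, 2))   # disadvantaged group

target = model(X_adv).mean()                   # positive rate to match
preds = model(X_dis)

# learn sample weights (a crude proxy for perturbing the input
# distribution) by gradient descent on the squared positive-rate gap
logits = np.zeros(len(X_dis))
for _ in range(2000):
    w = np.exp(logits)
    w /= w.sum()                               # softmax weights form a distribution
    gap = w @ preds - target
    grad = gap * w * (preds - w @ preds)       # d/d logits of 0.5 * gap**2
    logits -= 100.0 * grad

w = np.exp(logits)
w /= w.sum()
print("original rate:", preds.mean())
print("reweighted rate:", w @ preds, "target:", target)
```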