Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.
Comment: Improved version of a paper accepted by Entropy Journal, Special
Issue on Information Theory in Machine Learning and Data Science
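
The minimax game described in this abstract can be illustrated on a toy binary model. The sketch below is not the authors' implementation: it assumes a privatizer that flips the public bit X with probabilities s0 and s1, a Hamming distortion budget, and an adversary that uses the MAP rule, and it replaces learning with a plain grid search. The function names and the joint distribution used with them are hypothetical.

```python
import numpy as np

def map_adversary_accuracy(joint, s0, s1):
    """Accuracy of a MAP adversary guessing S from the sanitized bit.

    joint[s, x] = P(S=s, X=x). The (assumed) privatizer flips X=0 with
    probability s0 and X=1 with probability s1.
    """
    # channel[x, xhat] = P(Xhat = xhat | X = x)
    channel = np.array([[1.0 - s0, s0], [s1, 1.0 - s1]])
    joint_s_xhat = joint @ channel  # P(S=s, Xhat=xhat)
    # For each released value, the MAP adversary picks the likelier s.
    return float(joint_s_xhat.max(axis=0).sum())

def distortion(joint, s0, s1):
    """Expected Hamming distortion P(Xhat != X)."""
    p_x = joint.sum(axis=0)
    return float(p_x[0] * s0 + p_x[1] * s1)

def best_privatizer(joint, max_distortion, grid=101):
    """Grid search for flip probabilities minimizing adversary accuracy.

    Returns (accuracy, s0, s1) for the best feasible pair on the grid.
    """
    probs = np.linspace(0.0, 1.0, grid)
    best = (np.inf, 0.0, 0.0)
    for s0 in probs:
        for s1 in probs:
            if distortion(joint, s0, s1) > max_distortion + 1e-12:
                continue  # violates the distortion constraint
            acc = map_adversary_accuracy(joint, s0, s1)
            if acc < best[0] - 1e-12:
                best = (acc, float(s0), float(s1))
    return best
```

Calling `best_privatizer(joint, D)` for increasing budgets `D` shows the adversary's accuracy falling from its unsanitized value toward the accuracy of guessing from the prior alone, which is the privacy-utility tradeoff the abstract refers to.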
Bounds on mutual information of mixture data for classification tasks
The data for many classification problems, such as pattern and speech
recognition, follow mixture distributions. To quantify the optimum performance
for classification tasks, the Shannon mutual information is a natural
information-theoretic metric, as it is directly related to the probability of
error. The mutual information between mixture data and the class label has
neither an analytical expression nor an efficient computational algorithm. We
introduce a variational upper bound, a lower bound, and three estimators, all
employing pair-wise divergences between mixture components. We compare the new
bounds and estimators with Monte Carlo stochastic sampling and bounds derived
from entropy bounds. To conclude, we evaluate the performance of the bounds and
estimators through numerical simulations.
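
To make the quantities in this abstract concrete, the sketch below estimates the mutual information between a one-dimensional Gaussian mixture and its class label by Monte Carlo sampling, and compares it with bounds built from pairwise divergences between components. The bounds shown are an assumption on our part: they use the known pairwise-distance construction of Kolchinsky and Tracey (KL divergence for an upper bound, Bhattacharyya distance for a lower bound) and are not necessarily the bounds introduced in this work. All function names are hypothetical, and results are in nats.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log density of N(mean, std^2), broadcasting over inputs."""
    return -0.5 * np.log(2 * np.pi * std**2) - (x - mean) ** 2 / (2 * std**2)

def mc_mutual_information(weights, means, stds, n_samples=20000, seed=0):
    """Monte Carlo estimate of I(X; C) = E[log p(x|c) - log p(x)]."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n_samples, p=weights)
    x = rng.normal(means[labels], stds[labels])
    # log p(x | c) for every component: shape (n_samples, n_components)
    log_cond = gaussian_logpdf(x[:, None], means[None, :], stds[None, :])
    # log p(x) via a numerically stable log-sum-exp over components
    log_mix = np.logaddexp.reduce(log_cond + np.log(weights), axis=1)
    log_own = log_cond[np.arange(n_samples), labels]
    return float(np.mean(log_own - log_mix))

def kl_matrix(means, stds):
    """kl[i, j] = KL(N_i || N_j) for univariate Gaussians."""
    mi, mj = means[:, None], means[None, :]
    si, sj = stds[:, None], stds[None, :]
    return np.log(sj / si) + (si**2 + (mi - mj) ** 2) / (2 * sj**2) - 0.5

def bhattacharyya_matrix(means, stds):
    """bd[i, j] = Bhattacharyya distance between N_i and N_j."""
    mi, mj = means[:, None], means[None, :]
    si, sj = stds[:, None], stds[None, :]
    var_sum = si**2 + sj**2
    return 0.25 * (mi - mj) ** 2 / var_sum + 0.5 * np.log(var_sum / (2 * si * sj))

def pairwise_bound(weights, divergence):
    """-sum_i w_i log sum_j w_j exp(-D(p_i || p_j))."""
    return float(-np.sum(weights * np.log(np.exp(-divergence) @ weights)))
```

With `pairwise_bound(w, kl_matrix(m, s))` as the upper bound and `pairwise_bound(w, bhattacharyya_matrix(m, s))` as the lower bound, the Monte Carlo estimate should fall between the two up to sampling error. The pairwise bounds need only a number of closed-form divergences quadratic in the number of components, whereas the Monte Carlo estimate requires sampling.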