Information Extraction Under Privacy Constraints
A privacy-constrained information extraction problem is considered where, for
a pair of correlated discrete random variables $(X,Y)$ governed by a given
joint distribution, an agent observes $Y$ and wants to convey to a potentially
public user as much information about $Y$ as possible without compromising the
amount of information revealed about $X$. To this end, the so-called
rate-privacy function is introduced to quantify the maximal amount of
information (measured in terms of mutual information) that can be extracted
from $Y$ under a privacy constraint between $X$ and the extracted information,
where privacy is measured using either mutual information or maximal
correlation. Properties of the rate-privacy function are analyzed, and
information-theoretic and estimation-theoretic interpretations of it are
presented for both the mutual information and maximal correlation privacy
measures. It is also shown that the rate-privacy function admits a closed-form
expression for a large family of joint distributions of $(X,Y)$. Finally, the
rate-privacy function under the mutual information privacy measure is
considered for the case where $(X,Y)$ has a joint probability density function,
by studying the problem where the extracted information is a uniform
quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic
behavior of the rate-privacy function is studied as the quantization resolution
grows without bound, and it is observed that not all of the properties of the
rate-privacy function carry over from the discrete to the continuous case.

Comment: 55 pages, 6 figures. Improved the organization and added a detailed
literature review
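The rate-privacy function described in this abstract admits a compact statement. The following is a sketch in standard notation (the symbols $X$, $Y$, $Z$, and $g_\epsilon$ follow the usual formulation, with $Z$ denoting the released information):

```latex
% Rate-privacy function (mutual-information privacy measure):
% maximize the information released about Y, subject to leaking
% at most \epsilon about X, over all release channels P_{Z|Y}
% satisfying the Markov chain X -> Y -> Z.
g_\epsilon(X;Y) = \sup_{\substack{P_{Z|Y}\,:\; X \to Y \to Z,\\ I(X;Z) \le \epsilon}} I(Y;Z)
```

The maximal-correlation variant replaces the constraint $I(X;Z) \le \epsilon$ with a bound on the maximal correlation between $X$ and $Z$.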
Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data
BACKGROUND: The information-theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of clustering genes with similar patterns of expression, it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size.

RESULTS: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: the significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets, and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from [email protected] upon request.

CONCLUSION: The utilisation of mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.
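The binning issue raised in the BACKGROUND paragraph can be made concrete. Below is a minimal plug-in estimator using naive equal-width binning, the simple baseline whose small-sample errors motivate smoother schemes such as the B-spline weighting of this paper; the function name and bin count are illustrative, not taken from the paper:

```python
import math
from collections import Counter

def binned_mutual_information(x, y, bins=8):
    """Plug-in mutual information estimate (in bits) from equal-width
    binning of two continuous samples. This is the naive binning
    baseline; its bias on small datasets motivates smoother estimators
    such as B-spline bin weighting."""
    def to_bins(v):
        lo, hi = min(v), max(v)
        w = (hi - lo) / bins or 1.0
        return [min(int((s - lo) / w), bins - 1) for s in v]
    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(c / n * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

# A deterministic nonlinear dependence that linear correlation misses:
xs = [i / 99.0 for i in range(100)]
ys = [(2 * s - 1) ** 2 for s in xs]      # y = (2x - 1)^2
print(binned_mutual_information(xs, ys))  # clearly positive
```

A Pearson correlation on the same data would be near zero, which is the kind of non-linear dependence the abstract says mutual information is meant to capture.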
JIDT: An information-theoretic toolkit for studying the dynamics of complex systems
Complex systems are increasingly being viewed as distributed information
processing systems, particularly in the domains of computational neuroscience,
bioinformatics and Artificial Life. This trend has resulted in a strong uptake
in the use of (Shannon) information-theoretic measures to analyse the dynamics
of complex systems in these fields. We introduce the Java Information Dynamics
Toolkit (JIDT): a Google Code project which provides a standalone, open-source
(GNU GPL v3 licensed) implementation for empirical estimation of
information-theoretic measures from time-series data. While the toolkit
provides classic information-theoretic measures (e.g. entropy, mutual
information, conditional mutual information), it ultimately focusses on
implementing higher-level measures for information dynamics. That is, JIDT
focusses on quantifying information storage, transfer and modification, and the
dynamics of these operations in space and time. For this purpose, it includes
implementations of the transfer entropy and active information storage, their
multivariate extensions and local or pointwise variants. JIDT provides
implementations for both discrete and continuous-valued data for each measure,
including various types of estimator for continuous data (e.g. Gaussian,
box-kernel and Kraskov-Stoegbauer-Grassberger) which can be swapped at run-time
due to Java's object-oriented polymorphism. Furthermore, while written in Java,
the toolkit can be used directly in MATLAB, GNU Octave, Python and other
environments. We present the principles behind the code design, and provide
several examples to guide users.

Comment: 37 pages, 4 figures
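JIDT itself exposes these estimators through its Java API (callable from MATLAB, Python and other environments). As a language-neutral illustration of the kind of measure it implements, here is a minimal plug-in transfer entropy estimator for discrete data in plain Python, with history length 1; this is a sketch of the measure, not JIDT's API:

```python
import math
import random
from collections import Counter

def transfer_entropy(src, dst):
    """Discrete transfer entropy TE(src -> dst) in bits, history length 1:
    how much the source's past reduces uncertainty about the destination's
    next value beyond what the destination's own past already provides."""
    triples = list(zip(dst[1:], dst[:-1], src[:-1]))   # (y_next, y_past, x_past)
    n = len(triples)
    c_xyz = Counter(triples)
    c_yz = Counter((y1, y0) for y1, y0, _ in triples)
    c_z = Counter((y0, x0) for _, y0, x0 in triples)
    c_y0 = Counter(y0 for _, y0, _ in triples)
    te = 0.0
    for (y1, y0, x0), c in c_xyz.items():
        p_joint = c / n
        p_cond_full = c / c_z[(y0, x0)]             # p(y1 | y0, x0)
        p_cond_self = c_yz[(y1, y0)] / c_y0[y0]     # p(y1 | y0)
        te += p_joint * math.log2(p_cond_full / p_cond_self)
    return te

# dst copies an i.i.d. src with one step of delay, so information
# flows strongly in one direction and not the other:
rng = random.Random(0)
src = [rng.randint(0, 1) for _ in range(200)]
dst = [0] + src[:-1]
print(transfer_entropy(src, dst))   # large: src's past determines dst's next value
print(transfer_entropy(dst, src))   # small: no information flows back
```

JIDT additionally offers bias-corrected and continuous-valued (Gaussian, box-kernel, Kraskov-Stoegbauer-Grassberger) estimators of the same quantity, plus local/pointwise variants.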
Almost Perfect Privacy for Additive Gaussian Privacy Filters
We study the maximal mutual information about a random variable $Y$
(representing non-private information) displayed through an additive Gaussian
channel when guaranteeing that only $\epsilon$ bits of information are leaked
about a random variable $X$ (representing private information) that is
correlated with $Y$. Denoting this quantity by $g_\epsilon(X;Y)$, we show that
for perfect privacy, i.e., $\epsilon = 0$, one has $g_0(X;Y) = 0$ for any pair of
absolutely continuous random variables, and we then derive a second-order
approximation for $g_\epsilon(X;Y)$ for small $\epsilon$. This approximation is
shown to be related to the strong data processing inequality for mutual
information under suitable conditions on the joint distribution. Next,
motivated by an operational interpretation of data privacy, we formulate the
privacy-utility tradeoff in the same setup using estimation-theoretic
quantities and obtain explicit bounds for this tradeoff when $\epsilon$ is
sufficiently small using the approximation formula derived for
$g_\epsilon(X;Y)$.

Comment: 20 pages. To appear in Springer-Verlag
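As a numeric illustration of the additive Gaussian setup (not the paper's optimization of $g_\epsilon$, just the closed-form mutual informations for a jointly Gaussian pair, with all names assumed), one can see the privacy-utility tradeoff directly by sweeping the noise power:

```python
import math

def gaussian_privacy_filter(rho, var_y, var_noise):
    """For jointly Gaussian (X, Y) with correlation rho, release
    Z = Y + N with independent Gaussian noise N. Closed forms (bits):
      utility  I(Y;Z) = 1/2 * log2(1 + var_y / var_noise)
      leakage  I(X;Z) = -1/2 * log2(1 - rho^2 * var_y / (var_y + var_noise))
    Illustrative numbers only; the paper optimizes over filters."""
    utility = 0.5 * math.log2(1.0 + var_y / var_noise)
    snr_frac = var_y / (var_y + var_noise)
    leakage = -0.5 * math.log2(1.0 - rho * rho * snr_frac)
    return utility, leakage

# More noise means less utility but also less leakage about X:
for var_noise in (0.1, 1.0, 10.0):
    u, l = gaussian_privacy_filter(rho=0.6, var_y=1.0, var_noise=var_noise)
    print(f"noise={var_noise:5.1f}  utility={u:.3f} bits  leakage={l:.3f} bits")
```

Since $X \to Y \to Z$ is a Markov chain, the data processing inequality guarantees leakage never exceeds utility, and driving the noise to infinity recovers the perfect-privacy endpoint $\epsilon \to 0$ at zero utility.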
Information-Theoretic Analysis of Serial Dependence and Cointegration
This paper is devoted to presenting wider characterizations of memory and cointegration in time series, in terms of information-theoretic statistics such as the entropy and the mutual information between pairs of variables. We suggest a nonparametric and nonlinear methodology for data analysis and for testing the hypotheses of long memory and the existence of a cointegrating relationship in a nonlinear context. This new framework represents a natural extension of the linear-memory concepts based on correlations. Finally, we show that our testing devices seem promising for exploratory analysis with nonlinearly cointegrated time series.
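A minimal sketch of the kind of nonparametric statistic involved: plug-in mutual information between a series and its own lag, here with naive equal-width binning rather than the authors' exact construction (all names and parameters are illustrative):

```python
import math
import random
from collections import Counter

def lagged_mi(series, lag=1, bins=6):
    """Plug-in mutual information (bits) between x_t and x_{t-lag}:
    a simple nonparametric serial-dependence statistic that also
    picks up nonlinear memory."""
    x, y = series[lag:], series[:-lag]
    lo, hi = min(series), max(series)
    w = (hi - lo) / bins or 1.0
    bx = [min(int((v - lo) / w), bins - 1) for v in x]
    by = [min(int((v - lo) / w), bins - 1) for v in y]
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum(c / n * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

rng = random.Random(1)
# A persistent AR(1)-style series versus an i.i.d. one of the same length:
ar = [0.0]
for _ in range(999):
    ar.append(0.9 * ar[-1] + rng.gauss(0, 1))
iid = [rng.gauss(0, 1) for _ in range(1000)]
print(lagged_mi(ar))    # substantially larger than for the i.i.d. series
print(lagged_mi(iid))   # near zero (small positive plug-in bias)
```

A test in the spirit of the paper would compare the observed statistic against its distribution under shuffled (independence-preserving) surrogates of the series.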