
    Information Extraction Under Privacy Constraints

    A privacy-constrained information extraction problem is considered where, for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes $Y$ and wants to convey to a potentially public user as much information about $Y$ as possible without compromising the amount of information revealed about $X$. To this end, the so-called rate-privacy function is introduced to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from $Y$ under a privacy constraint between $X$ and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed, and information-theoretic and estimation-theoretic interpretations are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function, by studying the problem where the extracted information is a uniform quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound, and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.
    Comment: 55 pages, 6 figures. Improved the organization and added a detailed literature review.
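    For the mutual-information privacy measure, the rate-privacy function described above takes the form $g_\epsilon(X,Y) = \sup\{I(Y;Z) : P_{Z|Y},\; I(X;Z) \le \epsilon\}$. The following minimal Python sketch (not the paper's algorithm; the joint distribution and the filter are made-up examples) computes the two competing quantities, utility $I(Y;Z)$ and leakage $I(X;Z)$, for one candidate filter $P_{Z|Y}$:

```python
import numpy as np

def mutual_info(p):
    """Mutual information (bits) between the two axes of a joint pmf."""
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / (px @ py)[m])))

# made-up joint distribution p(x, y) and candidate privacy filter p(z | y)
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
p_z_given_y = np.array([[0.9, 0.1],   # rows indexed by y, columns by z
                        [0.2, 0.8]])

p_yz = p_xy.sum(0)[:, None] * p_z_given_y   # p(y, z) = p(y) p(z|y)
p_xz = p_xy @ p_z_given_y                   # marginalise y: Markov chain X - Y - Z
print("utility  I(Y;Z) =", mutual_info(p_yz))
print("leakage  I(X;Z) =", mutual_info(p_xz))   # must stay below epsilon
```

    Maximizing the utility over all filters whose leakage stays below $\epsilon$ is exactly the optimization the rate-privacy function captures.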

    Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data

    BACKGROUND: The information-theoretic concept of mutual information provides a general framework for evaluating dependencies between variables. In the context of clustering genes with similar expression patterns, it has been suggested as a general similarity measure extending the commonly used linear measures. Since mutual information is defined for discrete variables, its application to continuous data requires binning procedures, which can introduce significant numerical errors for datasets of small or moderate size. RESULTS: We propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties of the algorithm and show that it outperforms commonly used approaches: the significance, as a measure of the power to distinguish genuine dependence from random correlation, is markedly increased. The approach is then illustrated on two large-scale gene expression datasets, and the results are compared with those obtained using other similarity measures. C++ source code of our algorithm is available for non-commercial use from [email protected] upon request. CONCLUSION: Using mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets, thereby extending the frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification.
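    The binning idea above can be made concrete: instead of assigning each sample to one hard bin, B-spline basis functions spread it fractionally over neighbouring bins before the joint histogram is formed. Below is a minimal sketch assuming order-2 (linear) B-splines and equally spaced bins; the paper's implementation and its choice of spline order may differ:

```python
import numpy as np

def bspline2_weights(x, n_bins):
    """Order-2 (linear) B-spline weights: each sample is split
    between its two nearest bin centres rather than one hard bin."""
    x = (x - x.min()) / (x.max() - x.min() + 1e-12) * (n_bins - 1)
    w = np.zeros((len(x), n_bins))
    lo = np.clip(np.floor(x).astype(int), 0, n_bins - 2)
    frac = x - lo
    w[np.arange(len(x)), lo] = 1.0 - frac
    w[np.arange(len(x)), lo + 1] = frac
    return w

def mutual_information(x, y, n_bins=10):
    """MI estimate (bits) from soft B-spline bin occupancies."""
    wx, wy = bspline2_weights(x, n_bins), bspline2_weights(y, n_bins)
    pxy = wx.T @ wy / len(x)                 # soft joint histogram
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    m = pxy > 0
    return float(np.sum(pxy[m] * np.log2(pxy[m] / (px @ py)[m])))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)        # nonlinear, near-zero Pearson r
print(mutual_information(x, y))              # clearly positive
```

    The demo at the end shows the point made in the conclusion: a quadratic dependence with essentially zero linear correlation is still picked up by the mutual-information estimate.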

    JIDT: An information-theoretic toolkit for studying the dynamics of complex systems

    Complex systems are increasingly being viewed as distributed information processing systems, particularly in the domains of computational neuroscience, bioinformatics and Artificial Life. This trend has resulted in a strong uptake in the use of (Shannon) information-theoretic measures to analyse the dynamics of complex systems in these fields. We introduce the Java Information Dynamics Toolkit (JIDT): a Google Code project which provides a standalone, open-source (GNU GPL v3 licensed) implementation for empirical estimation of information-theoretic measures from time-series data. While the toolkit provides classic information-theoretic measures (e.g. entropy, mutual information, conditional mutual information), it ultimately focusses on implementing higher-level measures for information dynamics. That is, JIDT focusses on quantifying information storage, transfer and modification, and the dynamics of these operations in space and time. For this purpose, it includes implementations of the transfer entropy and active information storage, their multivariate extensions and local or pointwise variants. JIDT provides implementations for both discrete and continuous-valued data for each measure, including various types of estimator for continuous data (e.g. Gaussian, box-kernel and Kraskov-Stoegbauer-Grassberger) which can be swapped at run-time thanks to Java's object-oriented polymorphism. Furthermore, while written in Java, the toolkit can be used directly in MATLAB, GNU Octave, Python and other environments. We present the principles behind the code design and provide several examples to guide users.
    Comment: 37 pages, 4 figures.
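    Since the toolkit is callable from Python, a usage sketch in the style of JIDT's bundled Python demos might look as follows. It assumes the jpype bridge is installed and that infodynamics.jar sits in the working directory; the class and method names should be verified against the JIDT distribution actually installed:

```python
import numpy as np
from jpype import startJVM, getDefaultJVMPath, JArray, JDouble, JPackage

# start the JVM with JIDT's jar on the classpath (adjust the path as needed)
startJVM(getDefaultJVMPath(), "-Djava.class.path=infodynamics.jar")

# Kraskov-Stoegbauer-Grassberger (KSG) mutual information estimator
calc_class = JPackage("infodynamics.measures.continuous.kraskov") \
    .MutualInfoCalculatorMultiVariateKraskov1
calc = calc_class()
calc.initialise(1, 1)                        # univariate source and destination

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = x + 0.5 * rng.normal(size=1000)
calc.setObservations(JArray(JDouble, 1)(x.tolist()),
                     JArray(JDouble, 1)(y.tolist()))
print("MI (nats):", calc.computeAverageLocalOfObservations())
```

    Swapping in a different estimator (e.g. the Gaussian or box-kernel variants mentioned above) amounts to instantiating a different calculator class, which is the run-time polymorphism the abstract refers to.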

    Almost Perfect Privacy for Additive Gaussian Privacy Filters

    We study the maximal mutual information about a random variable $Y$ (representing non-private information) displayed through an additive Gaussian channel when guaranteeing that only $\epsilon$ bits of information are leaked about a random variable $X$ (representing private information) that is correlated with $Y$. Denoting this quantity by $g_\epsilon(X,Y)$, we show that for perfect privacy, i.e., $\epsilon=0$, one has $g_0(X,Y)=0$ for any pair of absolutely continuous random variables $(X,Y)$, and we then derive a second-order approximation of $g_\epsilon(X,Y)$ for small $\epsilon$. This approximation is shown to be related to the strong data processing inequality for mutual information under suitable conditions on the joint distribution $P_{XY}$. Next, motivated by an operational interpretation of data privacy, we formulate the privacy-utility tradeoff in the same setup using estimation-theoretic quantities and obtain explicit bounds on this tradeoff when $\epsilon$ is sufficiently small, using the approximation formula derived for $g_\epsilon(X,Y)$.
    Comment: 20 pages. To appear in Springer-Verlag.
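    To make the setup concrete, suppose additionally that $(X,Y)$ is jointly Gaussian with correlation $\rho$ (an illustrative assumption, not one made in the abstract) and that the filter is $Z_\gamma = \sqrt{\gamma}\,Y + N$ with standard Gaussian noise $N$. Both $I(Y;Z_\gamma)$ and $I(X;Z_\gamma)$ then have closed forms, so the utility achievable at leakage level $\epsilon$ along this filter family can be evaluated directly:

```python
import numpy as np

def utility_at_leakage(eps, rho, var_y=1.0):
    """Utility I(Y;Z) of the additive Gaussian filter Z = sqrt(g)*Y + N,
    with the gain g chosen so that the leakage I(X;Z) equals eps,
    for jointly Gaussian (X, Y) with correlation rho. All values in nats."""
    # I(X;Z) = 0.5*log((1 + g*var_y) / (1 + g*var_y*(1 - rho**2)));
    # solving I(X;Z) = eps for g is feasible only while eps < I(X;Y):
    assert eps < -0.5 * np.log(1 - rho**2), "eps exceeds I(X;Y)"
    e2 = np.exp(2 * eps)
    g = (e2 - 1) / (var_y * (1 - e2 * (1 - rho**2)))
    return 0.5 * np.log(1 + g * var_y)       # I(Y;Z) at that gain

for eps in [0.0, 0.01, 0.05, 0.1]:
    print(f"eps={eps:.2f}  ->  I(Y;Z) = {utility_at_leakage(eps, rho=0.5):.4f} nats")
```

    Setting $\epsilon = 0$ forces $\gamma = 0$ and hence zero utility, matching the perfect-privacy result $g_0(X,Y)=0$ quoted above.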

    Information-Theoretic Analysis of Serial Dependence and Cointegration

    This paper presents wider characterizations of memory and cointegration in time series in terms of information-theoretic statistics such as the entropy and the mutual information between pairs of variables. We suggest a nonparametric and nonlinear methodology for data analysis and for testing the hypotheses of long memory and the existence of a cointegrating relationship in a nonlinear context. This framework is a natural extension of the linear memory concepts based on correlations. Finally, we show that our testing devices seem promising for exploratory analysis of nonlinearly cointegrated time series.
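    As a hypothetical illustration of the kind of statistic involved (not the paper's exact test), serial dependence that the autocorrelation misses can be quantified by estimating the mutual information between a series and its own lag:

```python
import numpy as np

def lag_mi(x, k, n_bins=8):
    """Histogram estimate (bits) of I(x_t ; x_{t-k}),
    a nonlinear analogue of the lag-k autocorrelation."""
    a, b = x[k:], x[:-k]
    pxy, _, _ = np.histogram2d(a, b, bins=n_bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    m = pxy > 0
    return float(np.sum(pxy[m] * np.log2(pxy[m] / (px @ py)[m])))

rng = np.random.default_rng(2)
e = rng.normal(size=2000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):                   # nonlinear AR(1)-type recursion
    x[t] = 0.8 * np.tanh(x[t - 1]) + e[t]
print([round(lag_mi(x, k), 3) for k in (1, 2, 5, 10)])   # decays with lag
```

    The decay of this lag-MI profile plays the role that the decay of the autocorrelation function plays in linear memory analysis.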