
    Information Extraction Under Privacy Constraints

    A privacy-constrained information extraction problem is considered where, for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes $Y$ and wants to convey to a potentially public user as much information about $Y$ as possible without compromising the amount of information revealed about $X$. To this end, the so-called rate-privacy function is introduced to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from $Y$ under a privacy constraint between $X$ and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed, and information-theoretic and estimation-theoretic interpretations are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function, by studying the problem where the extracted information is a uniform quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound, and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.
    Comment: 55 pages, 6 figures. Improved the organization and added a detailed literature review.
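    For the mutual-information privacy measure, the rate-privacy function described above takes the form $g_\epsilon(X,Y) = \sup\{I(Y;Z) : P_{Z|Y},\; I(X;Z) \le \epsilon\}$. The following minimal Python sketch (not the paper's algorithm; the joint distribution and the filter are made-up examples) computes the two competing quantities, utility $I(Y;Z)$ and leakage $I(X;Z)$, for one candidate filter $P_{Z|Y}$:

```python
import numpy as np

def mutual_info(p):
    """Mutual information (bits) between the two axes of a joint pmf."""
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / (px @ py)[m])))

# made-up joint distribution p(x, y) and candidate privacy filter p(z | y)
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
p_z_given_y = np.array([[0.9, 0.1],   # rows indexed by y, columns by z
                        [0.2, 0.8]])

p_yz = p_xy.sum(0)[:, None] * p_z_given_y   # p(y, z) = p(y) p(z|y)
p_xz = p_xy @ p_z_given_y                   # marginalise y: Markov chain X - Y - Z
print("utility  I(Y;Z) =", mutual_info(p_yz))
print("leakage  I(X;Z) =", mutual_info(p_xz))   # must stay below epsilon
```

    Maximizing the utility over all filters whose leakage stays below $\epsilon$ is exactly the optimization the rate-privacy function captures.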

    Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data

    BACKGROUND: The information-theoretic concept of mutual information provides a general framework for evaluating dependencies between variables. In the context of clustering genes with similar expression patterns, it has been suggested as a general similarity measure extending the commonly used linear measures. Since mutual information is defined for discrete variables, its application to continuous data requires binning procedures, which can introduce significant numerical errors for datasets of small or moderate size. RESULTS: We propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties of the algorithm and show that it outperforms commonly used approaches: the significance, as a measure of the power to distinguish genuine dependence from random correlation, is markedly increased. The approach is then illustrated on two large-scale gene expression datasets, and the results are compared with those obtained using other similarity measures. C++ source code of our algorithm is available for non-commercial use from [email protected] upon request. CONCLUSION: Using mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets, thereby extending the frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification.
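    The binning idea above can be made concrete: instead of assigning each sample to one hard bin, B-spline basis functions spread it fractionally over neighbouring bins before the joint histogram is formed. Below is a minimal sketch assuming order-2 (linear) B-splines and equally spaced bins; the paper's implementation and its choice of spline order may differ:

```python
import numpy as np

def bspline2_weights(x, n_bins):
    """Order-2 (linear) B-spline weights: each sample is split
    between its two nearest bin centres rather than one hard bin."""
    x = (x - x.min()) / (x.max() - x.min() + 1e-12) * (n_bins - 1)
    w = np.zeros((len(x), n_bins))
    lo = np.clip(np.floor(x).astype(int), 0, n_bins - 2)
    frac = x - lo
    w[np.arange(len(x)), lo] = 1.0 - frac
    w[np.arange(len(x)), lo + 1] = frac
    return w

def mutual_information(x, y, n_bins=10):
    """MI estimate (bits) from soft B-spline bin occupancies."""
    wx, wy = bspline2_weights(x, n_bins), bspline2_weights(y, n_bins)
    pxy = wx.T @ wy / len(x)                 # soft joint histogram
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    m = pxy > 0
    return float(np.sum(pxy[m] * np.log2(pxy[m] / (px @ py)[m])))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)        # nonlinear, near-zero Pearson r
print(mutual_information(x, y))              # clearly positive
```

    The demo at the end shows the point made in the conclusion: a quadratic dependence with essentially zero linear correlation is still picked up by the mutual-information estimate.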

    JIDT: An information-theoretic toolkit for studying the dynamics of complex systems

    Complex systems are increasingly being viewed as distributed information processing systems, particularly in the domains of computational neuroscience, bioinformatics and Artificial Life. This trend has resulted in a strong uptake in the use of (Shannon) information-theoretic measures to analyse the dynamics of complex systems in these fields. We introduce the Java Information Dynamics Toolkit (JIDT): a Google Code project which provides a standalone, open-source (GNU GPL v3 licensed) implementation for empirical estimation of information-theoretic measures from time-series data. While the toolkit provides classic information-theoretic measures (e.g. entropy, mutual information, conditional mutual information), it ultimately focusses on implementing higher-level measures for information dynamics. That is, JIDT focusses on quantifying information storage, transfer and modification, and the dynamics of these operations in space and time. For this purpose, it includes implementations of the transfer entropy and active information storage, their multivariate extensions and local or pointwise variants. JIDT provides implementations for both discrete and continuous-valued data for each measure, including various types of estimator for continuous data (e.g. Gaussian, box-kernel and Kraskov-Stoegbauer-Grassberger) which can be swapped at run-time thanks to Java's object-oriented polymorphism. Furthermore, while written in Java, the toolkit can be used directly in MATLAB, GNU Octave, Python and other environments. We present the principles behind the code design and provide several examples to guide users.
    Comment: 37 pages, 4 figures.
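    Since the toolkit is callable from Python, a usage sketch in the style of JIDT's bundled Python demos might look as follows. It assumes the jpype bridge is installed and that infodynamics.jar sits in the working directory; the class and method names should be verified against the JIDT distribution actually installed:

```python
import numpy as np
from jpype import startJVM, getDefaultJVMPath, JArray, JDouble, JPackage

# start the JVM with JIDT's jar on the classpath (adjust the path as needed)
startJVM(getDefaultJVMPath(), "-Djava.class.path=infodynamics.jar")

# Kraskov-Stoegbauer-Grassberger (KSG) mutual information estimator
calc_class = JPackage("infodynamics.measures.continuous.kraskov") \
    .MutualInfoCalculatorMultiVariateKraskov1
calc = calc_class()
calc.initialise(1, 1)                        # univariate source and destination

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = x + 0.5 * rng.normal(size=1000)
calc.setObservations(JArray(JDouble, 1)(x.tolist()),
                     JArray(JDouble, 1)(y.tolist()))
print("MI (nats):", calc.computeAverageLocalOfObservations())
```

    Swapping in a different estimator (e.g. the Gaussian or box-kernel variants mentioned above) amounts to instantiating a different calculator class, which is the run-time polymorphism the abstract refers to.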

    Almost Perfect Privacy for Additive Gaussian Privacy Filters

    We study the maximal mutual information about a random variable $Y$ (representing non-private information) displayed through an additive Gaussian channel when guaranteeing that only $\epsilon$ bits of information are leaked about a random variable $X$ (representing private information) that is correlated with $Y$. Denoting this quantity by $g_\epsilon(X,Y)$, we show that for perfect privacy, i.e., $\epsilon=0$, one has $g_0(X,Y)=0$ for any pair of absolutely continuous random variables $(X,Y)$, and we then derive a second-order approximation of $g_\epsilon(X,Y)$ for small $\epsilon$. This approximation is shown to be related to the strong data processing inequality for mutual information under suitable conditions on the joint distribution $P_{XY}$. Next, motivated by an operational interpretation of data privacy, we formulate the privacy-utility tradeoff in the same setup using estimation-theoretic quantities and obtain explicit bounds on this tradeoff when $\epsilon$ is sufficiently small, using the approximation formula derived for $g_\epsilon(X,Y)$.
    Comment: 20 pages. To appear in Springer-Verlag.
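    To make the setup concrete, suppose additionally that $(X,Y)$ is jointly Gaussian with correlation $\rho$ (an illustrative assumption, not one made in the abstract) and that the filter is $Z_\gamma = \sqrt{\gamma}\,Y + N$ with standard Gaussian noise $N$. Both $I(Y;Z_\gamma)$ and $I(X;Z_\gamma)$ then have closed forms, so the utility achievable at leakage level $\epsilon$ along this filter family can be evaluated directly:

```python
import numpy as np

def utility_at_leakage(eps, rho, var_y=1.0):
    """Utility I(Y;Z) of the additive Gaussian filter Z = sqrt(g)*Y + N,
    with the gain g chosen so that the leakage I(X;Z) equals eps,
    for jointly Gaussian (X, Y) with correlation rho. All values in nats."""
    # I(X;Z) = 0.5*log((1 + g*var_y) / (1 + g*var_y*(1 - rho**2)));
    # solving I(X;Z) = eps for g is feasible only while eps < I(X;Y):
    assert eps < -0.5 * np.log(1 - rho**2), "eps exceeds I(X;Y)"
    e2 = np.exp(2 * eps)
    g = (e2 - 1) / (var_y * (1 - e2 * (1 - rho**2)))
    return 0.5 * np.log(1 + g * var_y)       # I(Y;Z) at that gain

for eps in [0.0, 0.01, 0.05, 0.1]:
    print(f"eps={eps:.2f}  ->  I(Y;Z) = {utility_at_leakage(eps, rho=0.5):.4f} nats")
```

    Setting $\epsilon = 0$ forces $\gamma = 0$ and hence zero utility, matching the perfect-privacy result $g_0(X,Y)=0$ quoted above.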

    Information-Theoretic Analysis of Serial Dependence and Cointegration

    This paper presents wider characterizations of memory and cointegration in time series in terms of information-theoretic statistics such as the entropy and the mutual information between pairs of variables. We suggest a nonparametric and nonlinear methodology for data analysis and for testing the hypotheses of long memory and the existence of a cointegrating relationship in a nonlinear context. This framework is a natural extension of the linear memory concepts based on correlations. Finally, we show that our testing devices seem promising for exploratory analysis of nonlinearly cointegrated time series.
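    As a hypothetical illustration of the kind of statistic involved (not the paper's exact test), serial dependence that the autocorrelation misses can be quantified by estimating the mutual information between a series and its own lag:

```python
import numpy as np

def lag_mi(x, k, n_bins=8):
    """Histogram estimate (bits) of I(x_t ; x_{t-k}),
    a nonlinear analogue of the lag-k autocorrelation."""
    a, b = x[k:], x[:-k]
    pxy, _, _ = np.histogram2d(a, b, bins=n_bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    m = pxy > 0
    return float(np.sum(pxy[m] * np.log2(pxy[m] / (px @ py)[m])))

rng = np.random.default_rng(2)
e = rng.normal(size=2000)
x = np.empty_like(e)
x[0] = e[0]
for t in range(1, len(e)):                   # nonlinear AR(1)-type recursion
    x[t] = 0.8 * np.tanh(x[t - 1]) + e[t]
print([round(lag_mi(x, k), 3) for k in (1, 2, 5, 10)])   # decays with lag
```

    The decay of this lag-MI profile plays the role that the decay of the autocorrelation function plays in linear memory analysis.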