14 research outputs found

    Discovering Potential Correlations via Hypercontractivity

    Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.
    Comment: 30 pages, 19 figures, accepted for publication in the 31st Conference on Neural Information Processing Systems (NIPS 2017).
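    To make the quantity concrete: a known characterization (due to Anantharam et al.) expresses the hypercontractivity coefficient of a discrete pair $(X,Y)$ as $s(X;Y)=\sup_{r_X\neq p_X} D(r_Y\|p_Y)/D(r_X\|p_X)$, where $r_Y$ is $r_X$ pushed through the channel $p(y|x)$. The sketch below is a naive Monte Carlo lower bound on this supremum for a fully supported joint pmf; it is illustrative only, not the estimator proposed in the paper, and all function names are our own.

    ```python
    import numpy as np

    def kl(p, q):
        """KL divergence D(p||q) in nats for discrete distributions."""
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    def hypercontractivity_coeff(P_xy, n_trials=100_000, rng=None):
        """Monte Carlo lower bound on s(X;Y) via the KL-ratio
        characterization; assumes P_xy is strictly positive."""
        rng = np.random.default_rng(rng)
        p_x = P_xy.sum(axis=1)
        p_y = P_xy.sum(axis=0)
        channel = P_xy / p_x[:, None]          # p(y|x), rows sum to 1
        best = 0.0
        for _ in range(n_trials):
            r_x = rng.dirichlet(np.ones(len(p_x)))   # random input law
            d_x = kl(r_x, p_x)
            if d_x < 1e-12:                          # skip r_x too close to p_x
                continue
            r_y = r_x @ channel                      # induced output law
            best = max(best, kl(r_y, p_y) / d_x)
        return best

    # Example: a symmetric binary joint pmf; s(X;Y) upper-bounds
    # I(U;Y)/I(U;X) for every U in the Markov chain U - X - Y.
    P = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
    print(hypercontractivity_coeff(P, n_trials=20_000, rng=0))
    ```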

    Bottleneck Problems: Information and Estimation-Theoretic View

    Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, the design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding, and dependence dilution. Leveraging these connections, we prove a new cardinality bound for the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed \textit{bottleneck problems}, by replacing mutual information in IB and PF with other notions of mutual information, namely $f$-information and Arimoto's mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. Although the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of the lower convex or upper concave envelope of certain functions. By applying this technique to the binary case, we derive closed-form expressions for several bottleneck problems.
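    As background for the IB side of these problems: for discrete variables, the classical IB objective $\min_{p(t|x)} I(X;T) - \beta I(Y;T)$ admits the self-consistent iteration of Tishby et al., alternating the encoder update $p(t|x) \propto p(t)\exp(-\beta\, D(p(y|x)\,\|\,p(y|t)))$ with the induced marginal and decoder. A minimal sketch, assuming a strictly positive joint pmf (names are ours; this is the classical discrete IB, not the bottleneck-problem variants introduced in the paper):

    ```python
    import numpy as np

    def ib_iterate(P_xy, n_t, beta, n_iter=500, rng=None):
        """Self-consistent iteration for the discrete information
        bottleneck: minimize I(X;T) - beta * I(Y;T) over p(t|x).
        Assumes a strictly positive joint pmf P_xy."""
        rng = np.random.default_rng(rng)
        p_x = P_xy.sum(axis=1)
        p_y_given_x = P_xy / p_x[:, None]
        # Random soft initialization of the encoder p(t|x).
        q_t_given_x = rng.dirichlet(np.ones(n_t), size=len(p_x))
        for _ in range(n_iter):
            q_t = p_x @ q_t_given_x                                # p(t)
            # Decoder p(y|t) via Bayes' rule: sum_x p(t|x) p(x) p(y|x) / p(t).
            q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x
            q_y_given_t /= q_t[:, None]
            # KL(p(y|x) || p(y|t)) for every (x, t) pair.
            d = np.array([[np.sum(px_y * np.log(px_y / qt_y))
                           for qt_y in q_y_given_t] for px_y in p_y_given_x])
            q_t_given_x = q_t[None, :] * np.exp(-beta * d)
            q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
        return q_t_given_x

    # Example: three input symbols, two of them informative about Y.
    P = np.array([[0.35, 0.05],
                  [0.05, 0.35],
                  [0.10, 0.10]])
    enc = ib_iterate(P, n_t=2, beta=5.0, rng=0)
    print(np.round(enc, 3))   # encoder p(t|x); each row sums to 1
    ```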

    Information Bottleneck

    The celebrated information bottleneck (IB) principle of Tishby et al. has recently enjoyed renewed attention due to its application in the area of deep learning. This collection investigates the IB principle in this new context. The individual chapters in this collection:
    • provide novel insights into the functional properties of the IB;
    • discuss the IB principle (and its derivatives) as an objective for training multi-layer machine learning structures such as neural networks and decision trees; and
    • offer a new perspective on neural network learning via the lens of the IB framework.
    Our collection thus contributes to a better understanding of the IB principle specifically for deep learning and, more generally, of information-theoretic cost functions in machine learning. This paves the way toward explainable artificial intelligence.
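    For reference, the IB principle discussed throughout this collection is usually stated as a constrained compression problem: find a representation $T$ of $X$ that is maximally informative about a relevance variable $Y$. In Lagrangian form (our transcription of the standard formulation):

    ```latex
    % Information bottleneck Lagrangian: compress X into T while
    % retaining information about Y; beta trades compression vs. relevance.
    \min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(Y;T),
    \qquad \text{subject to the Markov chain } T \leftrightarrow X \leftrightarrow Y.
    ```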

    Network Maximal Correlation

    Identifying nonlinear relationships in large datasets is a daunting task, particularly when the form of the nonlinearity is unknown. Here, we introduce Network Maximal Correlation (NMC) as a fundamental measure to capture nonlinear associations in networks without knowledge of the underlying nonlinearity shapes. NMC infers, possibly nonlinear, transformations of variables with zero means and unit variances by maximizing total nonlinear correlation over the underlying network. For the case of two variables, NMC is equivalent to standard Maximal Correlation. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for both discrete and jointly Gaussian variables. For discrete random variables, we show that the NMC optimization is an instance of the Maximum Correlation Problem and provide necessary conditions for its global optimal solution. Moreover, we propose an efficient algorithm based on Alternating Conditional Expectation (ACE) which converges to a local NMC optimum; for this algorithm, we provide guidelines for choosing appropriate starting points to jump out of local maximizers. We also propose a distributed algorithm to compute a $1-\epsilon$ approximation of the NMC value for large and dense graphs using graph partitioning. For jointly Gaussian variables, under some conditions, we show that the NMC optimization can be simplified to a Max-Cut problem, and we provide conditions under which an NMC solution can be computed exactly. Under some general conditions, we show that NMC can infer the underlying graphical model for functions of latent jointly Gaussian variables; these functions are unknown, bijective, and can be nonlinear. This result broadens the family of continuous distributions whose graphical models can be characterized efficiently. We illustrate the robustness of NMC in real-world applications by showing its continuity with respect to small perturbations of joint distributions. We also show that sample NMC (NMC computed using empirical distributions) converges exponentially fast to the true NMC value. Finally, we apply NMC to different cancer datasets, including breast, kidney, and liver cancers, and show that NMC infers gene modules that are significantly associated with survival times of individuals, while such modules are not detected by linear association measures.
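    For the two-variable case, where NMC coincides with standard Maximal Correlation, the ACE iteration mentioned above is easy to sketch for discrete variables with a known joint pmf: alternate the conditional-expectation updates and re-standardize to zero mean and unit variance. This is a minimal illustration under our own naming, not the paper's distributed or network algorithm:

    ```python
    import numpy as np

    def ace_maximal_correlation(P_xy, n_iter=200, rng=None):
        """ACE power iteration for two discrete variables:
        f(x) <- E[g(Y) | X=x],  g(y) <- E[f(X) | Y=y],
        re-centering / re-scaling under the marginals each step."""
        rng = np.random.default_rng(rng)
        p_x = P_xy.sum(axis=1)
        p_y = P_xy.sum(axis=0)
        g = rng.standard_normal(len(p_y))        # random start
        for _ in range(n_iter):
            f = (P_xy @ g) / p_x                 # E[g(Y) | X = x]
            f -= f @ p_x                         # zero mean under p_x
            f /= np.sqrt((f**2) @ p_x)           # unit variance under p_x
            g = (P_xy.T @ f) / p_y               # E[f(X) | Y = y]
            g -= g @ p_y
            g /= np.sqrt((g**2) @ p_y)
        # Maximal correlation is E[f(X) g(Y)] at convergence.
        return float(f @ P_xy @ g)

    # Example: XOR-like dependence with zero Pearson correlation on raw
    # labels (X in {0,1,2,3} encodes a bit pair, Y is their XOR).
    P = np.array([[0.25, 0.00],
                  [0.00, 0.25],
                  [0.00, 0.25],
                  [0.25, 0.00]])
    print(ace_maximal_correlation(P, rng=1))   # close to 1.0
    ```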

    The $(q,t)$-Gaussian Process

    We introduce a two-parameter deformation of the classical Bosonic, Fermionic, and Boltzmann Fock spaces that is a refinement of the $q$-Fock space of [BS91]. Starting with a real, separable Hilbert space $H$, we construct the $(q,t)$-Fock space and the corresponding creation and annihilation operators, $\{a_{q,t}(h)^\ast\}_{h\in H}$ and $\{a_{q,t}(h)\}_{h\in H}$, satisfying the $(q,t)$-commutation relation $a_{q,t}(f)a_{q,t}(g)^\ast - q\,a_{q,t}(g)^\ast a_{q,t}(f) = \langle f,g\rangle_H\, t^{N}$ for $f,g\in H$, with $N$ denoting the number operator. Interpreting the bounded linear operators on the $(q,t)$-Fock space as non-commutative random variables, the analogue of the Gaussian random variable is given by the deformed field operator $s_{q,t}(h) := a_{q,t}(h) + a_{q,t}(h)^\ast$, for $h\in H$. The resulting refinement is particularly natural, as the moments of $s_{q,t}(h)$ are encoded by the joint statistics of crossings \emph{and nestings} in pair partitions. Furthermore, the orthogonal polynomial sequence associated with the normalized $(q,t)$-Gaussian $s_{q,t}$ is that of the $(q,t)$-Hermite orthogonal polynomials, a deformation of the $q$-Hermite sequence that is given by the recurrence $zH_n(z;q,t) = H_{n+1}(z;q,t) + [n]_{q,t}H_{n-1}(z;q,t)$, with $H_0(z;q,t) = 1$, $H_1(z;q,t) = z$, and $[n]_{q,t} = \sum_{i=1}^{n} q^{i-1}t^{n-i}$. The $q=0<t$ specialization yields a new single-parameter deformation of the full Boltzmann Fock space of free probability. The probability measure associated with the corresponding deformed semicircular operator turns out to be encoded, in various forms, via the Rogers-Ramanujan continued fraction, the Rogers-Ramanujan identities, the $t$-Airy function, the $t$-Catalan numbers of Carlitz-Riordan, and the first-order statistics of the reduced Wigner process.
    Comment: The present version reverts to v2, by removing former Lemma 13, which contained an error.
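    The deformed integers and the three-term recurrence above are directly computable; the following sketch (our own naming) evaluates $H_n(z;q,t)$ and checks the classical $q=t=1$ specialization, where $[n]_{1,1}=n$ and the recurrence reduces to that of the probabilists' Hermite polynomials:

    ```python
    def qt_integer(n, q, t):
        """[n]_{q,t} = sum_{i=1}^{n} q^(i-1) * t^(n-i)."""
        return sum(q**(i - 1) * t**(n - i) for i in range(1, n + 1))

    def qt_hermite(n, z, q, t):
        """Evaluate H_n(z; q, t) via the three-term recurrence
        z H_n = H_{n+1} + [n]_{q,t} H_{n-1}, with H_0 = 1, H_1 = z."""
        h_prev, h = 1.0, z
        if n == 0:
            return h_prev
        for k in range(1, n):
            # H_{k+1} = z H_k - [k]_{q,t} H_{k-1}
            h_prev, h = h, z * h - qt_integer(k, q, t) * h_prev
        return h

    # Sanity check: at q = t = 1, H_3 reduces to the probabilists'
    # Hermite polynomial He_3(z) = z^3 - 3z.
    z = 2.0
    print(qt_hermite(3, z, 1.0, 1.0), z**3 - 3 * z)   # both print 2.0
    ```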