Discovering Potential Correlations via Hypercontractivity
Discovering a correlation from one variable to another variable is of
fundamental scientific and practical interest. While existing correlation
measures are suitable for discovering average correlation, they fail to
discover hidden or potential correlations. To bridge this gap, (i) we postulate
a set of natural axioms that we expect a measure of potential correlation to
satisfy; (ii) we show that the rate of information bottleneck, i.e., the
hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we
provide a novel estimator for the hypercontractivity coefficient from
samples; and (iv) we provide numerical experiments demonstrating that this
proposed estimator discovers potential correlations among various indicators of
WHO datasets, is robust in discovering gene interactions from gene expression
time series data, and is statistically more powerful than the estimators for
other correlation measures in binary hypothesis testing of canonical examples
of potential correlations.
Comment: 30 pages, 19 figures, accepted for publication in the 31st Conference on Neural Information Processing Systems (NIPS 2017).
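The failure mode motivating this paper is easy to reproduce: a measure of average linear association can be near zero even when one variable is a near-deterministic function of the other. The sketch below (our own toy data and crude plug-in mutual-information estimator, not the paper's hypercontractivity estimator) contrasts the Pearson correlation with a histogram-based dependence estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0.0, 1.0, n)
# Strong but purely nonlinear dependence: y is symmetric in x around 0.5,
# so the linear (Pearson) correlation is essentially zero.
y = 4.0 * (x - 0.5) ** 2 + rng.normal(0.0, 0.05, n)

pearson = float(np.corrcoef(x, y)[0, 1])

def mutual_information(a, b, bins=20):
    """Crude plug-in mutual information estimate (in nats) from a 2-D histogram."""
    pab, _, _ = np.histogram2d(a, b, bins=bins)
    pab = pab / pab.sum()
    pa = pab.sum(axis=1, keepdims=True)   # marginal of a
    pb = pab.sum(axis=0, keepdims=True)   # marginal of b
    nz = pab > 0                          # avoid log(0) on empty cells
    return float((pab[nz] * np.log(pab[nz] / (pa @ pb)[nz])).sum())

mi = mutual_information(x, y)
print(f"Pearson: {pearson:+.3f}  plug-in MI: {mi:.3f} nats")
```

The Pearson coefficient comes out near zero while the mutual-information estimate is clearly positive, which is the gap the axiomatic potential-correlation measure is designed to close.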
Bottleneck Problems: Information and Estimation-Theoretic View
Information bottleneck (IB) and privacy funnel (PF) are two closely related
optimization problems which have found applications in machine learning, design
of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), strong
data processing inequalities, among others. In this work, we first investigate
the functional properties of IB and PF through a unified theoretical framework.
We then connect them to three information-theoretic coding problems, namely
hypothesis testing against independence, noisy source coding and dependence
dilution. Leveraging these connections, we prove a new cardinality bound for
the auxiliary variable in IB, making its computation more tractable for
discrete random variables.
In the second part, we introduce a general family of optimization problems,
termed \textit{bottleneck problems}, by replacing mutual information in IB
and PF with other notions of mutual information, namely $f$-information and
Arimoto's mutual information. We then argue that, unlike IB and PF, these
problems lead to easily interpretable guarantees in a variety of inference tasks
with statistical constraints on accuracy and privacy. Although the underlying
optimization problems are non-convex, we develop a technique to evaluate
bottleneck problems in closed form by equivalently expressing them in terms of
lower convex or upper concave envelope of certain functions. By applying this
technique to the binary case, we derive closed-form expressions for several
bottleneck problems.
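The original IB objective for discrete variables can be solved with the classical self-consistent alternating updates of Tishby et al. A minimal sketch (our own toy joint distribution and parameter choices, not code from the work above):

```python
import numpy as np

def information_bottleneck(pxy, n_t=2, beta=5.0, iters=300, seed=0):
    """Self-consistent IB iterations for a discrete joint p(x, y).

    Returns the soft encoder p(t|x) (rows: x, cols: t) and decoder p(y|t)."""
    rng = np.random.default_rng(seed)
    px = pxy.sum(axis=1)                              # p(x)
    py_x = np.clip(pxy / px[:, None], 1e-12, None)    # p(y|x), guarded for log
    pt_x = rng.dirichlet(np.ones(n_t), size=len(px))  # random soft encoder init
    for _ in range(iters):
        pt = px @ pt_x + 1e-12                        # p(t)
        px_t = (pt_x * px[:, None]) / pt[None, :]     # columns give p(x|t)
        py_t = np.clip(px_t.T @ py_x, 1e-12, None)    # p(y|t)
        # D_KL(p(y|x) || p(y|t)) for every pair (x, t)
        kl = (py_x[:, None, :] * np.log(py_x[:, None, :] / py_t[None, :, :])).sum(axis=2)
        pt_x = pt[None, :] * np.exp(-beta * kl)       # IB update, up to normalization
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    return pt_x, py_t

# Toy source: X in {0,1,2,3}; the first two symbols mostly emit Y=0,
# the last two mostly emit Y=1, so IB with n_t=2 should merge them pairwise.
pxy = np.array([[0.225, 0.025],
                [0.225, 0.025],
                [0.025, 0.225],
                [0.025, 0.225]])
encoder, decoder = information_bottleneck(pxy, n_t=2, beta=5.0)
labels = encoder.argmax(axis=1)   # hard clustering read off the soft encoder
```

At this trade-off parameter the encoder clusters the two y=0-leaning symbols together and the two y=1-leaning symbols together, illustrating the compression-relevance trade-off the bottleneck problems above generalize.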
Information Bottleneck
The celebrated information bottleneck (IB) principle of Tishby et al. has recently enjoyed renewed attention due to its application in the area of deep learning. This collection investigates the IB principle in this new context. The individual chapters in this collection:
• provide novel insights into the functional properties of the IB;
• discuss the IB principle (and its derivatives) as an objective for training multi-layer machine learning structures such as neural networks and decision trees; and
• offer a new perspective on neural network learning via the lens of the IB framework.
Our collection thus contributes to a better understanding of the IB principle specifically for deep learning and, more generally, of information-theoretic cost functions in machine learning. This paves the way toward explainable artificial intelligence.
Network Maximal Correlation
Identifying nonlinear relationships in large datasets is a daunting task, particularly when the form of the nonlinearity is unknown. Here, we introduce Network Maximal Correlation (NMC) as a fundamental measure to capture nonlinear associations in networks without knowledge of the underlying nonlinearity shapes. NMC infers, possibly nonlinear, transformations of variables with zero means and unit variances by maximizing the total nonlinear correlation over the underlying network. For the case of two variables, NMC is equivalent to the standard Maximal Correlation. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for both discrete and jointly Gaussian variables. For discrete random variables, we show that the NMC optimization is an instance of the Maximum Correlation Problem and provide necessary conditions for its global optimal solution. Moreover, we propose an efficient algorithm based on Alternating Conditional Expectation (ACE) which converges to a local NMC optimum. For this algorithm, we provide guidelines for choosing appropriate starting points to jump out of local maximizers. We also propose a distributed algorithm to compute a $(1-\epsilon)$-approximation of the NMC value for large and dense graphs using graph partitioning. For jointly Gaussian variables, under some conditions, we show that the NMC optimization can be simplified to a Max-Cut problem, and we provide conditions under which an NMC solution can be computed exactly. Under some general conditions, we show that NMC can infer the underlying graphical model for functions of latent jointly Gaussian variables. These functions are unknown, bijective, and can be nonlinear. This result broadens the family of continuous distributions whose graphical models can be characterized efficiently. We illustrate the robustness of NMC in real-world applications by showing its continuity with respect to small perturbations of joint distributions.
We also show that sample NMC (NMC computed using empirical distributions) converges exponentially fast to the true NMC value. Finally, we apply NMC to different cancer datasets, including breast, kidney, and liver cancers, and show that NMC infers gene modules that are significantly associated with survival times of individuals but are not detected using linear association measures.
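With two variables, NMC reduces to classical maximal correlation, and the ACE iteration the authors build on is straightforward to sketch for discrete samples (this is our own minimal two-variable implementation, not the paper's distributed network algorithm):

```python
import numpy as np

def ace_maximal_correlation(x, y, iters=50, seed=0):
    """Estimate the maximal correlation of two discrete samples via the
    Alternating Conditional Expectations (ACE) iteration."""
    rng = np.random.default_rng(seed)
    _, xi = np.unique(x, return_inverse=True)   # category index per sample
    _, yi = np.unique(y, return_inverse=True)
    f = rng.normal(size=xi.max() + 1)[xi]       # random initial score f(X)
    for _ in range(iters):
        # g(y) <- E[f(X) | Y = y], standardized to zero mean, unit variance
        g = (np.bincount(yi, weights=f) / np.bincount(yi))[yi]
        g = (g - g.mean()) / g.std()
        # f(x) <- E[g(Y) | X = x], standardized likewise
        f = (np.bincount(xi, weights=g) / np.bincount(xi))[xi]
        f = (f - f.mean()) / f.std()
    return float(np.corrcoef(f, g)[0, 1])

rng = np.random.default_rng(1)
x = rng.integers(-1, 2, 10_000)            # uniform on {-1, 0, 1}
y = x ** 2                                 # deterministic but non-monotone in x
pearson = float(np.corrcoef(x, y)[0, 1])   # ~0: linear measure sees nothing
rho = ace_maximal_correlation(x, y)        # ~1: Y is a function of X
```

Since Y is a deterministic function of X, the maximal correlation is 1 even though the Pearson correlation vanishes, which is exactly the kind of association NMC is built to detect network-wide.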
The $(q,t)$-Gaussian Process
We introduce a two-parameter deformation of the classical Bosonic, Fermionic,
and Boltzmann Fock spaces that is a refinement of the $q$-Fock space of [BS91].
Starting with a real, separable Hilbert space $\mathcal{H}$, we construct the
$(q,t)$-Fock space and the corresponding creation and annihilation operators,
$a_{q,t}(h)^*$ and $a_{q,t}(h)$, satisfying the
$(q,t)$-commutation relation
$a_{q,t}(f)\,a_{q,t}(g)^* - q\,a_{q,t}(g)^*\,a_{q,t}(f) = \langle f,g\rangle_{\mathcal{H}}\, t^N$
for $f,g \in \mathcal{H}$, with $N$ denoting the number
operator. Interpreting the bounded linear operators on the $(q,t)$-Fock space
as non-commutative random variables, the analogue of the Gaussian random
variable is given by the deformed field operator
$s_{q,t}(h) = a_{q,t}(h) + a_{q,t}(h)^*$, for $h \in \mathcal{H}$. The resulting
refinement is particularly natural, as the moments of $s_{q,t}(h)$ are encoded
by the joint statistics of crossings \emph{and nestings} in pair partitions.
Furthermore, the orthogonal polynomial sequence associated with the normalized
$(q,t)$-Gaussian is that of the $(q,t)$-Hermite orthogonal
polynomials, a deformation of the $q$-Hermite sequence that is given by the
recurrence $x H_n(x) = H_{n+1}(x) + [n]_{q,t} H_{n-1}(x)$ with
$H_0(x) = 1$, $H_1(x) = x$, and $[n]_{q,t} = \sum_{k=0}^{n-1} q^k t^{n-1-k}$.
The specialization $q = 0$ yields a new single-parameter deformation of the
full Boltzmann Fock space of free probability. The probability measure
associated with the corresponding deformed semicircular operator turns out to
be encoded, in various forms, via the Rogers-Ramanujan continued fraction, the
Rogers-Ramanujan identities, the $t$-Airy function, the $t$-Catalan numbers of
Carlitz-Riordan, and the first-order statistics of the reduced Wigner process.
Comment: The present version reverts to v2, by removing former Lemma 13, which contained an error.
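The deformed integers driving this three-term recurrence are simple to compute numerically. Assuming the convention $[n]_{q,t} = \sum_{k=0}^{n-1} q^k t^{n-1-k}$ (so $[n]_{1,1} = n$ recovers the Bosonic case and $t = 1$ recovers the usual $q$-integer), a quick check of the $(q,t)$-Hermite recurrence $x H_n = H_{n+1} + [n]_{q,t} H_{n-1}$ can be sketched as:

```python
def qt_integer(n, q, t):
    """(q,t)-deformed integer [n]_{q,t} = sum_{k=0}^{n-1} q^k t^{n-1-k}."""
    return sum(q**k * t**(n - 1 - k) for k in range(n))

def qt_hermite(n, x, q, t):
    """Evaluate H_n(x; q, t) from the three-term recurrence
    x H_n = H_{n+1} + [n]_{q,t} H_{n-1}, with H_0 = 1 and H_1 = x."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, float(x)
    for k in range(1, n):
        h_prev, h = h, x * h - qt_integer(k, q, t) * h_prev
    return h
```

Two sanity checks: $q = t = 1$ collapses the recurrence to the probabilists' Hermite polynomials (e.g. $H_3(x) = x^3 - 3x$), and for $q \neq t$ the deformed integer agrees with the closed form $(t^n - q^n)/(t - q)$.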