Caveats for information bottleneck in deterministic scenarios
Information bottleneck (IB) is a method for extracting information from one
random variable X that is relevant for predicting another random variable Y.
To do so, IB identifies an intermediate "bottleneck" variable T that has low
mutual information I(X;T) and high mutual information I(Y;T). The "IB curve"
characterizes the set of bottleneck variables that achieve maximal I(Y;T) for
a given I(X;T), and is typically explored by maximizing the "IB Lagrangian",
I(Y;T) - β I(X;T). In some cases, Y is a deterministic function of X,
including many classification problems in supervised learning where the
output class Y is a deterministic function of the input X. We demonstrate
three caveats when using IB in any situation where Y is a deterministic
function of X: (1) the IB curve cannot be recovered by maximizing the IB
Lagrangian for different values of β; (2) there are "uninteresting" trivial
solutions at all points of the IB curve; and (3) for multi-layer classifiers
that achieve low prediction error, different layers cannot exhibit a strict
trade-off between compression and prediction, contrary to a recent proposal.
We also show that when Y is a small perturbation away from being a
deterministic function of X, these three caveats arise in an approximate way.
To address problem (1), we propose a functional that, unlike the IB
Lagrangian, can recover the IB curve in all cases. We demonstrate the three
caveats on the MNIST dataset.
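As a concrete illustration (not from the paper), the sketch below evaluates
the IB Lagrangian I(Y;T) - β I(X;T) for discrete variables, given a joint
distribution p(x,y) and a stochastic encoder q(t|x); the toy distribution,
function names, and β value are all assumptions for illustration.

```python
import numpy as np

def mutual_information(p_joint):
    """I(A;B) in bits for a joint distribution given as a 2-D array."""
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float((p_joint[mask] * np.log2(p_joint[mask] / (p_a @ p_b)[mask])).sum())

def ib_lagrangian(p_xy, q_t_given_x, beta):
    """Evaluate I(Y;T) - beta * I(X;T) for a stochastic encoder q(t|x).

    p_xy:        joint distribution over (X, Y), shape (|X|, |Y|)
    q_t_given_x: encoder, shape (|X|, |T|), rows summing to 1
    """
    p_x = p_xy.sum(axis=1)                  # marginal p(x)
    p_xt = p_x[:, None] * q_t_given_x       # joint p(x, t)
    # T depends on (X, Y) only through X, so p(y, t) = sum_x p(x, y) q(t|x).
    p_yt = p_xy.T @ q_t_given_x             # joint p(y, t)
    return mutual_information(p_yt) - beta * mutual_information(p_xt)

# Toy case where Y is a deterministic function of X (Y = X mod 2).
p_xy = np.zeros((4, 2))
for x in range(4):
    p_xy[x, x % 2] = 0.25
identity_encoder = np.eye(4)                # T = X
# Here I(Y;T) = 1 bit and I(X;T) = 2 bits, so the value is 1 - 0.5 * 2 = 0.
print(ib_lagrangian(p_xy, identity_encoder, beta=0.5))
```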
Information-based clustering
In an age of increasingly large data sets, investigators in many different
disciplines have turned to clustering as a tool for data analysis and
exploration. Existing clustering methods, however, typically depend on several
nontrivial assumptions about the structure of data. Here we reformulate the
clustering problem from an information theoretic perspective which avoids many
of these assumptions. In particular, our formulation obviates the need for
defining a cluster "prototype", does not require an a priori similarity metric,
is invariant to changes in the representation of the data, and naturally
captures non-linear relations. We apply this approach to different domains and
find that it consistently produces clusters that are more coherent than those
extracted by existing algorithms. Finally, our approach provides a way of
clustering based on collective notions of similarity rather than the
traditional pairwise measures.
Comment: To appear in Proceedings of the National Academy of Sciences USA,
11 pages, 9 figures
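The abstract leaves the trade-off functional implicit; as one hedged way to
make the "collective similarity" idea concrete, the sketch below greedily
maximizes the average pairwise in-cluster similarity of a precomputed
similarity matrix S. The objective, function names, and greedy scheme are
assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def avg_in_cluster_similarity(S, labels, k):
    """<s> = sum over clusters c of p(c) * mean pairwise similarity in c."""
    n = len(labels)
    total = 0.0
    for c in range(k):
        members = np.where(labels == c)[0]
        if len(members) == 0:
            continue
        total += (len(members) / n) * S[np.ix_(members, members)].mean()
    return total

def greedy_cluster(S, k, max_sweeps=20, seed=0):
    """Coordinate ascent: move one point at a time to the best cluster."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    labels = rng.integers(0, k, size=n)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            old = labels[i]
            scores = []
            for c in range(k):
                labels[i] = c
                scores.append(avg_in_cluster_similarity(S, labels, k))
            labels[i] = int(np.argmax(scores))
            changed |= (labels[i] != old)
        if not changed:
            break
    return labels
```

Because the objective depends only on pairwise similarities (which could
themselves be information measures between data points), no cluster prototype
or explicit metric on the raw data is required, which is the property the
abstract emphasizes.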
Probabilistic Clustering Using Maximal Matrix Norm Couplings
In this paper, we present a local information theoretic approach to
explicitly learn probabilistic clustering of a discrete random variable. Our
formulation yields a convex maximization problem for which it is NP-hard to
find the global optimum. In order to algorithmically solve this optimization
problem, we propose two relaxations that are solved via gradient ascent and
alternating maximization. Experiments on the MSR Sentence Completion
Challenge, MovieLens 100K, and Reuters-21578 datasets demonstrate that our
approach is competitive with existing techniques and worthy of further
investigation.
Comment: Presented at the 56th Annual Allerton Conference on Communication,
Control, and Computing, 2018
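The abstract does not spell out the matrix-norm objective, so the sketch
below only illustrates the general "relax, then run gradient ascent"
strategy: projected gradient ascent on a quadratic surrogate trace(Q^T S Q)
over row-stochastic soft assignments Q. The surrogate objective and all
names are assumptions; maximizing a convex function over a polytope is
NP-hard in general, which mirrors the hardness noted in the abstract.

```python
import numpy as np

def project_rows_to_simplex(Q):
    """Euclidean projection of each row of Q onto the probability simplex."""
    n, k = Q.shape
    U = np.sort(Q, axis=1)[:, ::-1]             # rows sorted descending
    css = np.cumsum(U, axis=1)
    idx = np.arange(1, k + 1)
    rho = (U + (1.0 - css) / idx > 0).sum(axis=1)
    theta = (css[np.arange(n), rho - 1] - 1.0) / rho
    return np.maximum(Q - theta[:, None], 0.0)

def soft_cluster(S, k, steps=200, lr=0.1, seed=0):
    """Projected gradient ascent on f(Q) = trace(Q^T S Q).

    S is a symmetric similarity/co-occurrence matrix; each row of Q is a
    soft cluster assignment. f is convex in Q, so this is only a local
    heuristic, not a global solver.
    """
    rng = np.random.default_rng(seed)
    Q = project_rows_to_simplex(rng.random((S.shape[0], k)))
    for _ in range(steps):
        Q = project_rows_to_simplex(Q + lr * 2.0 * S @ Q)  # grad = 2 S Q
    return Q
```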
Clustering Boolean Tensors
Tensor factorizations are computationally hard problems, and in particular,
are often significantly harder than their matrix counterparts. In case of
Boolean tensor factorizations -- where the input tensor and all the factors are
required to be binary and we use Boolean algebra -- much of that hardness comes
from the possibility of overlapping components. Yet, in many applications we
are perfectly happy to partition at least one of the modes. In this paper we
investigate what consequences this partitioning has on the computational
complexity of Boolean tensor factorizations and present a new algorithm for
the resulting clustering problem. This algorithm can alternatively be seen as a
particularly regularized clustering algorithm that can handle extremely
high-dimensional observations. We analyse our algorithm with the goal of
maximizing the similarity and argue that this is more meaningful than
minimizing the dissimilarity. As a by-product we obtain a PTAS and an efficient
0.828-approximation algorithm for rank-1 binary factorizations. Our algorithm
for Boolean tensor clustering achieves high scalability, high similarity, and
good generalization to unseen data on both synthetic and real-world data
sets.
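To illustrate the similarity-maximization view in the simplest case the
abstract touches on, rank-1 binary factorization, here is a hedged
local-search sketch: alternately fix one binary factor and choose the other
entrywise to maximize the number of agreeing entries. This is illustrative
only; it is not the paper's PTAS or 0.828-approximation algorithm, and it
can get stuck in the trivial all-zeros solution.

```python
import numpy as np

def rank1_boolean_similarity(A, iters=20, seed=0):
    """Local search for binary u, v maximizing agreement of A with u v^T.

    Similarity = number of entries where A and the rank-1 Boolean product
    u v^T agree. With v fixed, row i agrees on (A[i] == v).sum() entries
    if u[i] = 1 and on (A[i] == 0).sum() entries if u[i] = 0, so each
    u[i] can be set independently; columns are handled symmetrically.
    """
    rng = np.random.default_rng(seed)
    v = rng.integers(0, 2, size=A.shape[1])
    for _ in range(iters):
        u = ((A == v).sum(axis=1) > (A == 0).sum(axis=1)).astype(int)
        v = ((A.T == u).sum(axis=1) > (A.T == 0).sum(axis=1)).astype(int)
    return u, v
```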