33,212 research outputs found
Caveats for information bottleneck in deterministic scenarios
Information bottleneck (IB) is a method for extracting information from one
random variable $X$ that is relevant for predicting another random variable
$Y$. To do so, IB identifies an intermediate "bottleneck" variable $T$ that
has low mutual information $I(X;T)$ and high mutual information $I(Y;T)$. The
"IB curve" characterizes the set of bottleneck variables that achieve maximal
$I(Y;T)$ for a given $I(X;T)$, and is typically explored by maximizing the "IB
Lagrangian", $L_{IB} = I(Y;T) - \beta I(X;T)$. In some cases, $Y$ is a
deterministic function of $X$, including many classification problems in
supervised learning where the output class $Y$ is a deterministic function of
the input $X$. We demonstrate three caveats when using IB in any situation
where $Y$ is a deterministic function of $X$: (1) the IB curve cannot be
recovered by maximizing the IB Lagrangian for different values of $\beta$;
(2) there are
"uninteresting" trivial solutions at all points of the IB curve; and (3) for
multi-layer classifiers that achieve low prediction error, different layers
cannot exhibit a strict trade-off between compression and prediction, contrary
to a recent proposal. We also show that when $Y$ is a small perturbation away
from being a deterministic function of $X$, these three caveats arise in an
approximate way. To address problem (1), we propose a functional that, unlike
the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the
three caveats on the MNIST dataset.
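
To make these quantities concrete, the following minimal sketch (not the
authors' code; the encoder and distribution are illustrative) evaluates the IB
Lagrangian $I(Y;T) - \beta I(X;T)$ for hard encoders $t = f(x)$ on a toy joint
distribution in which $Y$ is a deterministic function of $X$:

    # Sketch only: IB Lagrangian for a hard encoder on a toy discrete p(x, y).
    import numpy as np

    def mutual_information(pab):
        """I(A;B) in nats for a joint distribution given as a 2D array."""
        pab = pab / pab.sum()
        pa = pab.sum(axis=1, keepdims=True)
        pb = pab.sum(axis=0, keepdims=True)
        mask = pab > 0
        return float((pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])).sum())

    def ib_lagrangian(pxy, f, n_t, beta):
        """I(Y;T) - beta*I(X;T) for the deterministic encoder t = f(x)."""
        n_x, n_y = pxy.shape
        pty = np.zeros((n_t, n_y))  # joint p(t, y)
        ptx = np.zeros((n_t, n_x))  # joint p(t, x)
        for x in range(n_x):
            pty[f(x)] += pxy[x]
            ptx[f(x), x] = pxy[x].sum()
        return mutual_information(pty) - beta * mutual_information(ptx)

    # Toy case where Y is a deterministic function of X: y = x mod 2.
    pxy = np.zeros((4, 2))
    for x in range(4):
        pxy[x, x % 2] = 0.25

    for beta in (0.1, 0.5, 2.0):
        full = ib_lagrangian(pxy, lambda x: x % 2, n_t=2, beta=beta)  # T = Y
        triv = ib_lagrangian(pxy, lambda x: 0, n_t=1, beta=beta)  # constant T
        print(f"beta={beta}: T=Y -> {full:.3f}, constant T -> {triv:.3f}")

Because $Y$ is a deterministic function of $X$ here, $I(Y;T) \le I(X;T)$ for
every hard encoder, so sweeping $\beta$ only ever prefers one of these two
corner encoders (the constant one whenever $\beta > 1$): a miniature of
caveats (1) and (2).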
Pareto-optimal clustering with the primal deterministic information bottleneck
At the heart of both lossy compression and clustering is a trade-off between
the fidelity and size of the learned representation. Our goal is to map out and
study the Pareto frontier that quantifies this trade-off. We focus on the
Deterministic Information Bottleneck (DIB) formulation of lossy compression,
which can be interpreted as a clustering problem. To this end, we introduce the
{\it primal} DIB problem, which we show results in a much richer frontier than
its previously studied dual counterpart. We present an algorithm for mapping
out the Pareto frontier of the primal DIB trade-off that is also applicable to
most other two-objective clustering problems. We study general properties of
the Pareto frontier, and give both analytic and numerical evidence for
logarithmic sparsity of the frontier in general. We provide evidence that our
algorithm has polynomial scaling despite the super-exponential search space;
and additionally propose a modification to the algorithm that can be used where
sampling noise is expected to be significant. Finally, we use our algorithm to
map the DIB frontier of three different tasks: compressing the English
alphabet, extracting informative color classes from natural images, and
compressing a group-theory-inspired dataset, revealing interesting features of
the frontier, and demonstrating how the structure of the frontier can be used
for model selection, with a focus on points previously hidden by the cloak of
the convex hull.
Comment: 26 pages, 11 figures
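
As a complement, here is a brute-force sketch of the frontier object itself
(not the paper's algorithm, which is designed to avoid exactly this
super-exponential enumeration), assuming the usual DIB objectives: minimize
the representation size $H(T)$ while maximizing the relevant information
$I(Y;T)$ over hard clusterings $t = f(x)$:

    # Sketch only: exhaustive Pareto frontier for a toy two-objective
    # clustering problem (minimize H(T), maximize I(Y;T)).
    from itertools import product
    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    def mutual_information(pab):
        pab = pab / pab.sum()
        pa = pab.sum(axis=1, keepdims=True)
        pb = pab.sum(axis=0, keepdims=True)
        mask = pab > 0
        return float((pab[mask] * np.log(pab[mask] / (pa @ pb)[mask])).sum())

    rng = np.random.default_rng(0)
    pxy = rng.random((5, 3))
    pxy /= pxy.sum()  # toy joint p(x, y) over 5 inputs and 3 labels

    points = set()
    for assign in product(range(5), repeat=5):  # every hard clustering f
        pty = np.zeros((5, 3))                  # joint p(t, y)
        for x, t in enumerate(assign):
            pty[t] += pxy[x]
        points.add((round(entropy(pty.sum(axis=1)), 9),
                    round(mutual_information(pty), 9)))

    # Keep points not weakly dominated by any other point (lower H(T) and
    # higher I(Y;T) are both preferred).
    frontier = sorted(p for p in points
                      if not any(q[0] <= p[0] and q[1] >= p[1] and q != p
                                 for q in points))
    for h, i in frontier:
        print(f"H(T)={h:.3f}  I(Y;T)={i:.3f}")

Even on this toy instance, the surviving points trace the kind of sparse,
monotone staircase whose general properties, including logarithmic sparsity,
the abstract describes.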