
Caveats for information bottleneck in deterministic scenarios

Information bottleneck (IB) is a method for extracting the information in one random variable X that is relevant for predicting another random variable Y. To do so, IB identifies an intermediate "bottleneck" variable T that has low mutual information I(X;T) and high mutual information I(Y;T). The "IB curve" characterizes the set of bottleneck variables that achieve maximal I(Y;T) for a given I(X;T), and is typically explored by maximizing the "IB Lagrangian", I(Y;T) − βI(X;T). In some cases, Y is a deterministic function of X, including many classification problems in supervised learning where the output class Y is a deterministic function of the input X. We demonstrate three caveats when using IB in any situation where Y is a deterministic function of X: (1) the IB curve cannot be recovered by maximizing the IB Lagrangian for different values of β; (2) there are "uninteresting" trivial solutions at all points of the IB curve; and (3) for multi-layer classifiers that achieve low prediction error, different layers cannot exhibit a strict trade-off between compression and prediction, contrary to a recent proposal. We also show that when Y is a small perturbation away from being a deterministic function of X, these three caveats arise in an approximate way. To address problem (1), we propose a functional that, unlike the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the three caveats on the MNIST dataset.
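
As a concrete illustration of the quantities named in the abstract, here is a minimal sketch (not from the paper) that evaluates the IB Lagrangian I(Y;T) − βI(X;T) for discrete variables, given a joint distribution p(x, y) and a stochastic encoder q(t|x). The function names, array layouts, and toy distribution below are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (assumed, not from the paper): evaluate the
# IB Lagrangian I(Y;T) - beta * I(X;T) for discrete variables.
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in nats for a joint distribution p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

def ib_lagrangian(p_xy, q_t_given_x, beta):
    """I(Y;T) - beta * I(X;T).

    p_xy:        array of shape (|X|, |Y|), the data distribution.
    q_t_given_x: array of shape (|X|, |T|), rows sum to 1 (the encoder).
    """
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * q_t_given_x   # p(x, t) = p(x) q(t|x)
    # T depends on Y only through X (Markov chain Y - X - T), so
    # p(y, t) = sum_x p(x, y) q(t|x).
    p_yt = p_xy.T @ q_t_given_x
    return mutual_information(p_yt) - beta * mutual_information(p_xt)

# Toy deterministic setting: Y = X mod 2, X uniform on {0, 1, 2, 3}.
p_xy = np.zeros((4, 2))
for x in range(4):
    p_xy[x, x % 2] = 0.25
q_t_given_x = np.eye(4)                 # identity encoder: T = X
print(ib_lagrangian(p_xy, q_t_given_x, beta=0.5))
```

With the identity encoder T = X on this toy distribution, the sketch prints I(Y;X) − βH(X) = ln 2 − 0.5 · ln 4 ≈ 0: the encoder captures all predictive information but pays the full compression cost, the kind of point on the IB curve whose recoverability via the Lagrangian the paper examines.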