Understanding the limitation of Total Correlation Estimation Based on Mutual Information Bounds
The total correlation (TC) is a key index for measuring the dependence
among the marginal distributions of multidimensional random variables, and
it is frequently applied as an inductive bias in representation learning.
Previous research has shown that the TC value can be estimated by
decomposition using mutual information (MI) bounds. However, we found
through theoretical derivation and qualitative experiments that, because
importance sampling is used in the decomposition process, the bias of TC
values estimated from MI bounds is amplified when the proposal distribution
in the sampling differs significantly from the target distribution. To
reduce this estimation bias, we propose a TC estimation correction model
based on supervised learning, which takes the training-iteration loss
sequence of the MI-bound-based TC estimator as input features and outputs
the true TC value. Experiments show that our proposed method improves the
accuracy of TC estimation and eliminates the variance generated by the TC
estimation process.
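The MI-bound decomposition the abstract relies on can be checked exactly in the Gaussian case, where both the TC and each term of the chain decomposition TC(X) = Σᵢ I(X₁..ᵢ₋₁; Xᵢ) have closed forms. The sketch below is illustrative only (the paper's estimator uses neural MI bounds, not closed forms); function names are my own.

```python
import numpy as np

def gaussian_tc(cov):
    """Closed-form TC of a Gaussian: 0.5 * (sum_i log var_i - log det cov)."""
    return 0.5 * (np.sum(np.log(np.diag(cov))) - np.linalg.slogdet(cov)[1])

def gaussian_mi_chain(cov):
    """Chain terms I(X_{1:i-1}; X_i); their sum telescopes to the TC."""
    d = cov.shape[0]
    terms = []
    for i in range(1, d):
        prev = np.linalg.slogdet(cov[:i, :i])[1]          # log det of X_{1:i-1} block
        cur = np.linalg.slogdet(cov[:i + 1, :i + 1])[1]   # log det of X_{1:i} block
        terms.append(0.5 * (prev + np.log(cov[i, i]) - cur))
    return terms

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
cov = A @ A.T + 4 * np.eye(4)   # a random positive-definite covariance
tc = gaussian_tc(cov)
assert np.isclose(tc, sum(gaussian_mi_chain(cov)))  # decomposition is exact here
```

In the paper's setting each chain term is replaced by a variational MI bound estimated by importance sampling, which is where the amplified bias enters.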
Regularized Mutual Information Neural Estimation
With the variational lower bound of mutual information (MI), the estimation
of MI can be understood as an optimization task via stochastic gradient
descent. In this work, we start by showing how the Mutual Information Neural
Estimator (MINE) searches for the optimal function that maximizes the
Donsker-Varadhan representation. With our synthetic dataset, we directly
observe the neural network outputs during the optimization to investigate why
MINE succeeds or fails: We discover the drifting phenomenon, in which the
constant term of the learned critic shifts throughout the optimization
process, and analyze the instability caused by the interaction between this
drift and an insufficient batch size. Next, through theoretical and
experimental evidence,
we propose a novel lower bound that effectively regularizes the neural network
to alleviate the problems of MINE. We also introduce an averaging strategy that
produces an unbiased estimate by utilizing multiple batches to mitigate the
batch size limitation. Finally, we show that our regularization achieves
significant improvements in both discrete and continuous settings.
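The Donsker-Varadhan (DV) representation that MINE optimizes is I(X;Y) = sup_T E_p[T] − log E_{p_x p_y}[e^T]. A minimal sketch, assuming a correlated bivariate Gaussian where the optimal critic (the log density ratio) is known in closed form, so the bound can be evaluated without any training; it also shows the invariance behind the drifting phenomenon, since adding a constant to the critic leaves the DV value unchanged. This is not the paper's estimator, just the bound it builds on.

```python
import numpy as np

rho = 0.8
true_mi = -0.5 * np.log(1 - rho ** 2)   # MI of a correlated 2D Gaussian

def critic(x, y):
    """Optimal DV critic: log p(x, y) - log p(x) - log p(y)."""
    joint = (-(x**2 - 2*rho*x*y + y**2) / (2 * (1 - rho**2))
             - 0.5 * np.log(1 - rho**2))
    marginals = -(x**2 + y**2) / 2
    return joint - marginals

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
y_shuffled = rng.permutation(y)          # samples from the product of marginals

def dv_value(shift=0.0):
    """DV lower bound; `shift` adds a constant term to the critic."""
    t_joint = critic(x, y) + shift
    t_marginal = critic(x, y_shuffled) + shift
    return t_joint.mean() - np.log(np.exp(t_marginal).mean())

# At the optimal critic the bound is tight, and a constant shift cancels:
assert abs(dv_value() - true_mi) < 0.05
assert np.isclose(dv_value(), dv_value(5.0))
```

Because the constant term cancels in the DV value, gradient descent leaves it unconstrained, which is what allows the critic's constant to drift during MINE training.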