Understanding the limitation of Total Correlation Estimation Based on Mutual Information Bounds
The total correlation (TC) is a key index for measuring the dependence
among the marginal distributions of multidimensional random variables, and
it is frequently applied as an inductive bias in representation learning.
Previous research has shown that the TC value can be estimated by
decomposition using mutual information (MI) bounds. However, we found
through theoretical derivation and qualitative experiments that, because
importance sampling is used in the decomposition process, the bias of TC
values estimated from MI bounds is amplified when the proposal distribution
in the sampling differs significantly from the target distribution. To
reduce this estimation bias, we propose a TC estimation correction model
based on supervised learning, which takes the training-iteration loss
sequence of the MI-bound-based TC estimator as input features and outputs
the true TC value. Experiments show that our proposed method improves the
accuracy of TC estimation and eliminates the variance generated by the TC
estimation process.
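The MI-bound decomposition the abstract relies on can be checked exactly in the Gaussian case, where both the TC and each term of the chain decomposition TC(X) = Σᵢ I(X₁..ᵢ₋₁; Xᵢ) have closed forms. The sketch below is illustrative only (the paper's estimator uses neural MI bounds, not closed forms); function names are my own.

```python
import numpy as np

def gaussian_tc(cov):
    """Closed-form TC of a Gaussian: 0.5 * (sum_i log var_i - log det cov)."""
    return 0.5 * (np.sum(np.log(np.diag(cov))) - np.linalg.slogdet(cov)[1])

def gaussian_mi_chain(cov):
    """Chain terms I(X_{1:i-1}; X_i); their sum telescopes to the TC."""
    d = cov.shape[0]
    terms = []
    for i in range(1, d):
        prev = np.linalg.slogdet(cov[:i, :i])[1]          # log det of X_{1:i-1} block
        cur = np.linalg.slogdet(cov[:i + 1, :i + 1])[1]   # log det of X_{1:i} block
        terms.append(0.5 * (prev + np.log(cov[i, i]) - cur))
    return terms

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
cov = A @ A.T + 4 * np.eye(4)   # a random positive-definite covariance
tc = gaussian_tc(cov)
assert np.isclose(tc, sum(gaussian_mi_chain(cov)))  # decomposition is exact here
```

In the paper's setting each chain term is replaced by a variational MI bound estimated by importance sampling, which is where the amplified bias enters.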
Regularized Mutual Information Neural Estimation
With the variational lower bound of mutual information (MI), the estimation
of MI can be understood as an optimization task via stochastic gradient
descent. In this work, we start by showing how the Mutual Information Neural
Estimator (MINE) searches for the optimal function that maximizes the
Donsker-Varadhan representation. With our synthetic dataset, we directly
observe the neural network outputs during the optimization to investigate why
MINE succeeds or fails: We discover the drifting phenomenon, in which the
constant term of the learned critic shifts throughout the optimization
process, and analyze the instability caused by the interaction between this
drift and an insufficient batch size. Next, through theoretical and
experimental evidence,
we propose a novel lower bound that effectively regularizes the neural network
to alleviate the problems of MINE. We also introduce an averaging strategy that
produces an unbiased estimate by utilizing multiple batches to mitigate the
batch size limitation. Finally, we show that our regularization achieves
significant improvements in both discrete and continuous settings.
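The Donsker-Varadhan (DV) representation that MINE optimizes is I(X;Y) = sup_T E_p[T] − log E_{p_x p_y}[e^T]. A minimal sketch, assuming a correlated bivariate Gaussian where the optimal critic (the log density ratio) is known in closed form, so the bound can be evaluated without any training; it also shows the invariance behind the drifting phenomenon, since adding a constant to the critic leaves the DV value unchanged. This is not the paper's estimator, just the bound it builds on.

```python
import numpy as np

rho = 0.8
true_mi = -0.5 * np.log(1 - rho ** 2)   # MI of a correlated 2D Gaussian

def critic(x, y):
    """Optimal DV critic: log p(x, y) - log p(x) - log p(y)."""
    joint = (-(x**2 - 2*rho*x*y + y**2) / (2 * (1 - rho**2))
             - 0.5 * np.log(1 - rho**2))
    marginals = -(x**2 + y**2) / 2
    return joint - marginals

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
y_shuffled = rng.permutation(y)          # samples from the product of marginals

def dv_value(shift=0.0):
    """DV lower bound; `shift` adds a constant term to the critic."""
    t_joint = critic(x, y) + shift
    t_marginal = critic(x, y_shuffled) + shift
    return t_joint.mean() - np.log(np.exp(t_marginal).mean())

# At the optimal critic the bound is tight, and a constant shift cancels:
assert abs(dv_value() - true_mi) < 0.05
assert np.isclose(dv_value(), dv_value(5.0))
```

Because the constant term cancels in the DV value, gradient descent leaves it unconstrained, which is what allows the critic's constant to drift during MINE training.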