We introduce an information-theoretic quantity with properties similar to
mutual information that can be estimated from data without making explicit
assumptions about the underlying distribution. This quantity is based on a
recently proposed matrix-based entropy that uses the eigenvalues of a
normalized Gram matrix to compute an estimate of the eigenvalues of an
uncentered covariance operator in a reproducing kernel Hilbert space. We show
that a difference of matrix-based entropies (DiME) is well suited for problems
involving the maximization of mutual information between random variables.
While many methods for such tasks can lead to trivial solutions, DiME naturally
penalizes such outcomes. We compare DiME to several baseline estimators of
mutual information on a toy Gaussian dataset. We provide examples of use cases
for DiME, such as latent factor disentanglement and a multiview representation
learning problem where DiME is used to learn a shared representation among
views with high mutual information
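As a rough illustration of the underlying estimator, the sketch below computes the matrix-based entropy of a trace-normalized Gram matrix from its eigenvalues. The RBF kernel, the bandwidth `sigma`, and the entropy order `alpha` are illustrative assumptions; the specific difference of entropies that defines DiME follows the paper rather than this snippet.

```python
import numpy as np

def normalized_gram(X, sigma=1.0):
    """Trace-normalized Gram matrix from an RBF kernel (kernel and bandwidth
    are illustrative choices, not prescribed by the abstract)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return K / np.trace(K)

def matrix_entropy(A, alpha=1.01):
    """Matrix-based Renyi entropy S_alpha(A) = log2(sum_i lambda_i^alpha) / (1 - alpha),
    computed from the eigenvalues of the normalized Gram matrix A."""
    eigvals = np.linalg.eigvalsh(A)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative eigenvalues
    return np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha)

# Toy usage on a small Gaussian sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
print(matrix_entropy(normalized_gram(X)))
```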