DiME: Maximizing Mutual Information by a Difference of Matrix-Based Entropies
We introduce an information-theoretic quantity with similar properties to
mutual information that can be estimated from data without making explicit
assumptions on the underlying distribution. This quantity is based on a
recently proposed matrix-based entropy that uses the eigenvalues of a
normalized Gram matrix to compute an estimate of the eigenvalues of an
uncentered covariance operator in a reproducing kernel Hilbert space. We show
that a difference of matrix-based entropies (DiME) is well suited for problems
involving the maximization of mutual information between random variables.
While many methods for such tasks can lead to trivial solutions, DiME naturally
penalizes such outcomes. We compare DiME to several baseline estimators of
mutual information on a toy Gaussian dataset. We provide examples of use cases
for DiME, such as latent factor disentanglement and a multiview representation
learning problem where DiME is used to learn a shared representation among
views with high mutual information.
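To make the construction concrete, here is a minimal NumPy sketch of the
matrix-based entropy described above and of one plausible reading of the
entropy difference: the Gram matrix is normalized to unit trace and its
eigenvalues stand in for the spectrum of the uncentered covariance operator
in the RKHS. The helpers rbf_gram and dime_sketch, the choice alpha = 1.01,
and the permutation-based contrast are illustrative assumptions; the paper
defines the exact DiME objective.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """RBF kernel Gram matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def matrix_entropy(K, alpha=1.01):
    """Matrix-based Renyi alpha-entropy of a Gram matrix K.

    K is normalized to unit trace; its eigenvalues estimate the spectrum
    of the uncentered covariance operator in the RKHS, and the entropy
    is computed from that spectrum (alpha near 1 approximates Shannon).
    """
    A = K / np.trace(K)
    eig = np.linalg.eigvalsh(A)
    eig = eig[eig > 1e-12]                     # drop numerical zeros
    return np.log2(np.sum(eig ** alpha)) / (1.0 - alpha)

def dime_sketch(A, B, n_perms=20, seed=0):
    """One reading of a 'difference of matrix-based entropies': compare
    the joint entropy S(A o B) (o = Hadamard product) with its average
    under random re-pairings of the samples. The gap is near zero for
    independent variables, so trivial solutions score poorly.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    joint = matrix_entropy(A * B)
    shuffled = [matrix_entropy(A * B[np.ix_(p, p)])
                for p in (rng.permutation(n) for _ in range(n_perms))]
    return np.mean(shuffled) - joint

# Toy check on dependent Gaussian views
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 2))
y = x + 0.1 * rng.normal(size=(256, 2))        # y depends strongly on x
print(dime_sketch(rbf_gram(x), rbf_gram(y)))   # clearly positive
```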
Unsupervised Visual Representation Learning via Mutual Information Regularized Assignment
This paper proposes Mutual Information Regularized Assignment (MIRA), a
pseudo-labeling algorithm for unsupervised representation learning inspired by
information maximization. We formulate online pseudo-labeling as an
optimization problem to find pseudo-labels that maximize the mutual information
between the label and data while being close to a given model probability. We
derive a fixed-point iteration method and prove its convergence to the optimal
solution. In contrast to baselines, MIRA combined with pseudo-label
prediction enables simple yet effective clustering-based representation
learning without extra training techniques or artificial constraints
such as sampling strategies or equipartition constraints. With
relatively few training epochs, the representations learned by MIRA
achieve state-of-the-art performance on various downstream tasks,
including linear/k-NN evaluation and transfer learning. In particular,
with only 400 epochs, our method applied to the ImageNet dataset with a
ResNet-50 architecture achieves 75.6% linear evaluation accuracy.
Comment: NeurIPS 202
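A fixed-point scheme of this kind can be sketched in a few lines. The
objective assumed below (mutual information between label and sample
index, regularized by a KL term toward the model probabilities) and the
resulting update exponents are my own illustrative derivation, not the
paper's formulation; consult the paper for the actual method and its
convergence proof.

```python
import numpy as np

def mira_style_pseudolabels(P, lam=2.0, n_iters=50, eps=1e-12):
    """Fixed-point pseudo-label sketch (illustrative assumption).

    P: (N, K) model probabilities. We seek Q on the simplex trading off
    (i) the mutual information between label and sample index,
        I(Q) = H(mean_i q_i) - mean_i H(q_i),
    against (ii) (lam / N) * sum_i KL(q_i || p_i), with lam > 1.
    Setting the simplex-constrained gradient to zero gives the update
        q_i  proportional to  p_i**(lam/(lam-1)) / qbar**(1/(lam-1)),
    where qbar = mean_i q_i; dividing by qbar discourages collapsing all
    samples onto one label, so no equipartition constraint is needed.
    """
    a = lam / (lam - 1.0)
    b = 1.0 / (lam - 1.0)
    Q = P.copy()
    for _ in range(n_iters):
        qbar = Q.mean(axis=0) + eps           # current label marginal
        Q = (P + eps) ** a / qbar ** b        # unnormalized fixed-point step
        Q /= Q.sum(axis=1, keepdims=True)     # renormalize each row
    return Q

# Toy usage: 8 samples, 3 classes
rng = np.random.default_rng(1)
logits = rng.normal(size=(8, 3))
P = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
Q = mira_style_pseudolabels(P)                # sharpened, balanced labels
```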
Enhanced Multimodal Representation Learning with Cross-modal KD
This paper explores the task of leveraging auxiliary modalities that are
available only during training to enhance multimodal representation
learning through cross-modal Knowledge Distillation (KD). The widely
adopted mutual information maximization-based objective admits a
short-cut solution, the weak teacher: the maximum mutual information is
achieved simply by making the teacher model as weak as the student
model. To prevent such a weak
solution, we introduce an additional objective term, i.e., the mutual
information between the teacher and the auxiliary modality model. Besides, to
narrow down the information gap between the student and teacher, we further
propose to minimize the conditional entropy of the teacher given the student.
Novel training schemes based on contrastive learning and adversarial learning
are designed to optimize the mutual information and the conditional entropy,
respectively. Experimental results on three popular multimodal benchmark
datasets show that the proposed method outperforms a range of
state-of-the-art approaches for video recognition, video retrieval, and
emotion classification.
Comment: Accepted by CVPR202
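As a concrete instance of a contrastive scheme for the mutual-information
term, here is a minimal InfoNCE-style sketch between teacher and
auxiliary-modality embeddings; minimizing this loss maximizes a lower
bound on their mutual information, which is the kind of term the abstract
adds so the teacher cannot collapse to a weak solution. Function names,
shapes, and the temperature are assumptions, not the paper's
implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z_teacher, z_aux, temperature=0.1):
    """InfoNCE-style contrastive loss between teacher embeddings and
    auxiliary-modality embeddings, both of shape (B, D). Matching rows
    are treated as positive pairs; all other rows in the batch serve as
    negatives.
    """
    zt = F.normalize(z_teacher, dim=1)
    za = F.normalize(z_aux, dim=1)
    logits = zt @ za.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(zt.size(0))          # positives on the diagonal
    # symmetric cross-entropy: teacher->aux and aux->teacher matching
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage: a batch of 16 paired embeddings
loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
```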