DiME: Maximizing Mutual Information by a Difference of Matrix-Based Entropies
We introduce an information-theoretic quantity with similar properties to
mutual information that can be estimated from data without making explicit
assumptions on the underlying distribution. This quantity is based on a
recently proposed matrix-based entropy that uses the eigenvalues of a
normalized Gram matrix to compute an estimate of the eigenvalues of an
uncentered covariance operator in a reproducing kernel Hilbert space. We show
that a difference of matrix-based entropies (DiME) is well suited for problems
involving the maximization of mutual information between random variables.
While many methods for such tasks can lead to trivial solutions, DiME naturally
penalizes such outcomes. We compare DiME to several baseline estimators of
mutual information on a toy Gaussian dataset. We provide examples of use cases
for DiME, such as latent factor disentanglement and a multiview representation
learning problem where DiME is used to learn a shared representation among
views with high mutual information.
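As a rough sketch of the machinery, the snippet below estimates the matrix-based entropy from the eigenvalues of a trace-normalized Gram matrix and combines entropies into a mutual-information-like score. The Gaussian kernel, the order alpha = 1.01, and all function names here are our assumptions for illustration; the paper's exact DiME objective may differ.

import numpy as np

def gram(X, sigma=1.0):
    # Gaussian Gram matrix for samples X of shape (n, d).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def matrix_entropy(K, alpha=1.01):
    # Matrix-based Renyi alpha-entropy of the trace-normalized Gram matrix;
    # its eigenvalues stand in for the spectrum of the covariance operator.
    A = K / np.trace(K)
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

def matrix_mutual_information(X, Y, alpha=1.01, sigma=1.0):
    # I(X;Y) ~ S(A) + S(B) - S(A o B): a difference of matrix-based entropies.
    A, B = gram(X, sigma), gram(Y, sigma)
    return (matrix_entropy(A, alpha) + matrix_entropy(B, alpha)
            - matrix_entropy(A * B, alpha))  # Hadamard product -> joint entropy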
Information Theoretic Representation Distillation
Despite the empirical success of knowledge distillation, current
state-of-the-art methods are computationally expensive to train, which makes
them difficult to adopt in practice. To address this problem, we introduce two
distinct complementary losses inspired by a cheap entropy-like estimator. These
losses aim to maximise the correlation and mutual information between the
student and teacher representations. Our method incurs significantly less
training overhead than other approaches and achieves performance competitive
with the state-of-the-art on the knowledge distillation and cross-model
transfer tasks. We further demonstrate the effectiveness of our method on a
binary distillation task, where it leads to a new state-of-the-art for binary
quantisation and approaches the performance of a full-precision model. Code:
www.github.com/roymiles/ITRD
Comment: BMVC 2022
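To make the correlation side of the objective concrete, here is a minimal sketch of a loss that rewards per-dimension correlation between (dimension-matched) student and teacher features. The ITRD losses themselves are derived from an entropy-like estimator, so this is an illustrative stand-in under our own assumptions, not the paper's formulation.

import torch

def correlation_loss(z_s, z_t, eps=1e-8):
    # Standardize each feature dimension across the batch, then reward
    # agreement; the loss is 0 when student and teacher features are
    # perfectly correlated in every dimension.
    z_s = (z_s - z_s.mean(0)) / (z_s.std(0) + eps)
    z_t = (z_t - z_t.mean(0)) / (z_t.std(0) + eps)
    corr = (z_s * z_t).mean(0)      # per-dimension correlation in [-1, 1]
    return (1.0 - corr).mean()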
Nonparametric estimation of the characteristic triplet of a discretely observed Lévy process
Given a discrete time sample from a Lévy process of finite jump activity, we
study the problem of nonparametric estimation of the characteristic triplet
$(\gamma, \sigma^2, \rho)$ corresponding to the process. Based on Fourier
inversion and kernel smoothing, we propose estimators of $\gamma$, $\sigma^2$
and $\rho$ and study their asymptotic behaviour. The obtained results include
derivation of upper bounds on the mean square error of the estimators of
$\gamma$ and $\sigma^2$, and an upper bound on the mean integrated square
error of an estimator of $\rho$.
Comment: 29 pages
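For context, in standard notation for a finite-activity process with triplet $(\gamma, \sigma^2, \rho)$, such estimators invert the Lévy-Khintchine representation: the characteristic function of an increment $X_\Delta$ over a time step $\Delta$ is

$$\varphi_\Delta(u) = \mathbb{E}\, e^{iuX_\Delta} = \exp\Big(\Delta\Big(i\gamma u - \frac{\sigma^2 u^2}{2} + \int_{\mathbb{R}} \big(e^{iux} - 1\big)\,\rho(x)\,dx\Big)\Big),$$

so $\varphi_\Delta$ can be estimated empirically from the observed increments and the triplet recovered by Fourier inversion, with kernel smoothing regularizing the estimate of $\rho$.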
Optimal Transport for Measures with Noisy Tree Metric
We study the optimal transport (OT) problem for probability measures supported
on a tree metric space. It is known that such an OT problem (i.e.,
tree-Wasserstein (TW)) admits a closed-form expression, but one that depends
fundamentally on the underlying tree structure over the supports of the input
measures. In practice, however, the given tree structure may be perturbed due
to noisy or adversarial measurements. To mitigate this issue, we follow the
max-min robust OT approach, which considers the maximal possible distance
between two input measures over an uncertainty set of tree metrics. In
general, this approach is hard to compute, even for measures supported in
one-dimensional space, due to its non-convexity and non-smoothness, which
hinder its practical application, especially in large-scale settings. In this
work, we propose novel uncertainty sets of tree metrics through the lens of
edge deletion/addition, which cover a diversity of tree structures in an
elegant framework. Consequently, by building upon the proposed uncertainty
sets and leveraging the tree structure over the supports, we show that the
robust OT also admits a closed-form expression for fast computation, like its
standard OT counterpart (i.e., TW). Furthermore, we demonstrate that the
robust OT satisfies the metric property and is negative definite. We then
exploit its negative definiteness to propose positive definite kernels and
test them in several simulations on various real-world
datasets for document classification and topological data analysis.
Comment: To appear in AISTATS 2024
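Since the closed form is central to the speed claim, here is a minimal sketch of standard tree-Wasserstein on a fixed (noise-free) tree: the distance is a weighted sum, over edges, of the absolute difference in probability mass contained in the subtree below each edge. The array-based tree encoding and the names are our assumptions; the paper's robust variant additionally optimizes this quantity over the proposed uncertainty sets.

import numpy as np

def tree_wasserstein(parent, weight, mu, nu):
    # Closed-form tree-Wasserstein (TW) distance between measures mu and nu
    # placed on the nodes of a rooted tree. Nodes are ordered so that
    # parent[v] < v, with parent[0] = -1 for the root; weight[v] is the
    # length of the edge (parent[v], v).
    d = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    for v in range(len(parent) - 1, 0, -1):  # visit children before parents
        total += weight[v] * abs(d[v])       # net mass crossing edge to parent
        d[parent[v]] += d[v]                 # push subtree imbalance upward
    return total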
Simple stopping criteria for information theoretic feature selection
Feature selection aims to select the smallest feature subset that yields the
minimum generalization error. In the rich literature on feature selection,
information theory-based approaches seek a subset of features such that the
mutual information between the selected features and the class labels is
maximized. Despite the simplicity of this objective, several open problems
remain in its optimization. These include, for example, the automatic
determination of the optimal subset size (i.e., the number of features) or a
stopping criterion when a greedy search strategy is adopted. In this paper,
we suggest two stopping criteria based simply on monitoring the conditional
mutual information (CMI) among groups of variables. Using the recently
developed multivariate matrix-based Rényi's $\alpha$-entropy functional, which
can be directly estimated from data samples, we show that the CMI among groups
of variables can be easily computed without any decomposition or
approximation, making our criteria easy to implement and to integrate
seamlessly into any existing information theoretic feature selection method
with a greedy search strategy.
Comment: Published in the journal Entropy
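For reference, one standard way to write the quantities such criteria monitor (the notation here is ours): with $A_X$, $A_Y$, $A_Z$ the trace-normalized Gram matrices of variable groups $X$, $Y$, $Z$, and $\circ$ the Hadamard product (renormalized to unit trace after each product), the matrix-based Rényi entropy and the resulting CMI are

$$S_\alpha(A) = \frac{1}{1-\alpha} \log_2\Big(\sum_i \lambda_i(A)^\alpha\Big),$$

$$I_\alpha(X; Y \mid Z) = S_\alpha(A_X \circ A_Z) + S_\alpha(A_Y \circ A_Z) - S_\alpha(A_X \circ A_Y \circ A_Z) - S_\alpha(A_Z),$$

so a natural stopping rule is to halt the greedy search once $I_\alpha(\text{candidate}; \text{labels} \mid \text{selected})$ falls below a small tolerance.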