Max-Sliced Mutual Information
Quantifying the dependence between high-dimensional random variables is
central to statistical learning and inference. Two classical methods are
canonical correlation analysis (CCA), which identifies maximally correlated
projected versions of the original variables, and Shannon's mutual information,
which is a universal dependence measure that also captures high-order
dependencies. However, CCA only accounts for linear dependence, which may be
insufficient for certain applications, while mutual information is often
infeasible to compute/estimate in high dimensions. This work proposes a middle
ground in the form of a scalable information-theoretic generalization of CCA,
termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual
information between low-dimensional projections of the high-dimensional
variables, which reduces back to CCA in the Gaussian case. It enjoys the best
of both worlds: capturing intricate dependencies in the data while being
amenable to fast computation and scalable estimation from samples. We show that
mSMI retains favorable structural properties of Shannon's mutual information,
like variational forms and identification of independence. We then study
statistical estimation of mSMI, propose an efficiently computable neural
estimator, and couple it with formal non-asymptotic error bounds. We present
experiments that demonstrate the utility of mSMI for several tasks,
encompassing independence testing, multi-view representation learning,
algorithmic fairness, and generative modeling. We observe that mSMI
consistently outperforms competing methods with little-to-no computational
overhead.
Comment: Accepted at NeurIPS 202
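The abstract's statement that mSMI reduces to CCA in the Gaussian case can be sketched as follows for one-dimensional projections. The notation (unit vectors a and b, covariance blocks Σ_X, Σ_Y, Σ_XY, and top canonical correlation ρ_1 < 1) is assumed here for illustration and is not taken from the abstract; the sketch relies only on the standard formula for the mutual information of two jointly Gaussian scalars.

```latex
% Assumed notation: X in R^{d_x}, Y in R^{d_y}, unit vectors a and b.
% Max-sliced mutual information over one-dimensional projections:
\[
  \mathrm{mSMI}(X;Y) \;=\; \sup_{\|a\| = \|b\| = 1} I\bigl(a^{\top}X;\; b^{\top}Y\bigr)
\]
% If (X, Y) is jointly Gaussian, each projected pair is a bivariate Gaussian
% with correlation \rho(a,b), so
\[
  I\bigl(a^{\top}X;\; b^{\top}Y\bigr) \;=\; -\tfrac{1}{2}\log\bigl(1 - \rho(a,b)^{2}\bigr),
  \qquad
  \rho(a,b) \;=\; \frac{a^{\top}\Sigma_{XY}\,b}{\sqrt{a^{\top}\Sigma_{X}\,a}\,\sqrt{b^{\top}\Sigma_{Y}\,b}}
\]
% The right-hand side is increasing in |\rho(a,b)|, and CCA maximizes
% |\rho(a,b)| at the top canonical correlation \rho_{1}, hence
\[
  \mathrm{mSMI}(X;Y) \;=\; -\tfrac{1}{2}\log\bigl(1 - \rho_{1}^{2}\bigr)
\]
```

For non-Gaussian data the projected mutual information is no longer a function of correlation alone, which is where the sliced measure and CCA can differ.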
New computational and statistical characterizations of neural network learning
A foundational goal of machine learning theory is to characterize the inherent computational and statistical complexity of some of the most basic tasks in machine learning. In this thesis, we present new results concerning two such tasks in neural network learning and beyond. First, we study the question of when efficient algorithms can achieve high test accuracy on labeled data known to be consistent with a simple neural network. We present a set of results establishing the surprising computational intractability of this problem even in the benign setting where the inputs are drawn from a Gaussian, and the labels are perfectly consistent with a simple two-hidden-layer or even one-hidden-layer neural network. These hardness results illuminate what types of problem assumptions are necessary for efficient algorithms for this problem to be possible at all. Next, we investigate the problem of testing whether a learning algorithm has fit the data as well as its guarantee claims. This is a serious issue for agnostic supervised learning (i.e., supervised learning with no assumptions on the labels), where most efficient algorithms make simplifying distributional assumptions such as Gaussianity. But such assumptions can be hard to verify, meaning it can be hard to check whether the learner has actually succeeded. The recent elegant model of testable learning addresses this issue by replacing such hard-to-verify distributional assumptions with efficiently testable ones. We present both a broad algorithmic framework and a full statistical characterization of this model.
Computer Science
Privacy Preserving Domain Adaptation for Semantic Segmentation of Medical Images
Convolutional neural networks (CNNs) have led to significant improvements in
tasks involving semantic segmentation of images. However, CNNs are vulnerable in
biomedical image segmentation because the distributional gap between source and
target domains with different data modalities leads to domain shift. Domain
shift makes data annotations in new modalities necessary because models must be
retrained from scratch. Unsupervised domain adaptation (UDA) is
proposed to adapt a model to new modalities using solely unlabeled target
domain data. Common UDA algorithms require access to data points in the source
domain which may not be feasible in medical imaging due to privacy concerns. In
this work, we develop an algorithm for UDA in a privacy-constrained setting,
where the source domain data is inaccessible. Our idea is based on encoding the
information from the source samples into a prototypical distribution that is
used as an intermediate distribution for aligning the target domain
distribution with the source domain distribution. We demonstrate the
effectiveness of our algorithm by comparing it to state-of-the-art medical
image semantic segmentation approaches on two medical image semantic
segmentation datasets.