On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity
The recent advent of powerful generative models has triggered the renewed development of quantitative measures to assess the proximity of two probability distributions. While the scalar Fréchet inception distance remains popular, several methods have explored computing entire curves, which reveal the trade-off between the fidelity and variability of the first distribution with respect to the second one. Several such variants have been proposed independently, and while intuitively similar, their relationship has not yet been made explicit. In an effort to make the emerging picture of generative evaluation clearer, we propose a unification of four curves known respectively as: the precision-recall (PR) curve, the Lorenz curve, the receiver operating characteristic (ROC) curve, and a special case of Rényi divergence frontiers. In addition, we discuss possible links between PR/Lorenz curves and the derivation of domain adaptation bounds.
Comment: 10 pages, 3 figures
Ranking Neural Checkpoints
This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers best to our downstream task of interest? Striving to answer this question thoroughly, we establish a neural checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking measures. These measures are generic, applying to checkpoints with different output types and without requiring knowledge of how, or on which dataset, the checkpoints were pre-trained. They also incur low computational cost, making them practically meaningful. Our results suggest that the linear separability of the features extracted by the checkpoints is a strong indicator of transferability. We also arrive at a new ranking measure, NLEEP, which achieves the best
performance in our experiments.
Comment: Accepted to CVPR 2021
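To make the linear-separability finding concrete, a hedged sketch follows: each checkpoint is scored by the held-out accuracy of a linear probe fit on its frozen features, and checkpoints are ranked by that score. This is not the paper's code and does not implement NLEEP; extract_features is a hypothetical stand-in for running a checkpoint's backbone over the downstream dataset.

# Hedged sketch of one intuitive ranking measure: score a checkpoint by how
# linearly separable its frozen features make the downstream classes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_separability_score(features, labels, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.3, random_state=seed, stratify=labels)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)  # held-out accuracy of a linear probe

def rank_checkpoints(checkpoints, extract_features, labels):
    # `extract_features(ckpt)` is a hypothetical placeholder returning an
    # (num_examples, feature_dim) array of frozen features for the downstream data.
    scores = {name: linear_separability_score(extract_features(ckpt), labels)
              for name, ckpt in checkpoints.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)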
Lower Bounds for Rényi Differential Privacy in a Black-Box Setting
We present new methods for assessing the privacy guarantees of an algorithm with regard to Rényi Differential Privacy. To the best of our knowledge, this work is the first to address this problem in a black-box scenario, where only algorithmic outputs are available. To quantify privacy leakage, we devise a new estimator for the Rényi divergence of a pair of output distributions. This estimator is transformed into a statistical lower bound that is proven to hold for large samples with high probability. Our method is applicable to a broad class of algorithms, including many well-known examples from the privacy literature. We demonstrate the effectiveness of our approach with experiments encompassing algorithms and privacy-enhancing methods that have not been considered in related work.
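For intuition about the quantity being lower-bounded, the sketch below computes a naive plug-in estimate of the Rényi divergence D_alpha(P || Q) = (1/(alpha - 1)) log sum_x p(x)^alpha q(x)^(1 - alpha) from samples of two discrete mechanism outputs. The paper's estimator and its high-probability lower bound are more refined; the mechanism and sample sizes here are arbitrary toy choices.

# Naive plug-in estimate of the Renyi divergence between two discrete output
# distributions, purely for illustration of the target quantity.
import numpy as np
from collections import Counter

def plugin_renyi_divergence(samples_p, samples_q, alpha=2.0, smoothing=1e-6):
    support = sorted(set(samples_p) | set(samples_q))
    cp, cq = Counter(samples_p), Counter(samples_q)
    p = np.array([cp[x] for x in support], float) + smoothing
    q = np.array([cq[x] for x in support], float) + smoothing
    p, q = p / p.sum(), q / q.sum()
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

# Toy usage: outputs of a randomized-response-like mechanism on two inputs.
rng = np.random.default_rng(0)
out_a = rng.choice([0, 1], size=5000, p=[0.75, 0.25])
out_b = rng.choice([0, 1], size=5000, p=[0.25, 0.75])
print(plugin_renyi_divergence(list(out_a), list(out_b), alpha=2.0))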
Advances in Latent Variable and Causal Models
This thesis considers three different areas of machine learning concerned with the modelling of data, extending theoretical understanding in each of them. First, the estimation of f-divergences is considered in a setting that is naturally satisfied in the context of autoencoders. By exploiting structural assumptions on the distributions of concern, the proposed estimator is shown to exhibit fast rates of concentration and bias decay. In contrast, in much of the existing f-divergence estimation literature, fast rates are only obtainable under strong conditions that are difficult to verify in practice. Next, novel identifiability results are presented for nonlinear Independent Component Analysis (ICA) in a multi-view setting, extending the scarce literature of known identifiability results for nonlinear ICA. A result of particular note is that if one noiseless view of the sources is supplemented by a second view that is appropriately corrupted by source-level noise, the sources can be fully reconstructed from the observations up to tolerable ambiguities. This setting is applicable to areas such as neuroimaging, where multiple data modalities may be available. Finally, a framework is introduced to evaluate when two causal models are consistent with one another, meaning that a correspondence can be established between them such that reasoning about the effects of interventions in both models agrees. This can be used to understand when two models of the same system at different levels of detail are consistent, and has applications to the problem of causal variable definition. This work has broad implications for the causal modelling process in general, as there is often a mismatch between the level at which measurements are made and the level at which the underlying 'true' causal structure exists, yet causal inference algorithms generally seek to discover causal structure at the level of measurements.
Performative Prediction with Bandit Feedback: Learning through Reparameterization
Performative prediction, as introduced by Perdomo et al. (2020), is a
framework for studying social prediction in which the data distribution itself
changes in response to the deployment of a model. Existing work on optimizing
accuracy in this setting hinges on two assumptions that are easily violated in
practice: that the performative risk is convex over the deployed model, and
that the mapping from the model to the data distribution is known to the model
designer in advance. In this paper, we initiate the study of tractable
performative prediction problems that do not require these assumptions. To
tackle this more challenging setting, we develop a two-level zeroth-order
optimization algorithm, where one level aims to compute the distribution map,
and the other level reparameterizes the performative prediction objective as a
function of the induced data distribution. Under mild conditions, this
reparameterization allows us to transform the non-convex objective into a
convex one and achieve provable regret guarantees. In particular, we provide a
regret bound that is sublinear in the total number of performative samples
taken and only polynomial in the dimension of the model parameter.
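The abstract does not spell out the algorithm, but its zeroth-order ingredient can be sketched generically: estimate a descent direction for the performative risk using only evaluations on data sampled from the deployed model. In the sketch below, deploy_and_sample and empirical_risk are hypothetical placeholders, and the paper's two-level reparameterized scheme is considerably more involved than this single Gaussian-smoothing loop.

# Hedged sketch of a generic zeroth-order step on the performative risk,
# using only deployed-model samples and loss evaluations (no gradients).
import numpy as np

def zeroth_order_step(theta, deploy_and_sample, empirical_risk,
                      step=0.05, smoothing=0.1, num_probes=20, rng=None):
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(num_probes):
        u = rng.standard_normal(theta.shape)           # random probe direction
        data_plus = deploy_and_sample(theta + smoothing * u)
        data_base = deploy_and_sample(theta)
        # Finite-difference estimate of the directional derivative of the risk.
        delta = empirical_risk(theta + smoothing * u, data_plus) - \
                empirical_risk(theta, data_base)
        grad += (delta / smoothing) * u
    grad /= num_probes
    return theta - step * grad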
On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions
Kullback-Leibler (KL) divergence is one of the most important divergence measures between probability distributions. In this paper, we prove several properties of the KL divergence between multivariate Gaussian distributions. First, for any two n-dimensional Gaussian distributions N_1 and N_2, we give the supremum of KL(N_1 || N_2) when KL(N_2 || N_1) ≤ ε for ε > 0. For small ε, we give an explicit expression for this supremum, which quantifies the approximate symmetry of small KL divergence between Gaussians. We also find the infimum of KL(N_1 || N_2) when KL(N_2 || N_1) ≥ M for M > 0, and we give the conditions under which the supremum and infimum can be attained. Second, for any three n-dimensional Gaussians N_1, N_2, and N_3, we find an upper bound of KL(N_1 || N_3) if KL(N_1 || N_2) ≤ ε_1 and KL(N_2 || N_3) ≤ ε_2 for ε_1, ε_2 ≥ 0. For small ε_1 and ε_2, we show that this upper bound admits a simple explicit form. This reveals that the KL divergence between Gaussians follows a relaxed triangle inequality. Importantly, all the bounds in the theorems presented in this paper are independent of the dimension n. Finally, we discuss the applications of our theorems in explaining counterintuitive phenomena of flow-based models, deriving a deep anomaly detection algorithm, and extending a one-step robustness guarantee to multiple steps in safe reinforcement learning.
Comment: arXiv admin note: text overlap with arXiv:2002.0332
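For reference, the KL divergence between two multivariate Gaussians N(m1, S1) and N(m2, S2) has the standard closed form KL(N_1 || N_2) = 1/2 [tr(S2^{-1} S1) + (m2 - m1)^T S2^{-1} (m2 - m1) - n + ln(det S2 / det S1)]. The snippet below evaluates it and numerically illustrates the approximate symmetry for two nearby Gaussians; the example parameters are arbitrary.

# Closed-form KL divergence between two multivariate Gaussians, plus a
# numerical check that KL is approximately symmetric when the Gaussians are close.
import numpy as np

def kl_gaussians(m1, S1, m2, S2):
    n = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - n
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

m1, S1 = np.zeros(3), np.eye(3)
m2, S2 = 0.05 * np.ones(3), 1.02 * np.eye(3)
print(kl_gaussians(m1, S1, m2, S2), kl_gaussians(m2, S2, m1, S1))  # nearly equal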
MAUVE Scores for Generative Models: Theory and Practice
Generative artificial intelligence has made significant strides, producing
text indistinguishable from human prose and remarkably photorealistic images.
Automatically measuring how close the generated data distribution is to the
target distribution is central to diagnosing existing models and developing
better ones. We present MAUVE, a family of comparison measures between pairs of
distributions such as those encountered in the generative modeling of text or
images. These scores are statistical summaries of divergence frontiers
capturing two types of errors in generative modeling. We explore three
approaches to statistically estimate these scores: vector quantization,
non-parametric estimation, and classifier-based estimation. We provide
statistical bounds for the vector quantization approach.
Empirically, we find that the proposed scores, paired with a range of f-divergences and statistical estimation methods, can quantify the gaps
between the distributions of human-written text and those of modern neural
language models by correlating with human judgments and identifying known
properties of the generated texts. We demonstrate in the vision domain that
MAUVE can identify known properties of generated images on par with or better
than existing metrics. In conclusion, we present practical recommendations for
using MAUVE effectively with language and image modalities.
Comment: Published in the Journal of Machine Learning Research
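A hedged sketch of the vector-quantization route mentioned above: jointly quantize embeddings from both samples with k-means, form cluster histograms, trace a KL-based divergence frontier over their mixtures, and summarize it by an area under the curve. The cluster count, the scaling constant c, and other details here are arbitrary and differ from the official mauve package; emb_p and emb_q are assumed to be arrays of embedding vectors.

# Illustrative, MAUVE-like summary of a quantized divergence frontier.
import numpy as np
from sklearn.cluster import KMeans

def kl(a, b, eps=1e-12):
    a, b = a + eps, b + eps
    return float(np.sum(a * np.log(a / b)))

def mauve_like_score(emb_p, emb_q, num_clusters=50, c=1.0, num_lams=99, seed=0):
    # Jointly quantize both embedding sets so the histograms share a support.
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(np.vstack([emb_p, emb_q]))
    lp, lq = labels[:len(emb_p)], labels[len(emb_p):]
    p = np.bincount(lp, minlength=num_clusters) / len(lp)
    q = np.bincount(lq, minlength=num_clusters) / len(lq)
    # Divergence frontier over mixtures r = lam*p + (1-lam)*q, softened by exp(-c * KL).
    xs, ys = [], []
    for lam in np.linspace(0.01, 0.99, num_lams):
        r = lam * p + (1 - lam) * q
        xs.append(np.exp(-c * kl(q, r)))
        ys.append(np.exp(-c * kl(p, r)))
    order = np.argsort(xs)
    # Area under the softened frontier: a single score in [0, 1].
    return float(np.trapz(np.array(ys)[order], np.array(xs)[order]))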
Learning Identifiable Representations: Independent Influences and Multiple Views
Intelligent systems, whether biological or artificial, perceive unstructured information from the world around them: deep neural networks designed for object recognition receive collections of pixels as inputs; living beings capture visual stimuli through photoreceptors that convert incoming light into electrical signals. Sophisticated signal processing is required to extract meaningful features (e.g., the position, dimension, and colour of objects in an image) from these inputs: this motivates the field of representation learning. But what features should be deemed meaningful, and how can they be learned?
We will approach these questions based on two metaphors. The first one is the cocktail-party problem, where a number of conversations happen in parallel in a room, and the task is to recover (or separate) the voices of the individual speakers from recorded mixtures, a task also termed blind source separation. The second one is what we call the independent-listeners problem: given two listeners in front of some loudspeakers, the question is whether, when processing what they hear, they will make the same information explicit, identifying similar constitutive elements. The notion of identifiability is crucial when studying these problems, as it specifies suitable technical assumptions under which representations are uniquely determined, up to tolerable ambiguities like latent source reordering. A key result of this theory is that, when the mixing is nonlinear, the model is provably non-identifiable. A first question is, therefore, under what additional assumptions (ideally as mild as possible) the problem becomes identifiable; a second one is, what algorithms can be used to estimate the model.
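As a concrete, classical illustration of the cocktail-party metaphor (not part of the thesis itself), the linear, noise-free version of the problem can be addressed with ordinary ICA; the sketch below uses sklearn's FastICA on two toy sources mixed by an arbitrary matrix. The thesis concerns the far harder nonlinear and multi-view settings, where this simple recipe no longer identifies the sources.

# Linear cocktail-party toy example: mix two sources, then separate with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two toy "speakers"
sources += 0.02 * rng.standard_normal(sources.shape)     # small sensor noise
mixing = np.array([[1.0, 0.5], [0.4, 1.2]])              # unknown room mixing
recordings = sources @ mixing.T                          # what the microphones hear

recovered = FastICA(n_components=2, random_state=0).fit_transform(recordings)
# `recovered` matches the original sources up to reordering and rescaling,
# the "tolerable ambiguities" referred to above.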
The contributions presented in this thesis address these questions and revolve around two main principles. The first principle is to learn representations where the latent components influence the observations independently. Here the term 'independently' is used in a non-statistical sense, which can be loosely thought of as the absence of fine-tuning between distinct elements of a generative process. The second principle is that representations can be learned from paired observations or views, where mixtures of the same latent variables are observed, and they (or a subset thereof) are perturbed in one of the views, a setup also termed the multi-view setting. I will present work characterizing these two problem settings, studying their identifiability and proposing suitable estimation algorithms. Moreover, I will discuss how the success of popular representation learning methods may be explained in terms of the principles above and describe an application of the second principle to the statistical analysis of group studies in neuroimaging.