On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity
The recent advent of powerful generative models has triggered the renewed development of quantitative measures to assess the proximity of two probability distributions. While the scalar Fréchet inception distance remains popular, several methods have explored computing entire curves, which reveal the trade-off between the fidelity and variability of the first distribution with respect to the second one. Several such variants have been proposed independently, and while intuitively similar, their relationship has not yet been made explicit. In an effort to make the emerging picture of generative evaluation clearer, we propose a unification of four curves known respectively as: the precision-recall (PR) curve, the Lorenz curve, the receiver operating characteristic (ROC) curve, and a special case of Rényi divergence frontiers. In addition, we discuss possible links between PR/Lorenz curves and the derivation of domain adaptation bounds.
Comment: 10 pages, 3 figures
Ranking Neural Checkpoints
This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers best to our downstream task of interest? Striving to answer this question thoroughly, we establish a neural checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking measures. These measures are generic, applying to checkpoints with different output types and without requiring knowledge of how, or on which dataset, the checkpoints were pre-trained. They also incur low computational cost, making them practically meaningful. Our results suggest that the linear separability of the features extracted by the checkpoints is a strong indicator of transferability. We also arrive at a new ranking measure, NLEEP, which achieves the best
performance in our experiments.
Comment: Accepted to CVPR 2021
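To make the linear-separability finding concrete, a hedged sketch follows: each checkpoint is scored by the held-out accuracy of a linear probe fit on its frozen features, and checkpoints are ranked by that score. This is not the paper's code and does not implement NLEEP; extract_features is a hypothetical stand-in for running a checkpoint's backbone over the downstream dataset.

# Hedged sketch of one intuitive ranking measure: score a checkpoint by how
# linearly separable its frozen features make the downstream classes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_separability_score(features, labels, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.3, random_state=seed, stratify=labels)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)  # held-out accuracy of a linear probe

def rank_checkpoints(checkpoints, extract_features, labels):
    # `extract_features(ckpt)` is a hypothetical placeholder returning an
    # (num_examples, feature_dim) array of frozen features for the downstream data.
    scores = {name: linear_separability_score(extract_features(ckpt), labels)
              for name, ckpt in checkpoints.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)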
Lower Bounds for Rényi Differential Privacy in a Black-Box Setting
We present new methods for assessing the privacy guarantees of an algorithm with regard to Rényi Differential Privacy. To the best of our knowledge, this work is the first to address this problem in a black-box scenario, where only algorithmic outputs are available. To quantify privacy leakage, we devise a new estimator for the Rényi divergence of a pair of output distributions. This estimator is transformed into a statistical lower bound that is proven to hold for large samples with high probability. Our method is applicable to a broad class of algorithms, including many well-known examples from the privacy literature. We demonstrate the effectiveness of our approach with experiments encompassing algorithms and privacy-enhancing methods that have not been considered in related work.
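For intuition about the quantity being lower-bounded, the sketch below computes a naive plug-in estimate of the Rényi divergence D_alpha(P || Q) = (1/(alpha - 1)) log sum_x p(x)^alpha q(x)^(1 - alpha) from samples of two discrete mechanism outputs. The paper's estimator and its high-probability lower bound are more refined; the mechanism and sample sizes here are arbitrary toy choices.

# Naive plug-in estimate of the Renyi divergence between two discrete output
# distributions, purely for illustration of the target quantity.
import numpy as np
from collections import Counter

def plugin_renyi_divergence(samples_p, samples_q, alpha=2.0, smoothing=1e-6):
    support = sorted(set(samples_p) | set(samples_q))
    cp, cq = Counter(samples_p), Counter(samples_q)
    p = np.array([cp[x] for x in support], float) + smoothing
    q = np.array([cq[x] for x in support], float) + smoothing
    p, q = p / p.sum(), q / q.sum()
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

# Toy usage: outputs of a randomized-response-like mechanism on two inputs.
rng = np.random.default_rng(0)
out_a = rng.choice([0, 1], size=5000, p=[0.75, 0.25])
out_b = rng.choice([0, 1], size=5000, p=[0.25, 0.75])
print(plugin_renyi_divergence(list(out_a), list(out_b), alpha=2.0))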
Advances in Latent Variable and Causal Models
This thesis considers three different areas of machine learning concerned with the modelling of data, extending theoretical understanding in each of them. First, the estimation of f-divergences is considered in a setting that is naturally satisfied in the context of autoencoders. By exploiting structural assumptions on the distributions of concern, the proposed estimator is shown to exhibit fast rates of concentration and bias decay. In contrast, in much of the existing f-divergence estimation literature, fast rates are only obtainable under strong conditions that are difficult to verify in practice. Next, novel identifiability results are presented for nonlinear Independent Component Analysis (ICA) in a multi-view setting, extending the scarce literature of known identifiability results for nonlinear ICA. A result of particular note is that if one noiseless view of the sources is supplemented by a second view that is appropriately corrupted by source-level noise, the sources can be fully reconstructed from the observations up to tolerable ambiguities. This setting is applicable to areas such as neuroimaging, where multiple data modalities may be available. Finally, a framework is introduced to evaluate when two causal models are consistent with one another, meaning that a correspondence can be established between them such that reasoning about the effects of interventions in both models agrees. This can be used to understand when two models of the same system at different levels of detail are consistent, and has applications to the problem of causal variable definition. This work has broad implications for the causal modelling process in general, as there is often a mismatch between the level at which measurements are made and the level at which the underlying 'true' causal structure exists, yet causal inference algorithms generally seek to discover causal structure at the level of measurements.
Performative Prediction with Bandit Feedback: Learning through Reparameterization
Performative prediction, as introduced by Perdomo et al. (2020), is a
framework for studying social prediction in which the data distribution itself
changes in response to the deployment of a model. Existing work on optimizing
accuracy in this setting hinges on two assumptions that are easily violated in
practice: that the performative risk is convex over the deployed model, and
that the mapping from the model to the data distribution is known to the model
designer in advance. In this paper, we initiate the study of tractable
performative prediction problems that do not require these assumptions. To
tackle this more challenging setting, we develop a two-level zeroth-order
optimization algorithm, where one level aims to compute the distribution map,
and the other level reparameterizes the performative prediction objective as a
function of the induced data distribution. Under mild conditions, this
reparameterization allows us to transform the non-convex objective into a
convex one and achieve provable regret guarantees. In particular, we provide a
regret bound that is sublinear in the total number of performative samples
taken and only polynomial in the dimension of the model parameter.
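The abstract does not spell out the algorithm, but its zeroth-order ingredient can be sketched generically: estimate a descent direction for the performative risk using only evaluations on data sampled from the deployed model. In the sketch below, deploy_and_sample and empirical_risk are hypothetical placeholders, and the paper's two-level reparameterized scheme is considerably more involved than this single Gaussian-smoothing loop.

# Hedged sketch of a generic zeroth-order step on the performative risk,
# using only deployed-model samples and loss evaluations (no gradients).
import numpy as np

def zeroth_order_step(theta, deploy_and_sample, empirical_risk,
                      step=0.05, smoothing=0.1, num_probes=20, rng=None):
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(num_probes):
        u = rng.standard_normal(theta.shape)           # random probe direction
        data_plus = deploy_and_sample(theta + smoothing * u)
        data_base = deploy_and_sample(theta)
        # Finite-difference estimate of the directional derivative of the risk.
        delta = empirical_risk(theta + smoothing * u, data_plus) - \
                empirical_risk(theta, data_base)
        grad += (delta / smoothing) * u
    grad /= num_probes
    return theta - step * grad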
On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions
Kullback-Leibler (KL) divergence is one of the most important divergence measures between probability distributions. In this paper, we prove several properties of the KL divergence between multivariate Gaussian distributions. First, for any two n-dimensional Gaussian distributions N_1 and N_2, we give the supremum of KL(N_1 || N_2) when KL(N_2 || N_1) ≤ ε for ε > 0. For small ε, we give an explicit expression for this supremum, which quantifies the approximate symmetry of small KL divergence between Gaussians. We also find the infimum of KL(N_1 || N_2) when KL(N_2 || N_1) ≥ M for M > 0, and we give the conditions under which the supremum and infimum can be attained. Second, for any three n-dimensional Gaussians N_1, N_2, and N_3, we find an upper bound of KL(N_1 || N_3) if KL(N_1 || N_2) ≤ ε_1 and KL(N_2 || N_3) ≤ ε_2 for ε_1, ε_2 ≥ 0. For small ε_1 and ε_2, we show that this upper bound admits a simple explicit form. This reveals that the KL divergence between Gaussians follows a relaxed triangle inequality. Importantly, all the bounds in the theorems presented in this paper are independent of the dimension n. Finally, we discuss the applications of our theorems in explaining counterintuitive phenomena of flow-based models, deriving a deep anomaly detection algorithm, and extending a one-step robustness guarantee to multiple steps in safe reinforcement learning.
Comment: arXiv admin note: text overlap with arXiv:2002.0332
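For reference, the KL divergence between two multivariate Gaussians N(m1, S1) and N(m2, S2) has the standard closed form KL(N_1 || N_2) = 1/2 [tr(S2^{-1} S1) + (m2 - m1)^T S2^{-1} (m2 - m1) - n + ln(det S2 / det S1)]. The snippet below evaluates it and numerically illustrates the approximate symmetry for two nearby Gaussians; the example parameters are arbitrary.

# Closed-form KL divergence between two multivariate Gaussians, plus a
# numerical check that KL is approximately symmetric when the Gaussians are close.
import numpy as np

def kl_gaussians(m1, S1, m2, S2):
    n = m1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = m2 - m1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - n
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

m1, S1 = np.zeros(3), np.eye(3)
m2, S2 = 0.05 * np.ones(3), 1.02 * np.eye(3)
print(kl_gaussians(m1, S1, m2, S2), kl_gaussians(m2, S2, m1, S1))  # nearly equal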
MAUVE Scores for Generative Models: Theory and Practice
Generative artificial intelligence has made significant strides, producing
text indistinguishable from human prose and remarkably photorealistic images.
Automatically measuring how close the generated data distribution is to the
target distribution is central to diagnosing existing models and developing
better ones. We present MAUVE, a family of comparison measures between pairs of
distributions such as those encountered in the generative modeling of text or
images. These scores are statistical summaries of divergence frontiers
capturing two types of errors in generative modeling. We explore three
approaches to statistically estimate these scores: vector quantization,
non-parametric estimation, and classifier-based estimation. We provide
statistical bounds for the vector quantization approach.
Empirically, we find that the proposed scores, paired with a range of f-divergences and statistical estimation methods, can quantify the gaps
between the distributions of human-written text and those of modern neural
language models by correlating with human judgments and identifying known
properties of the generated texts. We demonstrate in the vision domain that
MAUVE can identify known properties of generated images on par with or better
than existing metrics. In conclusion, we present practical recommendations for
using MAUVE effectively with language and image modalities.
Comment: Published in the Journal of Machine Learning Research
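A hedged sketch of the vector-quantization route mentioned above: jointly quantize embeddings from both samples with k-means, form cluster histograms, trace a KL-based divergence frontier over their mixtures, and summarize it by an area under the curve. The cluster count, the scaling constant c, and other details here are arbitrary and differ from the official mauve package; emb_p and emb_q are assumed to be arrays of embedding vectors.

# Illustrative, MAUVE-like summary of a quantized divergence frontier.
import numpy as np
from sklearn.cluster import KMeans

def kl(a, b, eps=1e-12):
    a, b = a + eps, b + eps
    return float(np.sum(a * np.log(a / b)))

def mauve_like_score(emb_p, emb_q, num_clusters=50, c=1.0, num_lams=99, seed=0):
    # Jointly quantize both embedding sets so the histograms share a support.
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(np.vstack([emb_p, emb_q]))
    lp, lq = labels[:len(emb_p)], labels[len(emb_p):]
    p = np.bincount(lp, minlength=num_clusters) / len(lp)
    q = np.bincount(lq, minlength=num_clusters) / len(lq)
    # Divergence frontier over mixtures r = lam*p + (1-lam)*q, softened by exp(-c * KL).
    xs, ys = [], []
    for lam in np.linspace(0.01, 0.99, num_lams):
        r = lam * p + (1 - lam) * q
        xs.append(np.exp(-c * kl(q, r)))
        ys.append(np.exp(-c * kl(p, r)))
    order = np.argsort(xs)
    # Area under the softened frontier: a single score in [0, 1].
    return float(np.trapz(np.array(ys)[order], np.array(xs)[order]))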
Learning Identifiable Representations: Independent Influences and Multiple Views
Intelligent systems, whether biological or artificial, perceive unstructured information from the world around them: deep neural networks designed for object recognition receive collections of pixels as inputs; living beings capture visual stimuli through photoreceptors that convert incoming light into electrical signals. Sophisticated signal processing is required to extract meaningful features (e.g., the position, dimension, and colour of objects in an image) from these inputs: this motivates the field of representation learning. But what features should be deemed meaningful, and how can they be learned?
We will approach these questions based on two metaphors. The first one is the cocktail-party problem, where a number of conversations happen in parallel in a room, and the task is to recover (or separate) the voices of the individual speakers from recorded mixtures, a task also termed blind source separation. The second one is what we call the independent-listeners problem: given two listeners in front of some loudspeakers, the question is whether, when processing what they hear, they will make the same information explicit, identifying similar constitutive elements. The notion of identifiability is crucial when studying these problems, as it specifies suitable technical assumptions under which representations are uniquely determined, up to tolerable ambiguities like latent source reordering. A key result of this theory is that, when the mixing is nonlinear, the model is provably non-identifiable. A first question is, therefore, under what additional assumptions (ideally as mild as possible) the problem becomes identifiable; a second one is, what algorithms can be used to estimate the model.
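As a concrete, classical illustration of the cocktail-party metaphor (not part of the thesis itself), the linear, noise-free version of the problem can be addressed with ordinary ICA; the sketch below uses sklearn's FastICA on two toy sources mixed by an arbitrary matrix. The thesis concerns the far harder nonlinear and multi-view settings, where this simple recipe no longer identifies the sources.

# Linear cocktail-party toy example: mix two sources, then separate with FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two toy "speakers"
sources += 0.02 * rng.standard_normal(sources.shape)     # small sensor noise
mixing = np.array([[1.0, 0.5], [0.4, 1.2]])              # unknown room mixing
recordings = sources @ mixing.T                          # what the microphones hear

recovered = FastICA(n_components=2, random_state=0).fit_transform(recordings)
# `recovered` matches the original sources up to reordering and rescaling,
# the "tolerable ambiguities" referred to above.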
The contributions presented in this thesis address these questions and revolve around two main principles. The first principle is to learn representations where the latent components influence the observations independently. Here the term 'independently' is used in a non-statistical sense, which can be loosely thought of as the absence of fine-tuning between distinct elements of a generative process. The second principle is that representations can be learned from paired observations or views, where mixtures of the same latent variables are observed, and they (or a subset thereof) are perturbed in one of the views, a setup also termed the multi-view setting. I will present work characterizing these two problem settings, studying their identifiability and proposing suitable estimation algorithms. Moreover, I will discuss how the success of popular representation learning methods may be explained in terms of the principles above and describe an application of the second principle to the statistical analysis of group studies in neuroimaging.