
    On the Theoretical Equivalence of Several Trade-Off Curves Assessing Statistical Proximity

    The recent advent of powerful generative models has triggered the renewed development of quantitative measures to assess the proximity of two probability distributions. While the scalar Fréchet inception distance remains popular, several methods have explored computing entire curves, which reveal the trade-off between the fidelity and the variability of the first distribution with respect to the second. Several such variants have been proposed independently and, while intuitively similar, their relationship has not yet been made explicit. In an effort to clarify the emerging picture of generative evaluation, we propose a unification of four curves known respectively as the precision-recall (PR) curve, the Lorenz curve, the receiver operating characteristic (ROC) curve, and a special case of Rényi divergence frontiers. In addition, we discuss possible links between PR/Lorenz curves and the derivation of domain adaptation bounds. Comment: 10 pages, 3 figures.
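    For concreteness, one common formulation of the PR curve between two discrete distributions is the min-based one of Sajjadi et al. (2018), traced out by a trade-off parameter. The sketch below is a minimal illustration on histograms; the function name and the angular parameterization of the trade-off are illustrative choices, not the paper's notation.

```python
# Minimal sketch of a precision-recall (PRD) curve between two discrete
# distributions p and q, following the min-based formulation of
# Sajjadi et al. (2018): alpha(l) = sum_i min(l * p_i, q_i), beta(l) = alpha(l) / l.
import numpy as np

def prd_curve(p, q, num_angles=201):
    """Return arrays of (precision alpha, recall beta) pairs."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    # Sweep the trade-off parameter l = tan(theta) over (0, inf).
    angles = np.linspace(1e-6, np.pi / 2 - 1e-6, num_angles)
    lam = np.tan(angles)
    alpha = np.minimum(lam[:, None] * p[None, :], q[None, :]).sum(axis=1)
    beta = alpha / lam
    return alpha, beta

# Example: two histograms over four bins.
alpha, beta = prd_curve([0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4])
```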

    Ranking Neural Checkpoints

    This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task. Thanks to the broad use of DNNs, we may easily collect hundreds of checkpoints from various sources. Which of them transfers best to our downstream task of interest? Striving to answer this question thoroughly, we establish a neural checkpoint ranking benchmark (NeuCRaB) and study some intuitive ranking measures. These measures are generic, applying to checkpoints of different output types without knowing how, or on which dataset, the checkpoints were pre-trained. They also incur low computation cost, making them practically meaningful. Our results suggest that the linear separability of the features extracted by the checkpoints is a strong indicator of transferability. We also arrive at a new ranking measure, NLEEP, which gives rise to the best performance in the experiments. Comment: Accepted to CVPR 2021.
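    As a rough illustration of the linear-separability signal described above, one can extract features for the downstream data with each checkpoint, fit a linear classifier, and rank checkpoints by its held-out accuracy. This is a hedged sketch of that proxy only, not NeuCRaB or the NLEEP measure itself; all names below are illustrative.

```python
# Sketch: rank checkpoints by how linearly separable their features are
# on the downstream task, using held-out accuracy of a linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_separability_score(features, labels, seed=0):
    """Higher score = downstream features are more linearly separable."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_va, y_va)

def rank_checkpoints(checkpoint_features, labels):
    """checkpoint_features: dict mapping checkpoint name -> (N, d) feature array."""
    scores = {name: linear_separability_score(F, labels)
              for name, F in checkpoint_features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```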

    Lower Bounds for Rényi Differential Privacy in a Black-Box Setting

    We present new methods for assessing the privacy guarantees of an algorithm with regard to Rényi Differential Privacy. To the best of our knowledge, this work is the first to address this problem in a black-box scenario, where only algorithmic outputs are available. To quantify privacy leakage, we devise a new estimator for the Rényi divergence of a pair of output distributions. This estimator is transformed into a statistical lower bound that is proven to hold for large samples with high probability. Our method is applicable to a broad class of algorithms, including many well-known examples from the privacy literature. We demonstrate the effectiveness of our approach through experiments encompassing algorithms and privacy-enhancing methods that have not been considered in related works.
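    As a point of reference for the quantity being estimated: the Rényi divergence of order alpha between discrete distributions p and q is D_alpha(p||q) = (1/(alpha-1)) log sum_i p_i^alpha q_i^(1-alpha). The sketch below is a naive plug-in histogram estimate from samples, assuming one-dimensional outputs; it is not the paper's estimator and carries no high-probability guarantee.

```python
# Naive plug-in estimate of the Renyi divergence between two output
# distributions observed only through samples (the black-box setting),
# using shared histogram bins. Illustrative only; the paper's estimator
# and its statistical lower bound are more sophisticated.
import numpy as np

def renyi_divergence_plugin(samples_p, samples_q, alpha=2.0, bins=30):
    """Estimate D_alpha(P || Q) = log(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1)."""
    edges = np.histogram_bin_edges(np.concatenate([samples_p, samples_q]), bins=bins)
    p = np.histogram(samples_p, bins=edges)[0] + 1e-12  # smooth empty bins
    q = np.histogram(samples_q, bins=edges)[0] + 1e-12
    p, q = p / p.sum(), q / q.sum()
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

# Example: outputs of a Gaussian mechanism on two adjacent inputs;
# the true order-2 divergence here is 1, the estimate should be close.
rng = np.random.default_rng(0)
d = renyi_divergence_plugin(rng.normal(0, 1, 10000), rng.normal(1, 1, 10000))
```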

    Performative Prediction with Bandit Feedback: Learning through Reparameterization

    Performative prediction, as introduced by Perdomo et al. (2020), is a framework for studying social prediction in which the data distribution itself changes in response to the deployment of a model. Existing work on optimizing accuracy in this setting hinges on two assumptions that are easily violated in practice: that the performative risk is convex over the deployed model, and that the mapping from the model to the data distribution is known to the model designer in advance. In this paper, we initiate the study of tractable performative prediction problems that do not require these assumptions. To tackle this more challenging setting, we develop a two-level zeroth-order optimization algorithm, where one level aims to compute the distribution map, and the other level reparameterizes the performative prediction objective as a function of the induced data distribution. Under mild conditions, this reparameterization allows us to transform the non-convex objective into a convex one and achieve provable regret guarantees. In particular, we provide a regret bound that is sublinear in the total number of performative samples taken and only polynomial in the dimension of the model parameter.
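    To make the bandit-feedback setting concrete, here is a generic two-point zeroth-order step on a toy performative risk, where the designer can only deploy a parameter and observe samples from the induced distribution. This is a plain single-level baseline for intuition, not the paper's two-level reparameterization algorithm; the toy distribution map and all names are illustrative.

```python
# Generic two-point zeroth-order optimization of a performative risk that
# can only be queried by deploying a model and sampling the induced data.
import numpy as np

def performative_risk(theta, rng, n=1000):
    """Toy risk: deploying theta shifts the data mean to 0.5*theta (map unknown)."""
    data = rng.normal(0.5 * theta, 1.0, size=n)   # induced distribution D(theta)
    return np.mean((data - theta) ** 2)           # loss of theta on D(theta)

def zeroth_order_step(theta, rng, delta=0.1, lr=0.05):
    u = rng.standard_normal(theta.shape)
    u /= np.linalg.norm(u)
    # Finite-difference gradient estimate from two deployments (bandit feedback).
    g = (performative_risk(theta + delta * u, rng)
         - performative_risk(theta - delta * u, rng)) / (2 * delta) * u
    return theta - lr * g

rng = np.random.default_rng(0)
theta = np.array([2.0])
for _ in range(200):
    theta = zeroth_order_step(theta, rng)
# theta drifts toward the performative optimum of this toy problem (here, 0).
```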

    On the Properties of Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

    Kullback-Leibler (KL) divergence is one of the most important divergence measures between probability distributions. In this paper, we prove several properties of the KL divergence between multivariate Gaussian distributions. First, for any two $n$-dimensional Gaussian distributions $\mathcal{N}_1$ and $\mathcal{N}_2$, we give the supremum of $KL(\mathcal{N}_1\|\mathcal{N}_2)$ when $KL(\mathcal{N}_2\|\mathcal{N}_1)\leq \varepsilon$ ($\varepsilon>0$). For small $\varepsilon$, we show that the supremum is $\varepsilon + 2\varepsilon^{1.5} + O(\varepsilon^2)$. This quantifies the approximate symmetry of small KL divergence between Gaussians. We also find the infimum of $KL(\mathcal{N}_1\|\mathcal{N}_2)$ when $KL(\mathcal{N}_2\|\mathcal{N}_1)\geq M$ ($M>0$). We give the conditions under which the supremum and infimum can be attained. Second, for any three $n$-dimensional Gaussians $\mathcal{N}_1$, $\mathcal{N}_2$, and $\mathcal{N}_3$, we find an upper bound of $KL(\mathcal{N}_1\|\mathcal{N}_3)$ if $KL(\mathcal{N}_1\|\mathcal{N}_2)\leq \varepsilon_1$ and $KL(\mathcal{N}_2\|\mathcal{N}_3)\leq \varepsilon_2$ for $\varepsilon_1,\varepsilon_2\geq 0$. For small $\varepsilon_1$ and $\varepsilon_2$, we show the upper bound is $3\varepsilon_1+3\varepsilon_2+2\sqrt{\varepsilon_1\varepsilon_2}+o(\varepsilon_1)+o(\varepsilon_2)$. This reveals that the KL divergence between Gaussians satisfies a relaxed triangle inequality. Importantly, all the bounds in the theorems presented in this paper are independent of the dimension $n$. Finally, we discuss applications of our theorems to explaining the counterintuitive phenomenon of flow-based models, deriving a deep anomaly detection algorithm, and extending the one-step robustness guarantee to multiple steps in safe reinforcement learning. Comment: arXiv admin note: text overlap with arXiv:2002.0332
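    The closed form of the KL divergence between Gaussians makes these bounds easy to probe numerically. Below is a small sketch that computes $KL(\mathcal{N}(\mu_1,\Sigma_1)\|\mathcal{N}(\mu_2,\Sigma_2))$ and checks the leading terms $\varepsilon + 2\varepsilon^{1.5}$ of the approximate-symmetry supremum on one example; the $O(\varepsilon^2)$ remainder is ignored, so the check is only indicative.

```python
# Closed-form KL divergence between multivariate Gaussians, plus a numeric
# spot-check of the approximate-symmetry bound: if KL(N2||N1) <= eps is small,
# then KL(N1||N2) <= eps + 2*eps**1.5 + O(eps**2).
import numpy as np

def kl_gaussian(mu1, S1, mu2, S2):
    """KL(N(mu1, S1) || N(mu2, S2)) for n-dimensional Gaussians."""
    n = mu1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - n
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

mu1, S1 = np.zeros(3), np.eye(3)
mu2, S2 = 0.05 * np.ones(3), 1.02 * np.eye(3)
eps = kl_gaussian(mu2, S2, mu1, S1)     # KL(N2 || N1), small by construction
bound = eps + 2 * eps**1.5              # leading terms of the supremum
assert kl_gaussian(mu1, S1, mu2, S2) <= bound + 1e-6
```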

    MAUVE Scores for Generative Models: Theory and Practice

    Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore three approaches to statistically estimate these scores: vector quantization, non-parametric estimation, and classifier-based estimation. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores, paired with a range of $f$-divergences and statistical estimation methods, can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics. In conclusion, we present practical recommendations for using MAUVE effectively with language and image modalities. Comment: Published in the Journal of Machine Learning Research.
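    As a rough illustration of the vector-quantization approach, one can jointly quantize the two embedding sets with k-means, form cluster histograms, trace the divergence frontier against mixture distributions, and summarize it by an area under the curve. The sketch below follows that recipe with illustrative defaults (k, the scaling constant c, the smoothing); the official mauve-text package differs in details such as PCA preprocessing.

```python
# MAUVE-style score via vector quantization: k-means bins, then the area
# under the divergence frontier traced by mixtures r = l*p + (1-l)*q.
import numpy as np
from sklearn.cluster import KMeans

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def mauve_sketch(emb_p, emb_q, k=50, c=5.0, grid=99, seed=0):
    # Jointly quantize both embedding sets into k clusters.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(
        np.vstack([emb_p, emb_q]))
    p = np.bincount(km.predict(emb_p), minlength=k) + 1e-6  # smoothed histogram
    q = np.bincount(km.predict(emb_q), minlength=k) + 1e-6
    p, q = p / p.sum(), q / q.sum()
    # Frontier points: both KLs measured against the mixture, exponentiated.
    ls = np.linspace(0.01, 0.99, grid)
    xs = np.array([np.exp(-c * kl(q, l * p + (1 - l) * q)) for l in ls])
    ys = np.array([np.exp(-c * kl(p, l * p + (1 - l) * q)) for l in ls])
    # Area under the frontier curve (trapezoid rule; xs is monotone in l).
    return float(abs(np.sum(np.diff(xs) * (ys[:-1] + ys[1:]) / 2.0)))

# Usage: mauve_sketch(human_embeddings, model_embeddings) with (N, d) arrays.
```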

    Learning Identifiable Representations: Independent Influences and Multiple Views

    Intelligent systems, whether biological or artificial, perceive unstructured information from the world around them: deep neural networks designed for object recognition receive collections of pixels as inputs; living beings capture visual stimuli through photoreceptors that convert incoming light into electrical signals. Sophisticated signal processing is required to extract meaningful features (e.g., the position, dimension, and colour of objects in an image) from these inputs: this motivates the field of representation learning. But what features should be deemed meaningful, and how should they be learned?

    We will approach these questions based on two metaphors. The first is the cocktail-party problem, where a number of conversations happen in parallel in a room, and the task is to recover (or separate) the voices of the individual speakers from recorded mixtures, a task also termed blind source separation. The second is what we call the independent-listeners problem: given two listeners in front of some loudspeakers, the question is whether, when processing what they hear, they will make the same information explicit, identifying similar constitutive elements.

    The notion of identifiability is crucial when studying these problems, as it specifies suitable technical assumptions under which representations are uniquely determined, up to tolerable ambiguities like latent source reordering. A key result of this theory is that, when the mixing is nonlinear, the model is provably non-identifiable. A first question is therefore under what additional assumptions (ideally as mild as possible) the problem becomes identifiable; a second is what algorithms can be used to estimate the model.

    The contributions presented in this thesis address these questions and revolve around two main principles. The first principle is to learn representations where the latent components influence the observations independently. Here the term "independently" is used in a non-statistical sense, which can be loosely thought of as the absence of fine-tuning between distinct elements of a generative process. The second principle is that representations can be learned from paired observations or views, in which mixtures of the same latent variables are observed and they (or a subset thereof) are perturbed in one of the views; this is also termed the multi-view setting.

    I will present work characterizing these two problem settings, studying their identifiability, and proposing suitable estimation algorithms. Moreover, I will discuss how the success of popular representation learning methods may be explained in terms of the principles above, and describe an application of the second principle to the statistical analysis of group studies in neuroimaging.
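    For readers unfamiliar with the cocktail-party metaphor, the classical linear case can be solved with independent component analysis, which is identifiable exactly up to the tolerable ambiguities mentioned above (source reordering and rescaling). A minimal illustration with FastICA on synthetic sources, separate from the nonlinear and multi-view settings the thesis actually studies:

```python
# Blind source separation in the linear case: recover two independent
# sources ("speakers") from linear mixtures ("microphones") via FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]  # two independent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])                  # unknown mixing matrix
mixtures = sources @ A.T                                 # observed recordings

recovered = FastICA(n_components=2, random_state=0).fit_transform(mixtures)
# `recovered` matches `sources` up to reordering and rescaling: exactly the
# ambiguities under which linear ICA is identifiable.
```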