
    Statistical Aspects of Wasserstein Distances

    Wasserstein distances are metrics on probability distributions inspired by the problem of optimal mass transportation. Roughly speaking, they measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution. They are ubiquitous in mathematics, with a long history that has seen them catalyse core developments in analysis, optimization, and probability. Beyond their intrinsic mathematical richness, they possess attractive features that make them a versatile tool for the statistician: they can be used to derive weak convergence and convergence of moments, and can be easily bounded; they are well-adapted to quantify a natural notion of perturbation of a probability distribution; and they seamlessly incorporate the geometry of the domain of the distributions in question, thus being useful for contrasting complex objects. Consequently, they frequently appear in the development of statistical theory and inferential methodology, and have recently become an object of inference in themselves. In this review, we provide a snapshot of the main concepts involved in Wasserstein distances and optimal transportation, and a succinct overview of some of their many statistical aspects.
    Comment: Official version available at https://www.annualreviews.org/doi/full/10.1146/annurev-statistics-030718-10493
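    For reference, the abstract's central object has a standard definition worth spelling out: the p-Wasserstein distance between probability measures \mu and \nu on a metric space (X, d) is the optimal cost of transporting one onto the other,

        W_p(\mu, \nu) = \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times X} d(x, y)^p \, \mathrm{d}\pi(x, y) \Big)^{1/p},

    where \Pi(\mu, \nu) denotes the set of couplings of \mu and \nu, i.e. joint distributions with marginals \mu and \nu.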

    Probabilistic Methods for Model Validation

    This dissertation develops a probabilistic method for validation and verification (V&V) of uncertain nonlinear systems. Existing systems-control literature on model and controller V&V either deals with linear systems with norm-bounded uncertainties, or considers nonlinear systems in set-based and moment-based frameworks. These existing methods deal with model invalidation or falsification, rather than assessing the quality of a model with respect to measured data. In this dissertation, an axiomatic framework for model validation is proposed in a probabilistically relaxed sense that, instead of simply invalidating a model, seeks to quantify the "degree of validation". To develop this framework, novel algorithms for uncertainty propagation are proposed for both deterministic and stochastic nonlinear systems in continuous time. For the deterministic flow, we compute the time-varying joint probability density functions over the state space by solving the Liouville equation via the method of characteristics. For the stochastic flow, we propose an approximation algorithm that combines the method-of-characteristics solution of the Liouville equation with the Karhunen-Loève expansion of the process noise, thus enabling an indirect solution of the Fokker-Planck equation governing the evolution of the joint probability density functions. The efficacy of these algorithms is demonstrated for risk assessment in Mars entry-descent-landing, and for nonlinear estimation. Next, the V&V problem is formulated in terms of Monge-Kantorovich optimal transport, naturally giving rise to a metric, called the Wasserstein metric, on the space of probability densities. It is shown that the resulting computation leads to solving a linear program at each time of measurement availability, and computational complexity results are derived for it. Probabilistic guarantees, in both the average and worst-case sense, are given for the validation oracle resulting from the proposed method. The framework is demonstrated for nonlinear robustness verification of F-16 flight controllers subject to probabilistic uncertainties. Frequency-domain interpretations of the proposed framework are derived for linear systems, and its connections with existing nonlinear model validation methods are pointed out. In particular, we show that the asymptotic Wasserstein gap between two single-output linear time-invariant systems excited by Gaussian white noise is the difference between their average gains, up to a scaling by the strength of the input noise. A geometric interpretation of this result allows us to propose an intrinsic normalization of the Wasserstein gap, which in turn allows us to compare it with classical systems-theoretic metrics like the ν-gap. Next, it is shown that the optimal transport map can be used to automatically refine the model. This model refinement formulation leads to solving a non-smooth convex optimization problem, and examples are given to demonstrate how proximal operator splitting enables its numerical solution. The method is applied to finite-time feedback control of probability density functions, and to data-driven modeling of dynamical systems.
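    The deterministic propagation step above admits a compact illustration. Along a trajectory of dx/dt = f(x), the Liouville equation reduces to d(log rho)/dt = -div f(x(t)), so density values can be carried along characteristics by augmenting the state with a log-density component. The following is a minimal sketch of that idea; the Van der Pol vector field and Gaussian initial density are our own illustrative choices, not the dissertation's case studies:

        # Minimal method-of-characteristics sketch for the Liouville equation.
        # Dynamics and initial density are illustrative assumptions.
        import numpy as np
        from scipy.integrate import solve_ivp

        def f(x):
            # Van der Pol oscillator (example nonlinear vector field)
            return np.array([x[1], (1.0 - x[0]**2) * x[1] - x[0]])

        def div_f(x):
            # d/dt log rho(x(t), t) = -div f(x(t)) along a characteristic
            return 1.0 - x[0]**2

        def augmented(t, z):
            x = z[:2]
            return np.concatenate([f(x), [-div_f(x)]])

        rng = np.random.default_rng(0)
        x0s = rng.normal(size=(100, 2))  # samples from the initial density
        log_rho0 = -0.5 * (x0s**2).sum(axis=1) - np.log(2 * np.pi)  # std-normal log-pdf

        # Integrate each characteristic together with its log-density weight.
        out = np.array([solve_ivp(augmented, (0.0, 1.0), np.append(x0, lr)).y[:, -1]
                        for x0, lr in zip(x0s, log_rho0)])
        print(out[:3])  # columns: x1(T), x2(T), log rho(x(T), T)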

    Learning and inference with Wasserstein metrics

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 131-143). By Charles Frogner.
    This thesis develops new approaches for three problems in machine learning, using tools from the study of optimal transport (or Wasserstein) distances between probability distributions. Optimal transport distances capture an intuitive notion of similarity between distributions, by incorporating the underlying geometry of the domain of the distributions. Despite their intuitive appeal, optimal transport distances are often difficult to apply in practice, as computing them requires solving a costly optimization problem. In each setting studied here, we describe a numerical method that overcomes this computational bottleneck and enables scaling to real data. In the first part, we consider the problem of multi-output learning in the presence of a metric on the output domain. We develop a loss function that measures the Wasserstein distance between the prediction and ground truth, and describe an efficient learning algorithm based on entropic regularization of the optimal transport problem. We additionally propose a novel extension of the Wasserstein distance from probability measures to unnormalized measures, which is applicable in settings where the ground truth is not naturally expressed as a probability distribution. We show statistical learning bounds for both the Wasserstein loss and its unnormalized counterpart. The Wasserstein loss can encourage smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data image tagging problem, outperforming a baseline that doesn't use the metric. In the second part, we consider the probabilistic inference problem for diffusion processes. Such processes model a variety of stochastic phenomena and appear often in continuous-time state space models. Exact inference for diffusion processes is generally intractable. In this work, we describe a novel approximate inference method, which is based on a characterization of the diffusion as following a gradient flow in a space of probability densities endowed with a Wasserstein metric. Existing methods for computing this Wasserstein gradient flow rely on discretizing the underlying domain of the diffusion, prohibiting their application to problems in more than several dimensions. In the current work, we propose a novel algorithm for computing a Wasserstein gradient flow that operates directly in a space of continuous functions, free of any underlying mesh. We apply our approximate gradient flow to the problem of filtering a diffusion, showing superior performance where standard filters struggle. Finally, we study the ecological inference problem, which is that of reasoning from aggregate measurements of a population to inferences about the individual behaviors of its members. This problem arises often when dealing with data from economics and political sciences, such as when attempting to infer the demographic breakdown of votes for each political party, given only the aggregate demographic and vote counts separately. Ecological inference is generally ill-posed, and requires prior information to distinguish a unique solution. We propose a novel, general framework for ecological inference that allows for a variety of priors and enables efficient computation of the most probable solution. Unlike previous methods, which rely on Monte Carlo estimates of the posterior, our inference procedure uses an efficient fixed point iteration that is linearly convergent. Given suitable prior information, our method can achieve more accurate inferences than existing methods. We additionally explore a sampling algorithm for estimating credible regions.
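    The entropic regularization mentioned above is what makes the Wasserstein loss tractable at scale; it is typically computed with Sinkhorn iterations. A minimal sketch of that standard primitive follows (a generic illustration, not the thesis's implementation):

        # Minimal Sinkhorn sketch of entropy-regularized optimal transport.
        import numpy as np

        def sinkhorn(a, b, C, eps=0.1, n_iter=500):
            """Approximate the entropic OT plan between histograms a and b
            under cost matrix C with regularization eps."""
            K = np.exp(-C / eps)              # Gibbs kernel
            u = np.ones_like(a)
            for _ in range(n_iter):
                v = b / (K.T @ u)             # scale columns to match marginal b
                u = a / (K @ v)               # scale rows to match marginal a
            P = u[:, None] * K * v[None, :]   # transport plan
            return P, (P * C).sum()           # plan and transport cost

        # Example: two histograms on a 1-D grid with squared-distance cost.
        x = np.linspace(0, 1, 50)
        C = (x[:, None] - x[None, :]) ** 2
        a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
        b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
        P, cost = sinkhorn(a, b, C)
        print(cost)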

    Proving Linear Mode Connectivity of Neural Networks via Optimal Transport

    The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found after two runs of stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a framework theoretically explaining this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we give upper and lower bounds on the width of each layer required for two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates, is correlated with linear mode connectivity.
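    The "modulo a permutation of the weights" step can be made concrete: before interpolating two trained networks, one permutes the hidden neurons of one network to best match the other, e.g. by solving a linear assignment problem on neuron weights. The sketch below illustrates this standard weight-matching idea; it is our illustration, not the paper's proof technique, which instead rests on Wasserstein convergence of empirical measures:

        # Hypothetical weight-matching sketch: align hidden neurons of two
        # two-layer networks via linear assignment, then interpolate linearly.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        rng = np.random.default_rng(0)
        d, h = 10, 64                       # input dim, hidden width
        W1_a = rng.normal(size=(h, d))      # first-layer weights of net A
        W1_b = rng.normal(size=(h, d))      # first-layer weights of net B

        # Cost of matching neuron i of net A to neuron j of net B.
        cost = ((W1_a[:, None, :] - W1_b[None, :, :]) ** 2).sum(-1)
        row, col = linear_sum_assignment(cost)
        W1_b_perm = W1_b[col]               # permute net B's neurons to match net A

        # Linear path between the aligned weights (lam in [0, 1]).
        lam = 0.5
        W1_mid = (1 - lam) * W1_a + lam * W1_b_perm
        print(cost[row, col].sum())         # total matching cost after alignment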

    Non-linear dependences in finance

    The thesis is composed of three parts. Part I introduces the mathematical and statistical tools that are relevant for the study of dependences, as well as statistical tests of goodness-of-fit for empirical probability distributions. I propose two extensions of the usual tests for when dependence is present in the sample data and when observations have a fat-tailed distribution. The financial content of the thesis starts in Part II. I present there my studies of the "cross-sectional" dependences among the time series of daily stock returns, i.e. the instantaneous forces that link several stocks together and make them behave somewhat collectively rather than purely independently. A calibration of a new factor model is presented here, together with a comparison to measurements on real data. Finally, Part III investigates the temporal dependences of single time series, using the same tools and measures of correlation. I propose two contributions to the study of the origin and description of "volatility clustering": one is a generalization of the ARCH-like feedback construction where the returns are self-exciting, and the other is a more original description of self-dependences in terms of copulas. The latter can be formulated model-free and is not specific to financial time series. In fact, I also show here how concepts like recurrences, records, aftershocks and waiting times, which characterize the dynamics of a time series, can be written in the unifying framework of the copula.
    Comment: PhD Thesis
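    The model-free "self-copula" view of temporal dependence is easy to illustrate: rank-transform a series and its lag to obtain a sample from the copula of (X_t, X_{t+tau}). The GARCH-style toy series below is our own illustrative choice, not the thesis's data or estimator:

        # Empirical self-copula sketch: rank-transform a series and its lag.
        # The GARCH(1,1)-style toy series is an illustrative assumption.
        import numpy as np
        from scipy.stats import rankdata

        rng = np.random.default_rng(0)
        n = 5000
        x = np.zeros(n)
        sig2 = 1.0
        for t in range(1, n):
            sig2 = 0.1 + 0.85 * sig2 + 0.1 * x[t - 1] ** 2   # volatility feedback
            x[t] = np.sqrt(sig2) * rng.standard_normal()

        tau = 1
        u = rankdata(x[:-tau]) / (n - tau + 1)   # uniform margins via ranks
        v = rankdata(x[tau:]) / (n - tau + 1)
        # (u, v) is a sample from the empirical copula of (X_t, X_{t+tau}).
        print(np.corrcoef(u, v)[0, 1])           # raw returns: near zero
        # Volatility clustering shows up as rank correlation of amplitudes:
        print(np.corrcoef(rankdata(np.abs(x[:-tau])),
                          rankdata(np.abs(x[tau:])))[0, 1])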

    An Invitation to Statistics in Wasserstein Space

    This open access book presents the key aspects of statistics in Wasserstein spaces, i.e. statistics in the space of probability measures when endowed with the geometry of optimal transportation. Further to reviewing state-of-the-art aspects, it also provides an accessible introduction to the fundamentals of this current topic, as well as an overview that will serve as an invitation and catalyst for further research. Statistics in Wasserstein spaces represents an emerging topic in mathematical statistics, situated at the interface between functional data analysis (where the data are functions, thus lying in an infinite-dimensional Hilbert space) and non-Euclidean statistics (where the data satisfy nonlinear constraints, thus lying on non-Euclidean manifolds). The Wasserstein space provides the natural mathematical formalism to describe data collections that are best modeled as random measures on Euclidean space (e.g. images and point processes). Such random measures carry the infinite-dimensional traits of functional data, but are intrinsically nonlinear due to positivity and integrability restrictions. Indeed, their dominating statistical variation arises through random deformations of an underlying template, a theme that is pursued in depth in this monograph. Gives a succinct introduction to the necessary mathematical background, focusing on the results useful for statistics from an otherwise vast mathematical literature. Presents an up-to-date overview of the state of the art, including some original results, and discusses open problems. Suitable for self-study or for use as a graduate-level course text. Open access.

    Fréchet means in Wasserstein space: theory and algorithms

    This work studies the problem of statistical inference for Fréchet means in the Wasserstein space of measures on Euclidean spaces, $\mathcal{W}_2(\mathbb{R}^d)$. This question arises naturally from the problem of separating amplitude and phase variation in point processes, analogous to a well-known problem in functional data analysis. We formulate the point process version of the problem, show that it is canonically equivalent to that of estimating Fréchet means in $\mathcal{W}_2(\mathbb{R}^d)$, and carry out estimation by means of M-estimation. This approach allows us to achieve consistency in a genuinely nonparametric framework, even in a sparse sampling regime. For Cox processes on the real line, consistency is supplemented by convergence rates and, in the dense sampling regime, $\sqrt{n}$-consistency and a central limit theorem. Computation of the Fréchet mean is challenging when the processes are multivariate, in which case our Fréchet mean estimator is only defined implicitly as the minimiser of an optimisation problem. To overcome this difficulty, we propose a steepest descent algorithm that approximates the minimiser, and show that it converges to a local minimum. Our techniques are specific to the Wasserstein space, because Hessian-type arguments that are commonly used for similar convergence proofs do not apply to that space. In addition, we discuss similarities with generalised Procrustes analysis. The key advantage of the algorithm is that it requires only the solution of pairwise transportation problems. The preceding results require properties of Fréchet means in $\mathcal{W}_2(\mathbb{R}^d)$, whose theory is developed here, supplemented by some new results. We present the tangent bundle and exploit its relation to optimal maps in order to derive differentiability properties of the associated Fréchet functional, obtaining a characterisation of Karcher means. Additionally, we establish a new optimality criterion for local minima and prove a new stability result for the optimal maps that, combined with the established consistency of the Fréchet mean estimator, yields consistency of the optimal transportation maps.
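    A useful contrast with the multivariate algorithm above: on the real line the Wasserstein-2 Fréchet mean has a closed form, since its quantile function is simply the average of the input quantile functions. The following sketch (our illustration, with Gaussian inputs chosen so the answer is checkable) uses that fact:

        # 1-D Wasserstein-2 Frechet mean (barycenter) via quantile averaging.
        # For Gaussian inputs the barycenter is Gaussian with averaged mean
        # and averaged standard deviation, which makes the output easy to check.
        import numpy as np

        rng = np.random.default_rng(0)
        samples = [rng.normal(loc=m, scale=s, size=2000)
                   for m, s in [(-1.0, 0.5), (0.0, 1.0), (2.0, 1.5)]]

        q = np.linspace(0.001, 0.999, 999)              # quantile levels
        quantiles = np.array([np.quantile(s, q) for s in samples])
        barycenter_quantile = quantiles.mean(axis=0)    # quantile function of the mean

        print(barycenter_quantile[len(q) // 2])         # median: approx (-1 + 0 + 2) / 3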

    Lower Complexity Adaptation for Empirical Entropic Optimal Transport

    Entropic optimal transport (EOT) presents an effective and computationally viable alternative to unregularized optimal transport (OT), offering diverse applications for large-scale data analysis. In this work, we derive novel statistical bounds for empirical plug-in estimators of the EOT cost and show that their statistical performance in the entropy regularization parameter $\epsilon$ and the sample size $n$ only depends on the simpler of the two probability measures. For instance, under sufficiently smooth costs this yields the parametric rate $n^{-1/2}$ with factor $\epsilon^{-d/2}$, where $d$ is the minimum dimension of the two population measures. This confirms that empirical EOT also adheres to the lower complexity adaptation principle, a hallmark feature only recently identified for unregularized OT. As a consequence of our theory, we show that the empirical entropic Gromov-Wasserstein distance and its unregularized version for measures on Euclidean spaces also obey this principle. Additionally, we comment on computational aspects and complement our findings with Monte Carlo simulations. Our techniques employ empirical process theory and rely on a dual formulation of EOT over a single function class. Crucial to our analysis is the observation that the entropic cost-transformation of a function class does not increase its uniform metric entropy by much.
    Comment: 46 pages, 5 figures
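    For context, the EOT cost studied above is standardly defined, for a cost function c and regularization parameter $\epsilon > 0$, as

        \mathrm{EOT}_\epsilon(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int c \, \mathrm{d}\pi + \epsilon \, \mathrm{KL}(\pi \,\|\, \mu \otimes \nu),

    and the plug-in estimator replaces $\mu$ and $\nu$ by the empirical measures $\hat{\mu}_n$ and $\hat{\nu}_n$ built from samples.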