27 research outputs found

    Minimax estimation of smooth optimal transport maps

    Brenier's theorem is a cornerstone of optimal transport that guarantees the existence of an optimal transport map $T$ between two probability distributions $P$ and $Q$ over $\mathbb{R}^d$ under certain regularity conditions. The main goal of this work is to establish the minimax estimation rates for such a transport map from data sampled from $P$ and $Q$ under additional smoothness assumptions on $T$. To achieve this goal, we develop an estimator based on the minimization of an empirical version of the semi-dual optimal transport problem, restricted to truncated wavelet expansions. This estimator is shown to achieve near minimax optimality using new stability arguments for the semi-dual and a complementary minimax lower bound. Furthermore, we provide numerical experiments on synthetic data supporting our theoretical findings and highlighting the practical benefits of smoothness regularization. These are the first minimax estimation rates for transport maps in general dimension. Comment: 53 pages, 6 figures.
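
    The paper's estimator minimizes an empirical semi-dual objective over truncated wavelet expansions; the sketch below is only a minimal illustration of the object being estimated, computing a plug-in transport-map estimate between two equal-size samples via the discrete optimal assignment under quadratic cost. The sample sizes, the shifted-Gaussian target, and the nearest-neighbour out-of-sample rule are illustrative assumptions, not the paper's method.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d = 200, 2                           # illustrative sample size and dimension
X = rng.normal(size=(n, d))             # sample from P
Y = rng.normal(size=(n, d)) + 1.0       # sample from Q (toy shifted Gaussian)

# Quadratic cost c(x_i, y_j) = ||x_i - y_j||^2, the Brenier setting
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

# Optimal assignment = discrete optimal transport between two empirical
# measures with equal, uniform weights
rows, cols = linear_sum_assignment(C)

# Plug-in map estimate: send each x_i to its matched y_j ...
T_hat = np.empty_like(X)
T_hat[rows] = Y[cols]

# ... and extend out of sample by mapping a new point like its nearest x_i
def T(x_new):
    i = np.argmin(((X - x_new) ** 2).sum(axis=1))
    return T_hat[i]
```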

    The Hidden Geometry of Particle Collisions

    We establish that many fundamental concepts and techniques in quantum field theory and collider physics can be naturally understood and unified through a simple new geometric language. The idea is to equip the space of collider events with a metric, from which other geometric objects can be rigorously defined. Our analysis is based on the energy mover's distance, which quantifies the "work" required to rearrange one event into another. This metric, which operates purely at the level of observable energy flow information, allows for a clarified definition of infrared and collinear safety and related concepts. A number of well-known collider observables can be exactly cast as the minimum distance between an event and various manifolds in this space. Jet definitions, such as exclusive cone and sequential recombination algorithms, can be directly derived by finding the closest few-particle approximation to the event. Several area- and constituent-based pileup mitigation strategies are naturally expressed in this formalism as well. Finally, we lift our reasoning to develop a precise distance between theories, which are treated as collections of events weighted by cross sections. In all of these various cases, a better understanding of existing methods in our geometric language suggests interesting new ideas and generalizations. Comment: 56 pages, 11 figures, 5 tables; v2: minor changes and updated references; v3: updated to match JHEP version.
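
    To make the metric concrete, the sketch below computes the transport ("work") term of the energy mover's distance between two toy events with equal total transverse momentum, using the exact solver from the POT library; the particle kinematics, the balanced-event assumption, and the neglect of azimuthal periodicity are illustrative simplifications of the general definition.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed available)

def emd_work(pt_a, yphi_a, pt_b, yphi_b):
    """Transport 'work' between two events with equal total pT.

    pt_*   : (n,) particle transverse momenta (the energy weights)
    yphi_* : (n, 2) rapidity-azimuth coordinates (the ground space)
    """
    a = pt_a / pt_a.sum()                 # normalise to probability weights
    b = pt_b / pt_b.sum()
    # ground cost: pairwise rapidity-azimuth distances Delta R_ij
    # (ignoring the 2*pi periodicity of the azimuth for brevity)
    M = np.sqrt(((yphi_a[:, None, :] - yphi_b[None, :, :]) ** 2).sum(-1))
    # exact optimal-transport cost, rescaled back to energy units
    return pt_a.sum() * ot.emd2(a, b, M)

# toy two-particle events
work = emd_work(np.array([50.0, 50.0]), np.array([[0.0, 0.0], [1.0, 3.0]]),
                np.array([60.0, 40.0]), np.array([[0.1, 0.0], [0.9, 3.1]]))
```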

    Efficient posterior sampling for high-dimensional imbalanced logistic regression

    High-dimensional data are routinely collected in many areas. We are particularly interested in Bayesian classification models in which one or more variables are imbalanced. Current Markov chain Monte Carlo algorithms for posterior computation are inefficient as $n$ and/or $p$ increase due to worsening time per step and mixing rates. One strategy is to use a gradient-based sampler to improve mixing while using data sub-samples to reduce per-step computational complexity. However, usual sub-sampling breaks down when applied to imbalanced data. Instead, we generalize piecewise-deterministic Markov chain Monte Carlo algorithms to include importance-weighted and mini-batch sub-sampling. These approaches maintain the correct stationary distribution with arbitrarily small sub-samples, and substantially outperform current competitors. We provide theoretical support and illustrate gains in simulated and real data applications. Comment: 4 figures.
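
    As a concrete illustration of why importance-weighted sub-sampling helps with imbalance, the sketch below builds an unbiased weighted minibatch estimate of the logistic-regression log-likelihood gradient that oversamples the rare class; the sampling probabilities, data sizes, and model are illustrative, and the sketch covers only the gradient estimator, not the piecewise-deterministic samplers themselves.

```python
import numpy as np

def weighted_minibatch_grad(beta, X, y, probs, m, rng):
    """Unbiased importance-weighted estimate of the logistic-regression
    log-likelihood gradient sum_i (y_i - sigmoid(x_i @ beta)) x_i."""
    n = len(y)
    idx = rng.choice(n, size=m, replace=True, p=probs)   # weighted subsample
    w = 1.0 / (m * probs[idx])                           # importance weights
    s = 1.0 / (1.0 + np.exp(-X[idx] @ beta))             # sigmoid(x_i @ beta)
    return (w[:, None] * (y[idx] - s)[:, None] * X[idx]).sum(axis=0)

rng = np.random.default_rng(1)
n, p, m = 10_000, 5, 100                                 # illustrative sizes
X = rng.normal(size=(n, p))
y = (rng.random(n) < 0.01).astype(float)                 # heavily imbalanced labels

# oversample the rare positive cases; the weights above undo the bias
probs = np.where(y == 1, 50.0, 1.0)
probs /= probs.sum()
g = weighted_minibatch_grad(np.zeros(p), X, y, probs, m, rng)
```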

    Parallel Algorithms for Geometric Graph Problems

    We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example, it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_\epsilon(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+\epsilon)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+\epsilon)$-approximation algorithm with $n^{\delta}$ space in the streaming-with-sorting model with $1/\delta^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.
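
    The framework repeatedly partitions space, solves the problem inside each cell, and passes a compact summary to the next level; the sketch below shows a single level of that idea for MST, bucketing points into a randomly shifted grid and solving an exact local MST per cell with SciPy. The grid width, the shift, and the use of an exact in-cell solver are illustrative assumptions rather than the paper's parallel implementation.

```python
import numpy as np
from collections import defaultdict
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(2)
pts = rng.random((1000, 2))               # illustrative point set in the unit square

width = 0.25                              # cell side length for this level (assumed)
shift = rng.random(2) * width             # random shift of the grid

# map each point to the cell of the randomly shifted grid
cells = defaultdict(list)
for i, p in enumerate(pts):
    cells[tuple(np.floor((p + shift) / width).astype(int))].append(i)

# "solve" step: an exact local MST inside every cell; in the parallel
# framework each cell would also emit a small summary for the next level
local_cost = 0.0
for idx in cells.values():
    if len(idx) > 1:
        D = distance_matrix(pts[idx], pts[idx])
        local_cost += minimum_spanning_tree(D).sum()
```

    Iterating this over geometrically coarser grids and stitching the per-cell summaries together is what lets the full algorithm finish in a constant number of rounds with total space proportional to the data.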

    Scalable Bayesian Inference Using Stochastic Gradient Markov Chain Monte Carlo

    Bayesian inference offers a flexible framework to account for uncertainty across all unobserved quantities in a model. Markov chain Monte Carlo (MCMC) is a class of sampling algorithms which simulate from the Bayesian posterior distribution. These methods are generally regarded as the go-to computational technique for practical Bayesian modelling. MCMC is well-understood, offers (asymptotically) exact inference, and can be implemented intuitively. Samplers built upon the Metropolis-Hastings algorithm can benefit from strong theoretical guarantees under reasonable conditions. Derived from discrete-time approximations of Itô diffusions, gradient-based samplers (Roberts and Rosenthal, 1998; Neal, 2011) leverage local gradient information in their proposal, allowing for efficient exploration of the posterior. The most championed of the diffusion processes are the overdamped Langevin diffusion and Hamiltonian dynamics. In large data settings, standard MCMC can falter. The per-iteration cost of calculating the log-likelihood in the Metropolis-Hastings acceptance step scales with dataset size. Gradient-based samplers are doubly afflicted in this scenario, given that a full-data gradient is computed each iteration. These issues have prompted considerable interest in developing approaches for scalable Bayesian inference. This thesis proposes novel contributions for stochastic gradient MCMC (Welling and Teh, 2011; Ma et al., 2015; Nemeth and Fearnhead, 2021). Stochastic gradient MCMC utilises data subsampling to construct a noisy, unbiased estimate of the gradient of the log-posterior. The first two chapters review key background from the literature. Chapter 3 presents our first paper contribution. In this work, we extend stochastic gradient MCMC to time series, via non-linear, non-Gaussian state space models. Chapter 4 presents the second paper contribution of this thesis. Here, we examine the use of a preferential subsampling distribution to reweight the stochastic gradient and improve variance control. Chapter 5 evaluates the feasibility of using determinantal point processes (Kulesza et al., 2012) for data subsampling in stochastic gradient Langevin dynamics (SGLD). We conclude and propose directions for future work in Chapter 6.
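
    As a reference point for the stochastic gradient MCMC methods developed in the thesis, the sketch below implements the basic stochastic gradient Langevin dynamics update of Welling and Teh (2011) for a toy Gaussian mean model; the model, step size, minibatch size, and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, eps = 10_000, 100, 1e-4                    # data size, minibatch size, step size
data = rng.normal(loc=2.0, scale=1.0, size=n)    # toy model: y_i ~ N(theta, 1)

def grad_log_post(theta, batch):
    # N(0, 10) prior gradient plus an (n/m)-rescaled minibatch likelihood gradient,
    # an unbiased estimate of the full log-posterior gradient
    return -theta / 10.0 + (n / m) * np.sum(batch - theta)

theta, chain = 0.0, []
for _ in range(5_000):
    batch = data[rng.integers(0, n, size=m)]
    # SGLD: half a (stochastic) gradient step plus Gaussian noise of variance eps
    theta += 0.5 * eps * grad_log_post(theta, batch) + np.sqrt(eps) * rng.normal()
    chain.append(theta)
```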

    Génération de scénarios par quantification optimale en dimension élevée (Scenario generation by optimal quantization in high dimension)

    Stochastic optimization is the branch of operations research that deals with optimization problems involving random phenomena. The mathematical program, defined by an objective function and constraints, then contains one or more random variables and is called a stochastic program. Before solving it, the random vector appearing in the problem must first be modelled. However, it is generally not possible to solve a stochastic program in which the random variables have infinite support, as with continuous distributions. The probability law must therefore be discretized over a finite set of realizations, called scenarios, to which probabilities are assigned; their number is chosen according to the desired numerical convergence time and approximation fineness. The main objective of this thesis is to develop scenario generation methods that yield near-optimal solutions for stochastic programs containing high-dimensional random vectors. Several scenario generation methods exist, the most common being pure sampling, which draws scenarios at random from the estimated distribution of the random vector. The scenarios obtained by pure sampling, however, are rarely those that best represent the probability law. In this thesis, we argue, with both theoretical and experimental results, that scenarios should instead be generated by optimal quantization. We then show that, as the number of data points tends to infinity, the k-medians and k-means problems are equivalent to optimal quantization under the L1 and L2 norms respectively, so the techniques developed to generate scenarios are inspired by clustering algorithms.
    It is not always possible to confidently estimate the distribution of a random vector from a data set, so the case where the distribution is known (or can be estimated) is treated separately from the case where it is not. When the distribution is known and the problem contains a single random variable, we use Lloyd's algorithm, which reaches the global minimum of the continuous k-means and k-medians problems. In the multidimensional case, we instead use competitive learning vector quantization (CLVQ). Optimal quantization suggests using the distance induced by the L1 norm, since it yields an upper bound on the discretization error of the stochastic program. To quantize the random vector under the L1 norm, we adapt the step parameter of CLVQ, which is ordinarily used with the Euclidean distance. We find, however, that the upper bound on the discretization error can be much larger than the error itself, from which we conclude that the L2 norm can also be used to generate scenarios and offers better coverage of extreme events. When the distribution is unknown, we use the centre-exchange (swapping centers) and Lloyd algorithms (k-medians and k-means), which generate scenarios directly from the data.
    In the final chapter, we analyze, among other things, the effects of the number of scenarios, the norm, the variance, and the dimension of the random vector on our scenario generation methods. Unsurprisingly, it is particularly difficult to obtain quality solutions when the dimension is high. Three methods are therefore proposed to reduce the effective dimension of the problem, including principal component analysis and copulas; of these, only principal component analysis reduces the costs of high-dimensional stochastic optimization. The scenarios are tested on the newsvendor problem, where demand follows a log-normal distribution, and on a real application built from a historical data set for staff scheduling. The stochastic optimization solutions obtained with the proposed scenario generation methods are compared with those obtained by pure sampling and by deterministic optimization. For the newsvendor problem with a known probability distribution, substantial gains from our methods are observed for nearly all instances studied. When the distribution is unknown, our methods yield discretization errors up to 340 times smaller than those of deterministic optimization with 100 scenarios. The errors obtained with the centre-exchange and Lloyd algorithms are similar, but Lloyd's algorithm is generally more practical owing to its simplicity and execution speed. For the staff scheduling problem, however, we use the centre-exchange algorithm, since it generates integer demand scenarios. Despite the high dimension of the problem and the low variances, our scenario generation methods still achieve modest gains over deterministic optimization.
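
    As a small illustration of the known-distribution case, the sketch below runs Lloyd's algorithm (k-means, L2 norm) on draws from a log-normal demand distribution and turns the resulting centroids and cluster frequencies into scenarios with probabilities, of the kind fed into a newsvendor-type stochastic program; the sample size, number of scenarios, and distribution parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
demand = rng.lognormal(mean=3.0, sigma=0.5, size=(20_000, 1))  # draws of demand (toy parameters)

def lloyd(data, k, iters=100):
    """Lloyd's algorithm (k-means): returns scenarios and their probabilities."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # assign each draw to its nearest centre (L2 norm)
        labels = np.argmin(((data[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # move each centre to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    probs = np.bincount(labels, minlength=k) / len(data)
    return centers, probs

scenarios, probs = lloyd(demand, k=10)   # 10 demand scenarios with probabilities
```

    Replacing the squared distance with the L1 norm and the cluster mean with the coordinate-wise median gives the k-medians variant discussed above.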