Distributionally Robust Optimization and Robust Statistics
We review distributionally robust optimization (DRO), a principled approach
for constructing statistical estimators that hedge against the impact of
deviations in the expected loss between the training and deployment
environments. Many well-known estimators in statistics and machine learning
(e.g. AdaBoost, LASSO, ridge regression, dropout training, etc.) are
distributionally robust in a precise sense. We hope that, by discussing the DRO
interpretation of well-known estimators, statisticians who are less familiar
with DRO will find an entry point into the DRO literature through the bridge
between classical results and their DRO formulations. On the
other hand, the topic of robustness in statistics has a rich tradition
associated with removing the impact of contamination. Thus, another objective
of this paper is to clarify the difference between DRO and classical
statistical robustness. As we will see, these are two fundamentally different
philosophies leading to completely different types of estimators. In DRO, the
statistician hedges against an environment shift that occurs after the decision
is made; thus DRO estimators tend to be pessimistic in an adversarial setting,
leading to a min-max type formulation. In classical robust statistics, the
statistician seeks to correct contamination that occurred before a decision is
made; thus robust statistical estimators tend to be optimistic, leading to a
min-min type formulation.
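To make the contrast concrete, the two philosophies can be written schematically (notation is ours, not the paper's; here P_n is the empirical or baseline model, ell a loss, and U(P_n) a neighborhood of plausible environments or contaminations):

```latex
% DRO hedges against a worst-case environment shift after the decision:
\min_{\theta} \; \sup_{P \in \mathcal{U}(P_n)} \mathbb{E}_{P}\big[\ell(\theta; X)\big]
% Classical robust statistics corrects contamination before the decision,
% searching for the most favorable "clean" model in the neighborhood:
\min_{\theta} \; \inf_{P \in \mathcal{U}(P_n)} \mathbb{E}_{P}\big[\ell(\theta; X)\big]
```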
Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory
Distribution shifts are a serious concern in modern statistical learning as
they can systematically change the properties of the data away from the truth.
We focus on Wasserstein distribution shifts, where every data point may undergo
a slight perturbation, as opposed to the Huber contamination model where a
fraction of observations are outliers. We consider perturbations that are
either independent or coordinated joint shifts across data points. We analyze
several important statistical problems, including location estimation, linear
regression, and non-parametric density estimation. Under a squared loss for
mean estimation and prediction error in linear regression, we find the exact
minimax risk, a least favorable perturbation, and show that the sample mean and
least squares estimators are respectively optimal. For other problems, we
provide nearly optimal estimators and precise finite-sample bounds. We also
introduce several tools for bounding the minimax risk under general
distribution shifts, not just for Wasserstein perturbations, such as a
smoothing technique for location families, and generalizations of classical
tools including least favorable sequences of priors, the modulus of continuity,
as well as Le Cam's, Fano's, and Assouad's methods.
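For orientation, the minimax quantity studied in such problems can be written schematically, here in the simplest i.i.d.-shift form for mean estimation under squared loss (our notation, not quoted from the paper; the paper also treats per-data-point independent and joint perturbations):

```latex
% Minimax risk over Wasserstein perturbations of radius epsilon around P
% (schematic; our notation). The paper's optimality claim for the sample
% mean says the outer infimum is attained by the estimator \bar{X}_n.
\mathcal{M}_n(\epsilon) \;=\; \inf_{\hat{\theta}} \;
  \sup_{Q \,:\, W(Q, P) \le \epsilon}
  \mathbb{E}_{X_1,\dots,X_n \sim Q}
  \Big[\big(\hat{\theta}(X_1,\dots,X_n) - \mu(P)\big)^2\Big]
```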
Distributionally Robust Optimization: A Review
The concepts of risk-aversion, chance-constrained optimization, and robust
optimization have developed significantly over the last decade. The statistical
learning community has also witnessed rapid theoretical and applied growth by
relying on these concepts. A modeling framework, called distributionally robust
optimization (DRO), has recently received significant attention in both the
operations research and statistical learning communities. This paper surveys
the main concepts of and contributions to DRO, and its relationships with
robust optimization, risk-aversion, chance-constrained optimization, and
function regularization.
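As one hedged illustration of the regularization connection such surveys discuss (notation ours; an expansion of this flavor appears in the literature, e.g. in work of Duchi and Namkoong and of Lam, for suitably normalized phi):

```latex
% phi-divergence DRO over a shrinking ball is asymptotically a
% variance-regularized empirical objective (illustrative statement):
\sup_{P \,:\, D_{\phi}(P \,\|\, P_n) \le \rho/n}
  \mathbb{E}_{P}\big[\ell(\theta; X)\big]
  \;=\; \mathbb{E}_{P_n}\big[\ell(\theta; X)\big]
  \;+\; \sqrt{\tfrac{2\rho}{n}\,\mathrm{Var}_{P_n}\big(\ell(\theta; X)\big)}
  \;+\; o_{P}\big(n^{-1/2}\big)
```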
Distributionally Robust Optimization and its Applications in Machine Learning
The goal of Distributionally Robust Optimization (DRO) is to minimize the cost of running a stochastic system, under the assumption that an adversary can replace the underlying baseline stochastic model by another model within a family known as the distributional uncertainty region. This dissertation focuses on a class of data-driven DRO problems, which, generally speaking, means that the baseline stochastic model corresponds to the empirical distribution of a given sample.
One of the main contributions of this dissertation is to show that the class of data-driven DRO problems that we study unifies many successful machine learning algorithms, including square-root Lasso, support vector machines, and generalized logistic regression, among others. A key distinctive feature of the class of DRO problems that we consider here is that our distributional uncertainty region is based on optimal transport costs. In contrast, most of the DRO formulations that exist to date rely on a likelihood-based formulation (such as the Kullback-Leibler divergence, among others). Optimal transport costs include as a special case the so-called Wasserstein distance, which is popular in various statistical applications.
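For reference, the optimal transport cost appearing here has the standard definition (with Pi(P, Q) the set of couplings of P and Q, and c >= 0 a ground cost):

```latex
% Optimal transport cost with ground cost c; the Wasserstein distance of
% order p is recovered with c(x, x') = ||x - x'||^p (up to a p-th root).
D_c(P, Q) \;=\; \inf_{\pi \in \Pi(P, Q)}
  \mathbb{E}_{(X, X') \sim \pi}\big[c(X, X')\big]
```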
The use of optimal transport costs is advantageous relative to the use of divergence-based formulations because the region of distributional uncertainty contains distributions that assign mass outside of the support of the empirical measure, thereby explaining why many machine learning algorithms have the ability to improve generalization. Moreover, the DRO representations that we use to unify the previously mentioned machine learning algorithms provide a clear interpretation of the so-called regularization parameter, which is known to play a crucial role in controlling generalization error. As we establish, the regularization parameter corresponds exactly to the size of the distributional uncertainty region.
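As a hedged illustration of the "regularization parameter equals the radius" claim, the equivalence for square-root Lasso is usually reported in roughly the following form (notation ours; the ground cost moves covariates in an l_q norm and forbids moving labels, with 1/p + 1/q = 1):

```latex
% Reported DRO representation of square-root Lasso (illustrative form):
\min_{\beta} \; \sup_{P \,:\, D_c(P, P_n) \le \delta}
  \sqrt{\mathbb{E}_{P}\big[(Y - X^{\top}\beta)^2\big]}
  \;=\; \min_{\beta} \; \sqrt{\mathbb{E}_{P_n}\big[(Y - X^{\top}\beta)^2\big]}
  \;+\; \sqrt{\delta}\,\|\beta\|_{p}
```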
Another contribution of this dissertation is the development of statistical methodology to study data-driven DRO formulations based on optimal transport costs. Using this theory, for example, we provide a sharp characterization of the optimal selection of regularization parameters in machine learning settings such as square-root Lasso and regularized logistic regression.
Our statistical methodology relies on the construction of a key object which we call the robust Wasserstein profile function (RWP function). The RWP function is similar in spirit to the empirical likelihood profile function in the context of empirical likelihood (EL), but the asymptotic analysis of the RWP function differs because of a certain lack of smoothness that arises in a suitable Lagrangian formulation.
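Schematically, for a parameter theta characterized by an estimating equation E_P[h(X, theta)] = 0, the RWP function measures the minimal transport cost needed to make theta consistent with the data (our notation, not quoted from the dissertation):

```latex
% Robust Wasserstein profile function at theta (schematic form):
R_n(\theta) \;=\; \inf\big\{ D_c(P, P_n) \;:\;
  \mathbb{E}_{P}\big[h(X, \theta)\big] = 0 \big\}
```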
Optimal transport costs have many advantages in terms of statistical modeling. For example, we show how to define a class of novel semi-supervised learning estimators which are natural companions of the standard supervised counterparts (such as square-root Lasso, support vector machines, and logistic regression). We also show how to define the distributional uncertainty region in a purely data-driven way. Precisely, the optimal transport formulation allows us to inform the shape of the distributional uncertainty region, not only its center (which is given by the empirical distribution). This shape is informed by establishing connections to the metric learning literature. We develop a class of metric learning algorithms which are based on robust optimization, and we use these robust-optimization-based metric learning algorithms to inform the distributional uncertainty region in our data-driven DRO problem. This means that we endow the adversary with additional constraints that force him to spend effort on regions of importance, further improving the generalization properties of machine learning algorithms.
In summary, we explain how the use of optimal transport costs allows us to construct what we call double-robust statistical procedures. We test all of the procedures proposed in this dissertation on various data sets, showing significant improvement in generalization ability over a wide range of state-of-the-art procedures.
Finally, we also discuss a class of stochastic optimization algorithms of independent interest which are particularly useful for solving DRO problems, especially those that arise when the distributional uncertainty region is based on optimal transport costs.
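As a hedged sketch of one algorithmic idea in this spirit (not the dissertation's specific algorithm): in a penalized, or Lagrangian, form of Wasserstein DRO, each stochastic gradient step can first approximately solve an inner maximization that perturbs the sampled point, and then descend on the model parameters at the perturbed point. Everything below, including function names, the penalty parameter gamma, and the step sizes, is an illustrative assumption; logistic regression is used only to keep the example self-contained.

```python
# Minimal sketch of a two-level stochastic gradient method for a
# penalized Wasserstein-DRO objective: for each sample, approximately
# solve the inner maximization
#     max_z  loss(theta; z, y) - gamma * ||z - x||^2
# by a few gradient-ascent steps, then take an outer descent step on
# theta at the adversarially perturbed point. Illustrative only.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_grad_theta(theta, x, y):
    # Gradient of the logistic loss -log p(y | x) w.r.t. theta, y in {0, 1}.
    return (sigmoid(x @ theta) - y) * x

def loss_grad_x(theta, x, y):
    # Gradient of the same loss w.r.t. the input x.
    return (sigmoid(x @ theta) - y) * theta

def wdro_sgd(X, y, gamma=10.0, lr=0.1, inner_lr=0.05, inner_steps=5,
             epochs=20, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x_i, y_i = X[i], y[i]
            # Inner ascent: penalized worst-case perturbation of x_i.
            z = x_i.copy()
            for _ in range(inner_steps):
                g = loss_grad_x(theta, z, y_i) - 2.0 * gamma * (z - x_i)
                z += inner_lr * g
            # Outer descent at the perturbed point.
            theta -= lr * loss_grad_theta(theta, z, y_i)
    return theta
```

Larger values of gamma make perturbations more expensive, shrinking the adversary's effective budget; as gamma grows, the iteration approaches plain stochastic gradient descent on the empirical loss.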