
32 research outputs found

High-dimensional, robust, heteroscedastic variable selection with the adaptive LASSO, and applications to random coefficient regression

Author: Hermann Philipp
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2021

In this thesis, theoretical results for the adaptive LASSO in high-dimensional, sparse linear regression models with potentially heavy-tailed and heteroscedastic errors are developed. In doing so, the empirical pseudo-Huber loss is considered as the loss function, and the main focus is sign-consistency of the resulting estimator. Simulations illustrate the favorable numerical performance of the proposed methodology in comparison to the ordinary adaptive LASSO. Subsequently, those results are applied to the linear random coefficient regression model, more precisely to the means, variances and covariances of the coefficients. Furthermore, sufficient conditions for the identifiability of the first and second moments, as well as asymptotic results for a fixed number of coefficients, are given.
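
As a rough illustration of the loss named in this abstract, the sketch below evaluates the standard pseudo-Huber loss and its empirical average over regression residuals. The function names, the `delta` scale parameter, and the exact scaling are assumptions for illustration; the thesis may parametrize the loss differently, and the adaptive LASSO penalty is not shown.

```python
import math


def pseudo_huber(residual: float, delta: float = 1.0) -> float:
    """Standard pseudo-Huber loss: delta^2 * (sqrt(1 + (r / delta)^2) - 1).

    Behaves like r^2 / 2 for small residuals and like delta * |r| for large
    ones, which is what makes it less sensitive to heavy-tailed errors.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def empirical_pseudo_huber(y, X, beta, delta: float = 1.0) -> float:
    """Average pseudo-Huber loss of the residuals y_i - x_i^T beta.

    y is a list of responses, X a list of feature rows, beta a coefficient
    list (hypothetical argument layout chosen for this sketch).
    """
    total = 0.0
    for y_i, x_i in zip(y, X):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        total += pseudo_huber(y_i - fitted, delta)
    return total / len(y)
```

For small residuals the value is close to the squared-error loss, while a single large outlier contributes only linearly to the average.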

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

Robust Orthogonal Complement Principal Component Analysis

Authors: Li Shijie, She Yiyuan, Wu Dapeng
Publication date: 27/01/2016

Recently, the robustification of principal component analysis has attracted considerable attention from statisticians, engineers and computer scientists. In this work we study the type of outliers that are not necessarily apparent in the original observation space but can seriously affect the principal subspace estimation. Based on a mathematical formulation of such transformed outliers, a novel robust orthogonal complement principal component analysis (ROC-PCA) is proposed. The framework combines the popular sparsity-enforcing and low-rank regularization techniques to deal with row-wise outliers as well as element-wise outliers. A non-asymptotic oracle inequality guarantees the accuracy and high breakdown performance of ROC-PCA in finite samples. To tackle the computational challenges, an efficient algorithm is developed on the basis of Stiefel manifold optimization and iterative thresholding. Furthermore, a batch variant is proposed to significantly reduce the cost in ultra-high dimensions. The paper also points out a pitfall in the common practice of SVD reduction in robust PCA. Experiments show the effectiveness and efficiency of ROC-PCA on both synthetic and real data.

arXiv.org e-Print Archive

CiteSeerX


Distributionally Robust Performance Analysis with Applications to Mine Valuation and Risk

Author: Dolan Christopher James
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2017

We consider several problems motivated by issues faced in the mining industry. In recent years, it has become clear that mines have substantial tail risk in the form of environmental disasters, and this tail risk is not incorporated into common pricing and risk models. However, data sets of the extremal climate behavior that drive this risk are very small, and generally inadequate for properly estimating the tail behavior. We propose a data-driven methodology that comes up with reasonable worst-case scenarios, given the data size constraints, and we incorporate this into a real-options-based model for the valuation of mines. We propose several different iterations of the model, to allow the end-user to choose the degree to which they wish to specify the financial consequences of the disaster scenario. Next, in order to perform a risk analysis on a portfolio of mines, we propose a method of estimating the correlation structure of high-dimensional max-stable processes. Using the techniques of Liu et al. (2017) to map the relationship between normal correlations and max-stable correlations, we can then use techniques inspired by (Bickel et al., 2008; Liu et al., 2014; Rothman et al., 2009) to estimate the underlying correlation matrix, while preserving a sparse, positive-definite structure. The correlation matrices are then used in the calculation of model-robust risk metrics (VaR, CVaR) using the Sample-Out-of-Sample methodology (Blanchet and Kang, 2017). We conclude with several new techniques that were developed in the field of robust performance analysis that, while not directly applied to mining, were motivated by our studies into distributionally robust optimization in order to address these problems.

Columbia University Academic Commons

Ambulance Emergency Response Optimization in Developing Countries

Authors: Boutilier Justin J., Chan Timothy C. Y.
Publication date: 31/07/2019

The lack of emergency medical transportation is viewed as the main barrier to the access of emergency medical care in low- and middle-income countries (LMICs). In this paper, we present a robust optimization approach to optimize both the location and routing of emergency response vehicles, accounting for uncertainty in travel times and spatial demand characteristic of LMICs. We traveled to Dhaka, Bangladesh, the sixth largest and third most densely populated city in the world, to conduct field research resulting in the collection of two unique datasets that inform our approach. This data is leveraged to develop machine learning methodologies to estimate demand for emergency medical services in an LMIC setting and to predict the travel time between any two locations in the road network for different times of day and days of the week. We combine our robust optimization and machine learning frameworks with real data to provide an in-depth investigation into three policy-related questions. First, we demonstrate that outpost locations optimized for weekday rush hour lead to good performance for all times of day and days of the week. Second, we find that significant improvements in emergency response times can be achieved by relocating a small number of outposts and that the performance of the current system could be replicated using only 30% of the resources. Lastly, we show that a fleet of small motorcycle-based ambulances has the potential to significantly outperform traditional ambulance vans. In particular, they are able to capture three times more demand while reducing the median response time by 42% due to increased routing flexibility offered by nimble vehicles on a larger road network. Our results provide practical insights for emergency response optimization that can be leveraged by hospital-based and private ambulance providers in Dhaka and other urban centers in LMICs.

arXiv.org e-Print Archive

University of Toronto Research Repository


Distributionally Robust Performance Analysis: Data, Dependence and Extremes

Author: He Fei
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2018

This dissertation focuses on distributionally robust performance analysis, which is an area of applied probability whose aim is to quantify the impact of model errors. Stochastic models are built to describe phenomena of interest with the intent of gaining insights or making informed decisions. Typically, however, the fidelity of these models (i.e., how closely they describe the underlying reality) may be compromised due to either the lack of information available or tractability considerations. The goal of distributionally robust performance analysis is then to quantify, and potentially mitigate, the impact of errors or model misspecifications. As such, distributionally robust performance analysis affects virtually any area in which stochastic modelling is used for analysis or decision making. This dissertation studies various aspects of distributionally robust performance analysis. For example, we are concerned with quantifying the impact of model error in tail estimation using extreme value theory. We are also concerned with the impact of the dependence structure in risk analysis when marginal distributions of risk factors are known. In addition, we are interested in recently found connections to machine learning and other statistical estimators which are based on distributionally robust optimization. The first problem that we consider consists in studying the impact of model specification in the context of extreme quantiles and tail probabilities. There is a rich statistical theory that allows one to extrapolate tail behavior based on limited information. This body of theory is known as extreme value theory and it has been successfully applied to a wide range of settings, including building physical infrastructure to withstand extreme environmental events and also guiding the capital requirements of insurance companies to ensure their financial solvency.
Not surprisingly, attempting to extrapolate out into the tail of a distribution from limited observations requires imposing assumptions which are impossible to verify. The assumptions imposed in extreme value theory imply that a parametric family of models (known as generalized extreme value distributions) can be used to perform tail estimation. Because such assumptions are so difficult (or impossible) to verify, we use distributionally robust optimization to enhance extreme value statistical analysis. Our approach results in a procedure which can be easily applied in conjunction with standard extreme value analysis and we show that our estimators enjoy correct coverage even in settings in which the assumptions imposed by extreme value theory fail to hold. In addition to extreme value estimation, which is associated with risk analysis via extreme events, another feature which often plays a role in the risk analysis is the impact of dependence structure among risk factors. In the second chapter we study the question of evaluating the worst-case expected cost involving two sources of uncertainty, each of them with a specific marginal probability distribution. The worst-case expectation is optimized over all joint probability distributions which are consistent with the marginal distributions specified for each source of uncertainty. So, our formulation allows us to capture the impact of the dependence structure of the risk factors. This formulation is equivalent to the so-called Monge-Kantorovich problem studied in optimal transport theory, whose theoretical properties have been studied in the literature substantially. However, rates of convergence of computational algorithms for this problem have been studied only recently. We show that if one of the random variables takes finitely many values, a direct Monte Carlo approach allows one to evaluate such a worst-case expectation with an $O(n^{-1/2})$ convergence rate as the number of Monte Carlo samples, $n$, increases to infinity. Next, we continue our investigation of worst-case expectations in the context of multiple risk factors, not only two of them, assuming that their marginal probability distributions are fixed. This problem does not fit the mold of standard optimal transport (or Monge-Kantorovich) problems. We consider, however, cost functions which are separable in the sense of being a sum of functions which depend on adjacent pairs of risk factors (think of the factors indexed by time). In this setting, we are able to reduce the problem to the study of several separate Monge-Kantorovich problems. Moreover, we explain how we can even include martingale constraints which are often natural to consider in settings such as financial applications. While in the previous chapters we focused on the impact of tail modeling or dependence, in the later parts of the dissertation we take a broader view by studying decisions which are made based on empirical observations. So, we focus on so-called distributionally robust optimization formulations. We use optimal transport theory to model the degree of distributional uncertainty or model misspecification. Distributionally robust optimization based on optimal transport has been a very active research topic in recent years; our contribution consists in studying how to specify the optimal transport metric in a data-driven way. We explain our procedure in the context of classification, which is of substantial importance in machine learning applications.
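
For orientation, the two-factor worst-case expectation described in this abstract is commonly written in the form below. The symbols $\mu$, $\nu$ (the fixed marginals), $c$ (the cost function) and $\Pi(\mu,\nu)$ (the set of couplings) are notation assumed here and may differ from the dissertation's own.

```latex
\sup_{\pi \in \Pi(\mu,\nu)} \mathbb{E}_{\pi}\bigl[c(X,Y)\bigr],
\qquad
\Pi(\mu,\nu) = \bigl\{\pi : \pi(A \times \mathcal{Y}) = \mu(A),\;
\pi(\mathcal{X} \times B) = \nu(B) \text{ for all measurable } A, B \bigr\}
```

Maximizing over $\Pi(\mu,\nu)$ is a Monge-Kantorovich problem with cost $-c$, which is the equivalence the abstract refers to.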

Columbia University Academic Commons