32 research outputs found
High-dimensional, robust, heteroscedastic variable selection with the adaptive LASSO, and applications to random coefficient regression
In this thesis, theoretical results for the adaptive LASSO in high-dimensional, sparse linear regression models with potentially heavy-tailed and heteroscedastic errors are developed. The empirical pseudo-Huber loss is used as the loss function, and the main focus is sign-consistency of the resulting estimator. Simulations illustrate the favorable numerical performance of the proposed methodology in comparison to the ordinary adaptive LASSO. Subsequently, those results are applied to the linear random coefficient regression model, more precisely to the means, variances and covariances of the coefficients. Furthermore, sufficient conditions for the identifiability of the first and second moments, as well as asymptotic results for a fixed number of coefficients, are given.
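For readers unfamiliar with the pseudo-Huber loss named in this abstract, the following is a minimal sketch of its common textbook form and of its average over linear-model residuals. The parametrization with a scale `delta`, the function names, and the omission of the adaptive LASSO penalty are all assumptions made for illustration; the thesis may define the loss differently.

```python
import math

def pseudo_huber(residual, delta=1.0):
    """Common form of the pseudo-Huber loss: delta^2 * (sqrt(1 + (r/delta)^2) - 1).

    It behaves like r^2 / 2 near zero and like delta * |r| for large |r|,
    which is what makes it less sensitive to heavy-tailed errors.
    `delta` is an assumed tuning parameter, not a value from the thesis.
    """
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)

def empirical_pseudo_huber(y, X, beta, delta=1.0):
    """Average pseudo-Huber loss of the residuals y_i - x_i' beta.

    Hypothetical helper: X is a list of rows, beta a list of coefficients.
    No penalty term is included here.
    """
    total = 0.0
    for y_i, x_i in zip(y, X):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        total += pseudo_huber(y_i - fitted, delta)
    return total / len(y)
```

An adaptive LASSO estimator would add a weighted L1 penalty on `beta` to this empirical loss and minimize the sum; that step is deliberately left out of the sketch.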
Robust Orthogonal Complement Principal Component Analysis
Recently, the robustification of principal component analysis has attracted
lots of attention from statisticians, engineers and computer scientists. In
this work we study the type of outliers that are not necessarily apparent in
the original observation space but can seriously affect the principal subspace
estimation. Based on a mathematical formulation of such transformed outliers, a
novel robust orthogonal complement principal component analysis (ROC-PCA) is
proposed. The framework combines the popular sparsity-enforcing and low rank
regularization techniques to deal with row-wise outliers as well as
element-wise outliers. A non-asymptotic oracle inequality guarantees the
accuracy and high breakdown performance of ROC-PCA in finite samples. To tackle
the computational challenges, an efficient algorithm is developed on the basis
of Stiefel manifold optimization and iterative thresholding. Furthermore, a
batch variant is proposed to significantly reduce the cost in ultra-high
dimensions. The paper also points out a pitfall of a common practice of SVD
reduction in robust PCA. Experiments show the effectiveness and efficiency of
ROC-PCA in both synthetic and real data.
Distributionally Robust Performance Analysis with Applications to Mine Valuation and Risk
We consider several problems motivated by issues faced in the mining industry. In recent years, it has become clear that mines have substantial tail risk in the form of environmental disasters, and this tail risk is not incorporated into common pricing and risk models. However, data sets of the extremal climate behavior that drives this risk are very small, and generally inadequate for properly estimating the tail behavior. We propose a data-driven methodology that produces reasonable worst-case scenarios, given the data size constraints, and we incorporate this into a real-options-based model for the valuation of mines. We propose several different iterations of the model, to allow the end-user to choose the degree to which they wish to specify the financial consequences of the disaster scenario. Next, in order to perform a risk analysis on a portfolio of mines, we propose a method of estimating the correlation structure of high-dimensional max-stable processes. Using the techniques of (Liu et al., 2017) to map the relationship between normal correlations and max-stable correlations, we can then use techniques inspired by (Bickel et al., 2008; Liu et al., 2014; Rothman et al., 2009) to estimate the underlying correlation matrix, while preserving a sparse, positive-definite structure. The correlation matrices are then used in the calculation of model-robust risk metrics (VaR, CVaR) using the Sample-Out-of-Sample methodology (Blanchet and Kang, 2017). We conclude with several new techniques that were developed in the field of robust performance analysis and that, while not directly applied to mining, were motivated by our studies into distributionally robust optimization in order to address these problems.
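As background for the two risk metrics named in this abstract, here is a minimal sketch of plain empirical VaR and CVaR on a sample of losses. It is not the model-robust, Sample-Out-of-Sample calculation the abstract describes. The function name and the particular convention (VaR as the smallest observed loss whose empirical CDF reaches `alpha`, CVaR as the mean of all losses at or above that value) are assumptions; other conventions exist.

```python
import math

def empirical_var_cvar(losses, alpha=0.95):
    """Return (VaR, CVaR) of a loss sample at level alpha, under one convention.

    VaR: smallest observed loss l with (number of losses <= l) / n >= alpha.
    CVaR: average of the observed losses that are >= VaR.
    Hypothetical helper for illustration only.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    n = len(ordered)
    k = max(math.ceil(alpha * n), 1)  # 1-based rank of the alpha-quantile
    var = ordered[k - 1]
    tail = ordered[k - 1:]            # losses at or beyond the VaR
    cvar = sum(tail) / len(tail)
    return var, cvar
```

CVaR is never smaller than VaR under this convention, since it averages only losses at or above the VaR.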
Ambulance Emergency Response Optimization in Developing Countries
The lack of emergency medical transportation is viewed as the main barrier to
access to emergency medical care in low- and middle-income countries
(LMICs). In this paper, we present a robust optimization approach to optimize
both the location and routing of emergency response vehicles, accounting for
uncertainty in travel times and spatial demand characteristic of LMICs. We
traveled to Dhaka, Bangladesh, the sixth largest and third most densely
populated city in the world, to conduct field research resulting in the
collection of two unique datasets that inform our approach. This data is
leveraged to develop machine learning methodologies to estimate demand for
emergency medical services in an LMIC setting and to predict the travel time
between any two locations in the road network for different times of day and
days of the week. We combine our robust optimization and machine learning
frameworks with real data to provide an in-depth investigation into three
policy-related questions. First, we demonstrate that outpost locations
optimized for weekday rush hour lead to good performance for all times of day
and days of the week. Second, we find that significant improvements in
emergency response times can be achieved by re-locating a small number of
outposts and that the performance of the current system could be replicated
using only 30% of the resources. Lastly, we show that a fleet of small
motorcycle-based ambulances has the potential to significantly outperform
traditional ambulance vans. In particular, they are able to capture three times
more demand while reducing the median response time by 42% due to increased
routing flexibility offered by nimble vehicles on a larger road network. Our
results provide practical insights for emergency response optimization that can
be leveraged by hospital-based and private ambulance providers in Dhaka and
other urban centers in LMICs.
Distributionally Robust Performance Analysis: Data, Dependence and Extremes
This dissertation focuses on distributionally robust performance analysis, which is an area of applied probability whose aim is to quantify the impact of model errors. Stochastic models are built to describe phenomena of interest with the intent of gaining insights or making informed decisions. Typically, however, the fidelity of these models (i.e. how closely they describe the underlying reality) may be compromised due to either the lack of information available or tractability considerations. The goal of distributionally robust performance analysis is then to quantify, and potentially mitigate, the impact of errors or model misspecifications. As such, distributionally robust performance analysis affects virtually any area in which stochastic modelling is used for analysis or decision making.
This dissertation studies various aspects of distributionally robust performance analysis. For example, we are concerned with quantifying the impact of model error in tail estimation using extreme value theory. We are also concerned with the impact of the dependence structure in risk analysis when marginal distributions of risk factors are known. In addition, we are interested in recently found connections to machine learning and other statistical estimators which are based on distributionally robust optimization.
The first problem that we consider consists in studying the impact of model specification in the context of extreme quantiles and tail probabilities. There is a rich statistical theory that allows one to extrapolate tail behavior based on limited information. This body of theory is known as extreme value theory and it has been successfully applied to a wide range of settings, including building physical infrastructure to withstand extreme environmental events and also guiding the capital requirements of insurance companies to ensure their financial solvency. Not surprisingly, attempting to extrapolate out into the tail of a distribution from limited observations requires imposing assumptions which are impossible to verify. The assumptions imposed in extreme value theory imply that a parametric family of models (known as generalized extreme value distributions) can be used to perform tail estimation. Because such assumptions are so difficult (or impossible) to verify, we use distributionally robust optimization to enhance extreme value statistical analysis. Our approach results in a procedure which can be easily applied in conjunction with standard extreme value analysis and we show that our estimators enjoy correct coverage even in settings in which the assumptions imposed by extreme value theory fail to hold.
In addition to extreme value estimation, which is associated with risk analysis via extreme events, another feature which often plays a role in risk analysis is the impact of the dependence structure among risk factors. In the second chapter we study the question of evaluating the worst-case expected cost involving two sources of uncertainty, each of them with a specific marginal probability distribution. The worst-case expectation is optimized over all joint probability distributions which are consistent with the marginal distributions specified for each source of uncertainty. So, our formulation allows us to capture the impact of the dependence structure of the risk factors. This formulation is equivalent to the so-called Monge-Kantorovich problem studied in optimal transport theory, whose theoretical properties have been studied substantially in the literature. However, rates of convergence of computational algorithms for this problem have been studied only recently. We show that if one of the random variables takes finitely many values, a direct Monte Carlo approach can evaluate this worst-case expectation, with an explicit convergence rate as the number of Monte Carlo samples increases to infinity.
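To make the worst-case expectation over couplings concrete, the following is a small sketch for the special case where both marginals are equal-weight empirical distributions on the same number of points. In that case the supremum over joint distributions is attained by a one-to-one pairing of the sample points, so a brute-force search over permutations finds it. This is an illustration of the Monge-Kantorovich formulation only, not the Monte Carlo procedure analyzed in the dissertation; the function name and the brute-force search are assumptions, and the search is practical only for very small samples.

```python
from itertools import permutations

def worst_case_expectation(xs, ys, cost):
    """Maximize the average of cost(x_i, y_sigma(i)) over all pairings sigma.

    xs and ys are equal-length samples, each point carrying weight 1/n.
    Returns (best_value, best_permutation), where best_permutation[i] is the
    index of the y paired with xs[i]. Hypothetical helper; O(n!) running time.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    best_value = float("-inf")
    best_perm = None
    for perm in permutations(range(n)):
        value = sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n
        if value > best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm
```

For the product cost `cost(x, y) = x * y`, the maximizing pairing matches the smallest x with the smallest y and so on, which is the comonotone coupling.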
Next, we continue our investigation of worst-case expectations in the context of multiple risk factors, not only two of them, assuming that their marginal probability distributions are fixed. This problem does not fit the mold of standard optimal transport (or Monge-Kantorovich) problems. We consider, however, cost functions which are separable in the sense of being a sum of functions which depend on adjacent pairs of risk factors (think of the factors indexed by time). In this setting, we are able to reduce the problem to the study of several separate Monge-Kantorovich problems. Moreover, we explain how we can even include martingale constraints which are often natural to consider in settings such as financial applications.
While in the previous chapters we focused on the impact of tail modeling or dependence, in the later parts of the dissertation we take a broader view by studying decisions which are made based on empirical observations. So, we focus on so-called distributionally robust optimization formulations. We use optimal transport theory to model the degree of distributional uncertainty or model misspecification. Distributionally robust optimization based on optimal transport has been a very active research topic in recent years; our contribution consists in studying how to specify the optimal transport metric in a data-driven way. We explain our procedure in the context of classification, which is of substantial importance in machine learning applications.