    Evolutionary algorithms for robust methods

    A drawback of robust statistical techniques is the increased computational effort often needed compared to non-robust methods. Robust estimators possessing the exact fit property, for example, are NP-hard to compute. This means that, under the widely believed assumption that the computational complexity classes NP and P are not equal, there is no hope to compute exact solutions for large high-dimensional data sets. To tackle this problem, search heuristics are used to compute NP-hard estimators in high dimensions. Here, an evolutionary algorithm that is applicable to different robust estimators is presented. Further, variants of this evolutionary algorithm for selected estimators (most prominently least trimmed squares and least median of squares) are introduced and shown to outperform existing popular search heuristics in difficult data situations. The results increase the applicability of robust methods and underline the usefulness of evolutionary computation for computational statistics.
    Keywords: evolutionary algorithms, robust regression, least trimmed squares (LTS), least median of squares (LMS), least quantile of squares (LQS), least quartile difference (LQD)

    Least quantile regression via modern optimization

    We address the Least Quantile of Squares (LQS) regression problem, and in particular the Least Median of Squares problem, using modern optimization methods. We propose a Mixed Integer Optimization (MIO) formulation of the LQS problem that allows us to find a provably globally optimal solution. Our MIO framework has the appealing characteristic that if we terminate the algorithm early, we obtain a solution with a guarantee on its sub-optimality. We also propose continuous optimization methods based on first-order subdifferential methods, sequential linear optimization, and hybrid combinations of them to obtain near-optimal solutions to the LQS problem. The MIO algorithm is found to benefit significantly from the high-quality solutions delivered by our continuous optimization based methods. We further show that the MIO approach leads to (a) an optimal solution for any dataset, where the data points $(y_i,\mathbf{x}_i)$ are not necessarily in general position, (b) a simple proof of the breakdown point of the LQS objective value that holds for any dataset, and (c) an extension to situations where there are polyhedral constraints on the regression coefficient vector. We report computational results with both synthetic and real-world datasets showing that the MIO algorithm with warm starts from the continuous optimization methods solves small ($n=100$) and medium ($n=500$) size problems to provable optimality in under two hours, and outperforms all publicly available methods for large-scale ($n=10{,}000$) LQS problems.
    Comment: Published at http://dx.doi.org/10.1214/14-AOS1223 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
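To make the LQS criterion in the abstract above concrete, here is a minimal Python sketch of the objective only: the q-th smallest squared residual of a candidate coefficient vector. It is not the MIO formulation or any of the authors' methods. The function name `lqs_objective`, the 1-indexed `q`, and the toy data are assumptions made for illustration.

```python
def lqs_objective(beta, X, y, q):
    """Return the q-th smallest squared residual (q is 1-indexed).

    beta : list of coefficients, one per column of X
    X    : list of rows (include a leading 1.0 for an intercept)
    y    : list of responses
    """
    if not 1 <= q <= len(y):
        raise ValueError("q must lie between 1 and the number of observations")
    squared = sorted(
        (yi - sum(b * xij for b, xij in zip(beta, xi))) ** 2
        for xi, yi in zip(X, y)
    )
    return squared[q - 1]


# Toy data (hypothetical): y = 1 + 2x with one gross outlier at x = 4.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1.0, 3.0, 5.0, 7.0, 100.0]

# A median-type choice of q ignores the outlier; q = n recovers the
# largest squared residual, which the outlier dominates.
median_type = lqs_objective([1.0, 2.0], X, y, q=3)
largest = lqs_objective([1.0, 2.0], X, y, q=5)
```

Which `q` corresponds to "the median" is a matter of convention; the value `q=3` for five observations is only one possible choice.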

    SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression

    This paper deals with the problem of finding the globally optimal subset of h elements from a larger set of n elements in d space dimensions so as to minimize a quadratic criterion, with a special emphasis on applications to computing the Least Trimmed Squares Estimator (LTSE) for robust regression. The computation of the LTSE is a challenging subset selection problem involving a nonlinear program with continuous and binary variables, linked in a highly nonlinear fashion. The selection of a globally optimal subset using the branch and bound (BB) algorithm is limited to problems in very low dimension, typically d<5, as the complexity of the problem increases exponentially with d. We introduce a bold pruning strategy in the BB algorithm that results in a significant reduction in computing time, at the price of a negligible loss of accuracy. The novelty of our algorithm is that the bounds at nodes of the BB tree come from pseudo-convexifications derived using a linearization technique with approximate bounds for the nonlinear terms. The approximate bounds are computed by solving an auxiliary semidefinite optimization problem. We show through a computational study that our algorithm performs well in a wide set of the most difficult instances of the LTSE problem.
    Comment: 12 pages, 3 figures, 2 tables
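The subset selection view of the LTSE described above can be illustrated with a brute-force Python sketch: enumerate every subset of h observations, fit ordinary least squares on each, and keep the subset with the smallest residual sum of squares. This is plain enumeration, not the branch and bound algorithm or the relaxation bounds of the paper, and it is restricted to a line y = a + b*x for brevity. The names `ols_line` and `exhaustive_lts` and the toy data are hypothetical.

```python
from itertools import combinations


def ols_line(points):
    """Closed-form least squares fit y = a + b*x for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return mean_y, 0.0  # all x equal: fall back to a horizontal line
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def exhaustive_lts(points, h):
    """Try every h-subset; return (rss, intercept, slope, subset) of the best."""
    best = None
    for subset in combinations(points, h):
        a, b = ols_line(subset)
        rss = sum((y - a - b * x) ** 2 for x, y in subset)
        if best is None or rss < best[0]:
            best = (rss, a, b, subset)
    return best


# Toy data (hypothetical): y = 1 + 2x with one gross outlier at x = 4.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 100.0), (5.0, 11.0)]
rss, intercept, slope, subset = exhaustive_lts(points, h=4)
```

The loop visits every one of the C(n, h) subsets, which is why enumeration is only feasible for very small n and why bounding and pruning strategies matter.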

    Robust regression with optimisation heuristics

    Linear regression is widely used in finance. While the standard method to obtain parameter estimates, Least Squares, has very appealing theoretical and numerical properties, the estimates obtained are often unstable in the presence of extreme observations, which are rather common in financial time series. One approach to deal with such extreme observations is the application of robust or resistant estimators, like Least Quantile of Squares estimators. Unfortunately, for many such alternative approaches, the estimation is much more difficult than in the Least Squares case, as the objective function is not convex and often has many local optima. We apply different heuristic methods like Differential Evolution, Particle Swarm and Threshold Accepting to obtain parameter estimates. Particular emphasis is put on the convergence properties of these techniques for fixed computational resources, and the techniques’ sensitivity to different parameter settings.
    Keywords: optimisation heuristics, robust regression, Least Median of Squares
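As an illustration of one heuristic named in the abstract above, the following Python sketch applies a basic Threshold Accepting loop to a median-of-squares objective for a line y = a + b*x. It is a generic textbook-style version, not the authors' implementation: the Gaussian neighbourhood, the threshold sequence, the step size, and all function names are assumptions chosen for illustration.

```python
import random


def median_squared_residual(a, b, points):
    """Median-type objective: the middle order statistic of squared residuals."""
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return squared[len(squared) // 2]


def threshold_accepting(points, start, thresholds, steps_per_threshold,
                        step_size, seed=0):
    """Accept a neighbour if it is at most `tau` worse than the current
    solution; `thresholds` should decrease towards zero.
    Returns the best (a, b) seen and its objective value."""
    rng = random.Random(seed)
    current = start
    current_value = median_squared_residual(current[0], current[1], points)
    best, best_value = current, current_value
    for tau in thresholds:
        for _ in range(steps_per_threshold):
            # Neighbour: small random perturbation of both coefficients.
            candidate = (current[0] + rng.gauss(0.0, step_size),
                         current[1] + rng.gauss(0.0, step_size))
            value = median_squared_residual(candidate[0], candidate[1], points)
            if value <= current_value + tau:
                current, current_value = candidate, value
                if value < best_value:
                    best, best_value = candidate, value
    return best, best_value


# Toy data (hypothetical): y = 1 + 2x with two gross outliers.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0),
          (4.0, 100.0), (5.0, 11.0), (6.0, -50.0)]
start = (0.0, 0.0)
best, best_value = threshold_accepting(
    points, start, thresholds=[1.0, 0.1, 0.0],
    steps_per_threshold=500, step_size=0.2, seed=0)
```

Because the loop keeps track of the best solution seen, the returned value is never worse than the starting point; how close it gets to the global optimum depends on the threshold sequence and step size, which is the kind of parameter sensitivity the abstract refers to.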

    On the implementation of LIR: the case of simple linear regression with interval data

    This paper considers the problem of simple linear regression with interval-censored data. That is, n pairs of intervals are observed instead of the n pairs of precise values for the two variables (dependent and independent). Each of these intervals is closed but possibly unbounded, and contains the corresponding (unobserved) value of the dependent or independent variable. The goal of the regression is to describe the relationship between (the precise values of) these two variables by means of a linear function. Likelihood-based Imprecise Regression (LIR) is a recently introduced, very general approach to regression for imprecisely observed quantities. The result of a LIR analysis is in general set-valued: it consists of all regression functions that cannot be excluded on the basis of likelihood inference. These regression functions are said to be undominated. Since the interval data can be unbounded, a robust regression method is necessary. Hence, we consider the robust LIR method based on the minimization of the residuals' quantiles. For this method, we prove that the set of all the intercept-slope pairs corresponding to the undominated regression functions is the union of finitely many polygons. We give an exact algorithm for determining this set (i.e., for determining the set-valued result of the robust LIR analysis), and show that it has worst-case time complexity O(n^3 log n). We have implemented this exact algorithm as part of the R package linLIR.

    Determinants of Long-term Growth: New Results Applying Robust Estimation and Extreme Bounds

    Keywords: sensitivity analysis, outliers, economic growth, robust estimation

    A procedure for robust estimation and diagnostics in regression

    We propose a new procedure for computing an approximation to regression estimates based on the minimization of a robust scale. The procedure can be applied with a large number of independent variables, where the usual resampling-based methods require infeasible or extremely costly computing time. An important advantage of the procedure is that it can be incorporated into any high-breakdown procedure and improve it with just a few seconds of computer time. The procedure minimizes the robust scale over a set of tentative parameter vectors. Each of these parameter vectors is obtained as follows. We represent each data point by the vector of changes in the least squares forecasts of that observation when each of the observations is deleted. Then the sets of possible outliers are obtained as the extreme points of the principal components of these vectors, or as the set of points with large residuals. The good performance of the procedure allows the identification of multiple outliers while avoiding masking effects. The efficiency of the procedure for robust estimation and its power as an outlier detection tool are investigated in a simulation study and some examples.
