3,685 research outputs found

    A Parametric Framework for the Comparison of Methods of Very Robust Regression

    There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers in the data. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter λ that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of λ, the variance and squared bias of five estimators, and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining a better understanding of the properties of robust estimators.
    Comment: Published at http://dx.doi.org/10.1214/13-STS437 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
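The idea of a contamination path indexed by λ can be illustrated with a small simulation (a hypothetical construction for intuition, not the paper's actual parametric model): clean data from y = x + noise, plus an outlier group whose responses are displaced by λ. The bias of the non-robust least-squares slope grows as the two groups of data move apart.

```python
import random

def ls_slope(xs, ys):
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def contaminated_sample(lam, n_clean=80, n_out=20, seed=0):
    """Clean data from y = x + noise plus an outlier group whose
    responses are shifted by lam; lam = 0 means the groups coincide."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n_clean)]
    ys = [x + rng.gauss(0, 0.5) for x in xs]
    for _ in range(n_out):
        x = rng.uniform(8, 10)            # outliers sit in a high-leverage region
        xs.append(x)
        ys.append(x + lam + rng.gauss(0, 0.5))
    return xs, ys

# The bias of the LS slope grows as the outlier group moves away.
for lam in (0.0, 5.0, 20.0):
    xs, ys = contaminated_sample(lam)
    print(lam, round(ls_slope(xs, ys), 3))
```

Robust estimators are precisely those whose bias along such a path stays bounded as λ grows.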

    A new non-parametric detector of univariate outliers for distributions with unbounded support

    The purpose of this paper is to construct a new non-parametric detector of univariate outliers and to study its asymptotic properties. This detector is based on a Hill-type statistic. It exhibits the same asymptotic behavior for a large class of probability distributions with positive unbounded support (for instance, the absolute value of a Gaussian, and the Gamma, Weibull, Student or regularly varying distributions). We illustrate our results with numerical simulations, which show the accuracy of this detector compared with other usual univariate outlier detectors (the Tukey, MAD and Local Outlier Factor detectors). The detection of outliers in a database of used-car prices is also presented as an application to a real-life database.
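Two of the baseline detectors named in the abstract can be sketched as follows (generic textbook versions of the Tukey-fences and MAD rules, not the paper's Hill-type detector):

```python
import statistics

def tukey_outliers(xs, k=1.5):
    """Classic Tukey fences: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lo or x > hi]

def mad_outliers(xs, cutoff=3.5):
    """MAD rule: flag points whose robust z-score exceeds the cutoff.
    The factor 0.6745 makes the MAD consistent for a normal distribution."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    return [x for x in xs if abs(0.6745 * (x - med) / mad) > cutoff]

data = [1.1, 0.9, 1.0, 1.2, 0.8, 1.05, 0.95, 10.0]
print(tukey_outliers(data))  # flags 10.0
print(mad_outliers(data))    # flags 10.0
```

Both rules use bounded-influence location/scale summaries (quartiles, median, MAD), which is what makes them resistant to the very points they are trying to flag.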

    Robust Henderson III estimators of variance components in the nested error model

    Common methods for estimating variance components in linear mixed models include Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML). These methods are based on the strong assumption of a multivariate normal distribution, and it is well known that they are very sensitive to observations that are outlying with respect to any of the random components. Several robust alternatives to these methods have been proposed (e.g. Fellner 1986, Richardson and Welsh 1995). In this work we present several robust alternatives based on the Henderson method III which do not rely on the normality assumption and provide explicit solutions for the variance component estimators. These estimators can then be used to derive robust estimators of the regression coefficients. Finally, we describe an application of this procedure to small area estimation, in which the main target is the estimation of the means of areas or domains when the within-area sample sizes are small.
    Keywords: Henderson method III, Linear mixed model, Robust estimators, Variance component estimators
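For intuition about the method-of-moments approach the paper builds on, here is the non-robust ANOVA-type (Henderson-style) variance component estimator for a balanced one-way random-effects model (a simplified illustration only; the paper's robust Henderson III estimators are not reproduced here):

```python
import random

def anova_variance_components(groups):
    """Method-of-moments (ANOVA-type) variance component estimates for a
    balanced one-way random-effects model y_ij = mu + u_i + e_ij.
    MSE estimates sigma_e^2; MSA estimates sigma_e^2 + n*sigma_u^2."""
    a, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (a * n)
    means = [sum(g) / n for g in groups]
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ssa = n * sum((m - grand) ** 2 for m in means)
    mse = sse / (a * (n - 1))
    msa = ssa / (a - 1)
    return max((msa - mse) / n, 0.0), mse   # (sigma2_u_hat, sigma2_e_hat)

# 40 groups of 25 observations, true sigma_u^2 = 4 and sigma_e^2 = 1.
rng = random.Random(1)
group_effects = [rng.gauss(0, 2) for _ in range(40)]
groups = [[u + rng.gauss(0, 1) for _ in range(25)] for u in group_effects]
s2u, s2e = anova_variance_components(groups)
print(round(s2u, 2), round(s2e, 2))   # should be near 4 and 1
```

The explicit moment equations are what the paper robustifies: replacing the sums of squares by outlier-resistant counterparts while keeping closed-form solutions.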

    Gibbs sampling will fail in outlier problems with strong masking

    This paper discusses the convergence of the Gibbs sampling algorithm when it is applied to the problem of outlier detection in regression models. Given any vector of initial conditions, the algorithm theoretically converges to the true posterior distribution. However, convergence may slow down in a high-dimensional parameter space where the parameters are highly correlated. We show that the effect of leverage in regression models makes convergence of the Gibbs sampling algorithm very difficult in data sets with strong masking. The problem is illustrated with several examples.
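The masking phenomenon behind this convergence failure can be seen in a small non-Bayesian sketch (an illustration of the mechanism, not the paper's Gibbs sampler): a cluster of high-leverage outliers drags the least-squares fit toward itself, so the outliers' residuals end up smaller than those of genuine observations, and any detection scheme started from the full-data fit struggles to escape.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(50)] + [10.0] * 5   # clustered high-leverage points
ys = [x + rng.gauss(0, 0.1) for x in xs[:50]] + [0.0] * 5  # far below the line y = x

a, b = fit_line(xs, ys)
resid = [abs(y - (a + b * x)) for x, y in zip(xs, ys)]
# The cluster drags the fit toward itself, so the outliers' residuals are
# SMALLER than the largest residual among the genuine points: they mask
# each other, and residual-driven updates get stuck near this fit.
print(max(resid[50:]), max(resid[:50]))
```

A Gibbs sampler whose outlier indicators are updated one at a time from such residuals would need to flip the whole cluster at once to reach the dominant posterior mode, which is exactly the slow-mixing behavior the paper analyzes.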

    Resistant estimates for high dimensional and functional data based on random projections

    We propose a new robust estimation method based on random projections that is adaptive and automatically produces a robust estimate, while enabling easy computation for high- or infinite-dimensional data. Under some restricted contamination models, the procedure is robust and attains full efficiency. We test the method on both simulated and real data.
    Comment: 24 pages, 6 figures
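A minimal sketch of the random-projection idea (a generic Stahel-Donoho-type outlyingness measure, assumed here for illustration rather than taken from the paper): project the data onto random unit directions, standardize each one-dimensional projection robustly, and record each point's worst robust z-score across directions.

```python
import math
import random
import statistics

def projection_outlyingness(points, n_dirs=200, seed=0):
    """Stahel-Donoho-type outlyingness via random projections: project the
    data onto random unit directions, standardize each projection with
    median/MAD, and record each point's worst robust z-score."""
    rng = random.Random(seed)
    d = len(points[0])
    out = [0.0] * len(points)
    for _ in range(n_dirs):
        u = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        proj = [sum(c * p for c, p in zip(u, pt)) for pt in points]
        med = statistics.median(proj)
        mad = statistics.median(abs(z - med) for z in proj) or 1e-12
        for i, z in enumerate(proj):
            out[i] = max(out[i], abs(z - med) / (mad / 0.6745))
    return out

rng = random.Random(1)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)] + [(8.0, 8.0)]
scores = projection_outlyingness(pts)
print(scores[-1] > max(scores[:100]))   # the planted outlier scores highest
```

Because each projection is one-dimensional, the per-direction work is cheap regardless of the ambient dimension, which is the computational appeal for high- or infinite-dimensional data.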

    Cluster-Based Estimators For Multiple And Multivariate Linear Regression Models

    In the field of linear regression modelling, classical least squares (LS) regression is susceptible to even a single outlier, whereas low-breakdown regression estimators such as M regression and bounded-influence regression can resist the influence of a small percentage of outliers. High-breakdown estimators such as the least trimmed squares (LTS) and MM regression estimators are resistant to as much as 50% data contamination. The problems with these estimation procedures include enormous computational demands, subsampling variability, severe coefficient susceptibility to very small changes in initial values, internal deviation from the general trend, and weak performance on clean data and in low-breakdown situations. This study proposes a new high-breakdown regression estimator that addresses these problems in multiple and multivariate regression models, while also providing insightful information about the presence and structure of multivariate outliers.
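The computational demands and subsampling variability mentioned above can be seen in a crude elemental-subset LTS fit (a textbook-style sketch of the classic estimator, not the new estimator proposed in this thesis): fit candidate lines through random pairs of points and keep the one minimizing the sum of the h smallest squared residuals.

```python
import random

def lts_line(xs, ys, n_subsets=500, seed=0):
    """Crude least-trimmed-squares line fit: fit candidate lines through
    random pairs of points and keep the one minimizing the sum of the
    h smallest squared residuals (h about half the data, ~50% breakdown)."""
    rng = random.Random(seed)
    n = len(xs)
    h = (n + 2) // 2
    best, best_score = None, float("inf")
    for _ in range(n_subsets):
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        r2 = sorted((y - a - b * x) ** 2 for x, y in zip(xs, ys))
        score = sum(r2[:h])
        if score < best_score:
            best, best_score = (a, b), score
    return best

rng = random.Random(2)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2 * x + 1 + rng.gauss(0, 0.3) for x in xs]
for k in range(24):                  # 40% contamination: gross errors in y
    ys[k] = rng.uniform(50, 60)
a, b = lts_line(xs, ys)
print(round(a, 2), round(b, 2))      # should be near the true line y = 2x + 1
```

The result depends on which random subsets happen to be drawn, which is exactly the subsampling variability (and cost, as n_subsets grows with dimension) that motivates alternatives.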

    Heterogeneity and model uncertainty in Bayesian regression models

    Data heterogeneity appears when the sample comes from at least two different populations. We analyze three types of situations. The first and simplest case corresponds to the situation in which the majority of the data come from a central model and a few isolated observations come from a contaminating distribution. The data from the contaminating distribution are then called outliers, and they have been studied in depth in the statistical literature. The second case corresponds to the situation in which we still have a central model but the heterogeneous data may appear in clusters of outliers which mask each other. This is the multiple outlier problem, which is much more difficult to handle and has only been understood and analyzed in the last few years. The few Bayesian contributions to this problem are presented. The third case corresponds to the situation in which we do not have a central model; instead, different groups of data have been generated by different models. When the data are multivariate normal, this problem has been analyzed with mixture models under the name of cluster analysis, but a challenging area of research is to develop a general methodology to apply this multiple-model approach to other statistical problems. Heterogeneity in general implies an increase in the uncertainty of predictions, and in this paper a procedure to measure this effect is proposed.
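The claim that heterogeneity inflates predictive uncertainty follows directly from the law of total variance for a finite mixture; a minimal numerical check:

```python
def mixture_mean_var(weights, means, sds):
    """Mean and variance of a finite mixture (law of total variance):
    Var = sum_i w_i*(sigma_i^2 + mu_i^2) - (sum_i w_i*mu_i)^2."""
    mu = sum(w * m for w, m in zip(weights, means))
    var = (sum(w * (s * s + m * m)
               for w, m, s in zip(weights, means, sds)) - mu * mu)
    return mu, var

# A homogeneous model vs. a heterogeneous 50/50 mixture with the same
# within-component spread: between-group separation inflates the
# predictive variance from 1.0 to 5.0.
print(mixture_mean_var([1.0], [0.0], [1.0]))                  # (0.0, 1.0)
print(mixture_mean_var([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]))  # (0.0, 5.0)
```

The extra term is the between-component variance of the means, which is zero exactly when the populations coincide.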

    Diagnostic-robust statistical analysis for Local Surface Fitting in 3D Point Cloud Data

    Objectives: Surface reconstruction and fitting of geometric primitives for three-dimensional (3D) modelling is a fundamental task in photogrammetry and reverse engineering. However, it is impractical to acquire point cloud data without outliers/noise being present. Noise in the data acquisition process induces rough and uneven surfaces and reduces the precision/accuracy of the acquired model. This paper investigates the problem of local surface reconstruction and best fitting from unorganized, outlier-contaminated 3D point cloud data.
    Methods: The Least Squares (LS) method, Principal Component Analysis (PCA) and RANSAC are the three most popular techniques for fitting planar surfaces to 2D and 3D data. All three methods are affected by outliers and do not give reliable and robust parameter estimates. In the statistics literature, robust techniques and outlier diagnostics are two complementary approaches, but neither alone is sufficient for outlier detection and robust parameter estimation. We propose a diagnostic-robust statistical algorithm that combines both approaches for fitting planar surfaces in the presence of outliers. Robust distance is used as a multivariate diagnostic technique for outlier detection, and robust PCA is used as an outlier-resistant technique for plane fitting. The robust distance is a robustification of the well-known Mahalanobis distance that uses the recently introduced high-breakdown Minimum Covariance Determinant (MCD) location and scatter estimates. Classical PCA measures data variability through the variance, and the corresponding directions are the latent vectors, which are sensitive to outlying observations. In contrast, robust PCA, which combines the 'projection pursuit' approach with a robust MCD-based scatter matrix, is robust to outlying observations in the dataset. In addition, robust PCA produces graphical displays of orthogonal distance and score distance as by-products, which can detect outliers and aid robust fitting by applying robust PCA a second time in the final plane fitting stage. In summary, the proposed method first removes the outliers and then fits the local surface in a robust way.
    Results and conclusions: We present a new diagnostic-robust statistical technique for local surface fitting in 3D point cloud data. The benefits of the new diagnostic-robust algorithm are demonstrated on an artificial dataset and several terrestrial mobile mapping laser scanning point cloud datasets. Comparative results show that the classical LS and PCA methods are very sensitive to outliers and fail to fit planes reliably. The RANSAC algorithm is not completely free from the effect of outliers and requires more processing time for large datasets. The proposed method smooths away noise and is significantly better and more efficient than the other three methods for local planar surface fitting, even in the presence of surface roughness. The method is also applicable to 3D straight-line fitting and has great potential for local normal estimation and other types of surface fitting.
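A much-simplified sketch of the diagnose-then-fit idea (residual-based trimming with a median/MAD rule standing in for the paper's MCD-based robust distances and robust PCA; the explicit plane form z = a*x + b*y + c is also an assumption for illustration):

```python
import random
import statistics

def fit_plane(pts):
    """Least-squares fit of z = a*x + b*y + c via the 3x3 normal
    equations, solved with Cramer's rule. Returns [a, b, c]."""
    n = len(pts)
    sx = sum(p[0] for p in pts); sy = sum(p[1] for p in pts)
    sz = sum(p[2] for p in pts)
    sxx = sum(p[0] * p[0] for p in pts); syy = sum(p[1] * p[1] for p in pts)
    sxy = sum(p[0] * p[1] for p in pts)
    sxz = sum(p[0] * p[2] for p in pts); syz = sum(p[1] * p[2] for p in pts)
    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    v = [sxz, syz, sz]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(M)
    sol = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for r in range(3):
            Mk[r][k] = v[r]
        sol.append(det3(Mk) / d)
    return sol

def robust_plane(pts, cutoff=2.5):
    """Diagnose-then-fit: initial LS fit, removal of points whose
    median/MAD-standardized residual exceeds the cutoff, then a refit."""
    a, b, c = fit_plane(pts)
    res = [p[2] - (a * p[0] + b * p[1] + c) for p in pts]
    med = statistics.median(res)
    mad = statistics.median(abs(r - med) for r in res) or 1e-12
    keep = [p for p, r in zip(pts, res)
            if abs(r - med) / (mad / 0.6745) <= cutoff]
    return fit_plane(keep)

# Plane z = 1 + 2x - y with small noise; the first 20 of 100 points are
# shifted upward by 5 (20% contamination).
rng = random.Random(3)
pts = []
for i in range(100):
    x, y = rng.uniform(0, 5), rng.uniform(0, 5)
    z = 1 + 2 * x - y + rng.gauss(0, 0.05) + (5.0 if i < 20 else 0.0)
    pts.append((x, y, z))
print([round(v, 2) for v in robust_plane(pts)])   # should be near [2, -1, 1]
```

The plain LS fit is pulled upward by the contaminated points, while the trimmed refit recovers the underlying plane; the paper's MCD- and robust-PCA-based pipeline plays the same two roles with much stronger guarantees.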