
    An adjusted boxplot for skewed distributions

    The boxplot is a very popular graphical tool for visualizing the distribution of continuous unimodal data. It shows information about the location, spread, skewness as well as the tails of the data. However, when the data are skewed, usually many points exceed the whiskers and are often erroneously declared as outliers. An adjustment of the boxplot is presented that includes a robust measure of skewness in the determination of the whiskers. This results in a more accurate representation of the data and of possible outliers. Consequently, this adjusted boxplot can also be used as a fast and automatic outlier detection tool without making any parametric assumption about the distribution of the bulk of the data. Several examples and simulation results show the advantages of this new procedure.
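As a rough illustration of the idea in this abstract, the sketch below computes skewness-adjusted whisker limits. The abstract names neither the skewness measure nor the whisker formula, so both are assumptions here: the medcouple as the robust skewness measure, and exponential correction factors with constants -4 and 3 as they are commonly quoted for the adjusted boxplot. The function names `medcouple` and `adjusted_fences` are hypothetical, and the medcouple is a naive O(n^2) version that simply skips tied pairs at the median.

```python
import math
import statistics


def medcouple(xs):
    """Naive O(n^2) medcouple: a robust skewness measure in [-1, 1].

    Simplification (assumption): pairs where both values equal the
    median are skipped instead of using the special tie kernel.
    """
    med = statistics.median(xs)
    lower = [x for x in xs if x <= med]
    upper = [x for x in xs if x >= med]
    kernel = [
        ((xj - med) - (med - xi)) / (xj - xi)
        for xi in lower
        for xj in upper
        if xj != xi
    ]
    # All values identical: no usable pairs, so report no skewness.
    return statistics.median(kernel) if kernel else 0.0


def adjusted_fences(xs, k=1.5):
    """Whisker limits that stretch or shrink with the skewness.

    The constants -4 and 3 are assumed, not taken from the abstract.
    """
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(xs)
    if mc >= 0:
        # Right skew: shorten the lower whisker, lengthen the upper one.
        low = q1 - k * math.exp(-4 * mc) * iqr
        high = q3 + k * math.exp(3 * mc) * iqr
    else:
        # Left skew: mirror image of the case above.
        low = q1 - k * math.exp(-3 * mc) * iqr
        high = q3 + k * math.exp(4 * mc) * iqr
    return low, high


def flag_outliers(xs, k=1.5):
    """Return the points that fall outside the adjusted fences."""
    low, high = adjusted_fences(xs, k)
    return [x for x in xs if x < low or x > high]
```

For symmetric data the medcouple is zero and the limits reduce to the usual 1.5 × IQR rule; for right-skewed data the upper limit moves outward, so fewer regular points in the long tail are flagged.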

    A Stahel-Donoho estimator based on huberized outlyingness

    The Stahel-Donoho estimator is defined as a weighted mean and covariance, where the weight of each observation depends on a measure of its outlyingness. In high dimensions, it can easily happen that an amount of outlying measurements is present in such a way that the majority of the observations is contaminated in at least one of its components. In these situations, the Stahel-Donoho estimator has difficulties in identifying the actual outlyingness of the contaminated observations. An adaptation of the Stahel-Donoho estimator is presented where the data are huberized before the outlyingness is computed. It is shown that the huberized outlyingness better reflects the actual outlyingness of each observation towards the non-contaminated observations. Therefore, the resulting adapted Stahel-Donoho estimator can better withstand large amounts of outliers. It is demonstrated that the Stahel-Donoho estimator based on huberized outlyingness works especially well when the data are heavily contaminated.
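To make the definition concrete, here is a minimal sketch of the classical (row-wise, non-huberized) Stahel-Donoho construction: outlyingness as the largest robustly standardized projection over a set of directions, followed by a weighted mean and covariance. Several details are assumptions that the abstract does not specify: the use of random Gaussian directions, the median and MAD as location and scale, the weight function min(1, (c/r)^2), and the cutoff value. All function names are hypothetical, and the huberization step itself is not shown.

```python
import random
import statistics


def sd_outlyingness(rows, n_dirs=200, seed=0):
    """Approximate outlyingness via random projection directions.

    For each direction a, every row is projected onto a and standardized
    with the median and MAD of the projections; the outlyingness of a
    row is its largest standardized distance over all directions.
    """
    rng = random.Random(seed)
    p = len(rows[0])
    outl = [0.0] * len(rows)
    for _ in range(n_dirs):
        a = [rng.gauss(0.0, 1.0) for _ in range(p)]
        proj = [sum(ai * xi for ai, xi in zip(a, row)) for row in rows]
        med = statistics.median(proj)
        scale = statistics.median(abs(z - med) for z in proj)  # MAD
        if scale == 0:
            continue  # degenerate direction, skip it
        for i, z in enumerate(proj):
            outl[i] = max(outl[i], abs(z - med) / scale)
    return outl


def sd_weights(outl, cutoff):
    """Assumed weight function: 1 below the cutoff, (cutoff / r)^2 above."""
    return [min(1.0, (cutoff / r) ** 2) if r > 0 else 1.0 for r in outl]


def weighted_mean_cov(rows, w):
    """Weighted mean vector and covariance matrix (normalized by sum of w)."""
    p = len(rows[0])
    total = sum(w)
    mean = [sum(wi * r[j] for wi, r in zip(w, rows)) / total for j in range(p)]
    cov = [
        [
            sum(wi * (r[j] - mean[j]) * (r[k] - mean[k]) for wi, r in zip(w, rows))
            / total
            for k in range(p)
        ]
        for j in range(p)
    ]
    return mean, cov
```

Because each row receives a single weight, a row that is contaminated in only one component is downweighted as a whole, which is the limitation the adapted estimators in this listing aim to address.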

    Stahel-Donoho estimators with cellwise weights

    The Stahel-Donoho estimator is defined as a weighted mean and covariance, where each observation receives a weight which depends on a measure of its outlyingness. Therefore, all variables are treated in the same way whether they are responsible for the outlyingness or not. We present an adaptation of the Stahel-Donoho estimator, where we allow separate weights for each variable. By using cellwise weights, we aim to downweight only the contaminated variables, so that we avoid losing the information contained in the other variables. The goal is to increase the precision, and possibly the robustness, of the estimator. We compare several variants of our proposal and show to what extent they succeed in identifying and downweighting precisely those variables which are contaminated. We further demonstrate that in many situations the mean-squared error of the adapted estimators is lower than that of the original Stahel-Donoho estimator and that this results in better outlier detection capabilities. We also consider some real data examples.