945 research outputs found

    An Object-Oriented Framework for Robust Multivariate Analysis

    Get PDF
    Taking advantage of the S4 class system of the programming environment R, which facilitates the creation and maintenance of reusable and modular components, an object-oriented framework for robust multivariate analysis was developed. The framework resides in the packages robustbase and rrcov and includes an almost complete set of algorithms for computing robust multivariate location and scatter, various robust methods for principal component analysis as well as robust linear and quadratic discriminant analysis. The design of these methods follows common patterns which we call statistical design patterns in analogy to the design patterns widely used in software engineering. The application of the framework to data analysis as well as possible extensions by the development of new methods is demonstrated on examples which themselves are part of the package rrcov.

    An Object-Oriented Framework for Statistical Simulation: The R Package simFrame

    Get PDF
    Simulation studies are widely used by statisticians to gain insight into the quality of developed methods. Usually some guidelines regarding, e.g., simulation designs, contamination, missing data models or evaluation criteria are necessary in order to draw meaningful conclusions. The R package simFrame is an object-oriented framework for statistical simulation, which allows researchers to make use of a wide range of simulation designs with a minimal effort of programming. Its object-oriented implementation provides clear interfaces for extensions by the user. Since statistical simulation is an embarrassingly parallel process, the framework supports parallel computing to increase computational performance. Furthermore, an appropriate plot method is selected automatically depending on the structure of the simulation results. In this paper, the implementation of simFrame is discussed in great detail and the functionality of the framework is demonstrated in examples for different simulation designs.

    Classification efficiencies for robust linear discriminant analysis.

    Get PDF
    Linear discriminant analysis is typically carried out using Fisher’s method. This method relies on the sample averages and covariance matrices computed from the different groups constituting the training sample. Since sample averages and covariance matrices are not robust, it has been proposed to use robust estimators of location and covariance instead, yielding a robust version of Fisher’s method. In this paper relative classification efficiencies of the robust procedures with respect to the classical method are computed. Second order influence functions appear to be useful for computing these classification efficiencies. It turns out that, when using an appropriate robust estimator, the loss in classification efficiency at the normal model remains limited. These findings are confirmed by finite sample simulations.Classification efficiency; Discriminant analysis; Error rate; Fisher rule; Influence function; Robustness;

    Robust linear discriminant analysis for multiple groups: influence and classification efficiencies.

    Get PDF
    Linear discriminant analysis for multiple groups is typically carried out using Fisher's method. This method relies on the sample averages and covariance ma- trices computed from the different groups constituting the training sample. Since sample averages and covariance matrices are not robust, it is proposed to use robust estimators of location and covariance instead, yielding a robust version of Fisher's method. In this paper expressions are derived for the influence that an observation in the training set has on the error rate of the Fisher method for multiple linear discriminant analysis. These influence functions on the error rate turn out to be unbounded for the classical rule, but bounded when using a robust approach. Using these influence functions, we compute relative classification efficiencies of the robust procedures with respect to the classical method. It is shown that, by using an appropriate robust estimator, the loss in classification efficiency at the normal model remains limited. These findings are confirmed by finite sample simulations.Classification; Covariance; Discriminant analysis; Efficiency; Error rate; Estimator; Fisher rule; Functions; Influence function; Model; Multiple groups; Research; Robustness; Simulation; Training;

    Algorithms for Projection - Pursuit robust principal component analysis.

    Get PDF
    The results of a standard principal component analysis (PCA) can be affected by the presence of outliers. Hence robust alternatives to PCA are needed. One of the most appealing robust methods for principal component analysis uses the Projection-Pursuit principle. Here, one projects the data on a lower-dimensional space such that a robust measure of variance of the projected data will be maximized. The Projection-Pursuit-based method for principal component analysis has recently been introduced in the field of chemometrics, where the number of variables is typically large. In this paper, it is shown that the currently available algorithm for robust Projection-Pursuit PCA performs poor in the presence of many variables. A new algorithm is proposed that is more suitable for the analysis of chemical data. Its performance is studied by means of simulation experiments and illustrated on some real data sets. (c) 2007 Elsevier B.V. All rights reserved.multivariate statistics; optimization; numerical precision; outliers; robustness; scale estimators; estimators; regression;

    Robust sparse principal component analysis.

    Get PDF
    A method for principal component analysis is proposed that is sparse and robust at the same time. The sparsity delivers principal components that have loadings on a small number of variables, making them easier to interpret. The robustness makes the analysis resistant to outlying observations. The principal components correspond to directions that maximize a robust measure of the variance, with an additional penalty term to take sparseness into account. We propose an algorithm to compute the sparse and robust principal components. The method is applied on several real data examples, and diagnostic plots for detecting outliers and for selecting the degree of sparsity are provided. A simulation experiment studies the loss in statistical efficiency by requiring both robustness and sparsity.Dispersion measure; Projection-pursuit; Outliers; Variable selection;
    corecore