
    Discussion of "The power of monitoring"

    This is an invited comment on the discussion paper "The power of monitoring: how to make the most of a contaminated multivariate sample" by A. Cerioli, M. Riani, A. Atkinson and A. Corbellini, which will appear in the journal Statistical Methods & Applications.

    Real-time outlier detection for large datasets by RT-DetMCD

    Modern industrial machines can generate gigabytes of data in seconds, frequently pushing the boundaries of available computing power. Together with the time criticality of industrial processing, this presents a challenging problem for any data analytics procedure. We focus on the deterministic minimum covariance determinant method (DetMCD), which detects outliers by fitting a robust covariance matrix. We construct a much faster version of DetMCD by replacing its initial estimators with two new methods and incorporating update-based concentration steps. The computation time is reduced further by parallel computing, with a novel robust aggregation method to combine the results from the threads. The speed and accuracy of the proposed real-time DetMCD method (RT-DetMCD) are illustrated by simulation and a real industrial application to food sorting.
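As an illustration of the concentration steps mentioned in this abstract, here is a minimal sketch of a plain (non-update-based) C-step as used in MCD-type algorithms: refit the mean and covariance on the current subset, then keep the h points with the smallest Mahalanobis distances. It is not the RT-DetMCD algorithm. The function names, the toy data, and the median-based starting subset are assumptions made for this example only.

```python
import numpy as np

def c_step(X, subset, h):
    """One plain concentration step: refit on `subset`, keep the h closest points."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    diff = X - mu
    # Squared Mahalanobis distances of all points to the current fit.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return np.argsort(d2)[:h]

def concentrate(X, subset, h, max_iter=100):
    """Iterate C-steps until the subset stops changing (or max_iter is hit)."""
    for _ in range(max_iter):
        new = c_step(X, subset, h)
        if np.array_equal(np.sort(new), np.sort(subset)):
            break
        subset = new
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    return mu, cov, subset

# Toy data (hypothetical): 200 clean points plus 20 clustered outliers.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 2))
outliers = rng.normal(loc=10.0, scale=0.5, size=(20, 2))
X = np.vstack([clean, outliers])

h = 165  # size of the retained subset, here 75% of the sample
# Assumed starting subset: the h points closest to the coordinatewise median.
start = np.argsort(np.linalg.norm(X - np.median(X, axis=0), axis=1))[:h]
mu, cov, subset = concentrate(X, start, h)
```

Each C-step cannot increase the determinant of the subset covariance, which is why the iteration settles after a few rounds. A real-time variant would avoid recomputing the mean and covariance from scratch at every step, which this sketch does not attempt.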

    Real-time discriminant analysis in the presence of label and measurement noise

    Quadratic discriminant analysis (QDA) is a widely used classification technique. Based on a training dataset, each class in the data is characterized by an estimate of its center and shape, which can then be used to assign unseen observations to one of the classes. The traditional QDA rule relies on the empirical mean and covariance matrix. Unfortunately, these estimators are sensitive to label and measurement noise, which often impairs the model's predictive ability. Robust estimators of location and scatter are resistant to this type of contamination. However, they have a prohibitive computational cost for large-scale industrial experiments. We present a novel QDA method based on a recent real-time robust algorithm. We additionally integrate an anomaly detection step to classify the most atypical observations into a separate class of outliers. Finally, we introduce the label bias plot, a graphical display to identify label and measurement noise in the training data. The performance of the proposed approach is illustrated in a simulation study with huge datasets, and on real datasets about diabetes and fruit.
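The QDA rule described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the per-class estimator is pluggable, and the example uses the classical mean and covariance where a robust estimator would be substituted. The outlier class is assigned with a simple distance cutoff, which is only one plausible way to add an anomaly detection step. All names, the cutoff, and the toy data are assumptions.

```python
import numpy as np

def classical(Xk):
    """Classical center and scatter; a robust estimator could be swapped in here."""
    return Xk.mean(axis=0), np.cov(Xk, rowvar=False)

def fit_qda(X, y, estimator=classical):
    """Store (center, scatter, prior) per class; priors are observed class frequencies."""
    model = {}
    for label in np.unique(y):
        Xk = X[y == label]
        center, scatter = estimator(Xk)
        model[label] = (center, scatter, len(Xk) / len(X))
    return model

def predict_qda(model, X, cutoff=None, outlier_label=-1):
    """Assign each row to the class with the highest quadratic discriminant score."""
    labels = list(model)
    scores = np.empty((len(X), len(labels)))
    dists = np.empty_like(scores)
    for j, label in enumerate(labels):
        center, scatter, prior = model[label]
        diff = X - center
        d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(scatter), diff)
        _, logdet = np.linalg.slogdet(scatter)
        scores[:, j] = -0.5 * logdet - 0.5 * d2 + np.log(prior)
        dists[:, j] = d2
    best = scores.argmax(axis=1)
    pred = np.array(labels)[best]
    if cutoff is not None:
        # Assumed rule: flag points that are far even from their assigned class.
        far = dists[np.arange(len(X)), best] > cutoff
        pred = np.where(far, outlier_label, pred)
    return pred

# Toy data (hypothetical): two well-separated classes in two dimensions.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(size=(100, 2)),
                     rng.normal(size=(100, 2)) + [6.0, 0.0]])
y_train = np.repeat([0, 1], 100)
model = fit_qda(X_train, y_train)

# 0.975 quantile of the chi-square distribution with 2 degrees of freedom.
cutoff = -2.0 * np.log(0.025)
X_new = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 40.0]])
pred = predict_qda(model, X_new, cutoff=cutoff)
```

With contaminated training labels, both the classical estimates and the frequency-based priors would be distorted, which is the motivation for plugging in a robust estimator instead.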

    Development of real-time, robust statistical methods for novel applications in food sorting

    In industrial food sorting, fast sensor-based technologies are used for automated food inspection. These sensors typically produce multivariate data that are used as input for classification algorithms, which are responsible for detecting commonly found defects among the regular material. Typically, huge amounts of product are scanned in an automated fashion. Food inspection machines therefore generate gigabytes of multivariate data in milliseconds, frequently pushing the boundaries of available computing power. Outliers can dramatically influence the prediction efficiency of traditional classifiers. Robust algorithms are thus an absolute must, since industrial datasets are typically corrupted by outliers in the form of label and measurement noise. However, none of the well-known high-breakdown methods can handle the sheer volume of data from these machines. This thesis addresses this problem by introducing new robust statistical procedures that are fast to compute and specifically designed for robust outlier detection and multiclass classification problems. The thesis contains four chapters. The first chapter discusses the relation between the different outlier detection techniques. The second chapter focuses on speeding up the deterministic minimum covariance determinant method (DetMCD), which detects outliers by fitting a robust covariance matrix. We construct a much faster version of DetMCD by replacing its initial estimators with two new methods and by incorporating update-based concentration steps. The computation time is reduced further by parallel computing, which requires a novel robust aggregation method to combine the results from the individual threads. In the third chapter, we integrate the real-time DetMCD method into quadratic discriminant analysis (QDA), a widely used classification technique. This allows us to solve classification problems with multiple classes. Based on a training dataset, each class in the data is characterized by an estimate of its center and shape, which can then be used to assign unseen observations to one of the classes. We present a novel, robust QDA method in which we additionally integrate an anomaly detection step to classify the most suspicious observations into a separate class of outliers. We also introduce the label bias plot, a graphical display to identify label and measurement noise in the training data. Most outlier detection techniques assume that the non-outlying observations are roughly elliptically distributed, but many datasets are not of that form. Moreover, their computation time increases substantially when the number of variables goes up. In Chapter 4 we therefore propose the Kernel Minimum Regularized Covariance Determinant (KMRCD) estimator, which addresses both issues. It is not restricted to elliptical data because it implicitly computes robust covariances in a kernel-induced feature space. A fast algorithm is constructed that starts from kernel-based initial estimates, where the kernel trick is exploited to speed up the subsequent computations.

    Discussion of "The power of monitoring: how to make the most of a contaminated multivariate sample"

    © 2018, Springer-Verlag GmbH Germany, part of Springer Nature. In this comment on the discussion paper “The power of monitoring: how to make the most of a contaminated multivariate sample” by A. Cerioli, M. Riani, A. Atkinson and A. Corbellini, we describe how the hard rejection property of the MCD method can be mimicked by an S-estimator with an appropriate rho-function. We also point the reader to fast and deterministic algorithms for the MCD, S- and MM-estimators that are specifically suited for monitoring experiments. They were made available a few years ago and successfully used for monitoring in our papers. Finally, the question is raised how monitoring can be applied or extended for increasing numbers of cases, variables and tuning parameters.
