
    Fast robust correlation for high-dimensional data

    The product moment covariance is a cornerstone of multivariate data analysis, from which one can derive correlations, principal components, Mahalanobis distances and many other results. Unfortunately, the product moment covariance and the corresponding Pearson correlation are very susceptible to outliers (anomalies) in the data. Several robust measures of covariance have been developed, but few are suitable for the ultrahigh dimensional data that are becoming more prevalent nowadays. For that one needs methods whose computation scales well with the dimension, that are guaranteed to yield a positive semidefinite covariance matrix, and that are sufficiently robust to outliers as well as sufficiently accurate in the statistical sense of low variability. We construct such methods using data transformations. The resulting approach is simple, fast and widely applicable. We study its robustness by deriving influence functions and breakdown values, and computing the mean squared error on contaminated data. Using these results we select a method that performs well overall. This also allows us to construct a faster version of the DetectDeviatingCells method (Rousseeuw and Van den Bossche, 2018) for detecting cellwise outliers, one that can deal with much higher dimensions. The approach is illustrated on genomic data with 12,000 variables and color video data with 920,000 dimensions.
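    The idea of building a robust correlation from a data transformation can be sketched as follows. The abstract does not say which transformation the authors select, so this example assumes a simple stand-in: each variable is standardized with its median and MAD, clipped at a hypothetical cutoff `c`, and then fed to the ordinary Pearson formula. Because the result is a plain Pearson correlation of transformed data, a matrix built this way is positive semidefinite by construction.

```python
import math
import statistics


def robust_standardize(values):
    """Center by the median and scale by the MAD (made consistent at the normal)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; this sketch cannot standardize the variable")
    return [(v - med) / scale for v in values]


def clip(z, c):
    """Assumed transformation: bound each standardized value to [-c, c]."""
    return max(-c, min(c, z))


def pearson(a, b):
    """Classical product moment correlation."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def transformed_correlation(x, y, c=2.0):
    """Pearson correlation computed on the transformed variables."""
    tx = [clip(z, c) for z in robust_standardize(x)]
    ty = [clip(z, c) for z in robust_standardize(y)]
    return pearson(tx, ty)


x = list(range(1, 11))
y = [2 * v + 1 for v in x]
y[-1] = -100.0  # a single gross outlier
print(pearson(x, y), transformed_correlation(x, y))
```

    On this toy data the single outlier flips the sign of the classical correlation, while the transformed version stays positive. The cost per pair of variables is that of one Pearson correlation plus two medians.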

    Mature Cooperative Groups Seeking New Identities: The Case of Belgium

    The cooperative sector in Belgium has always been closely linked to other social movements. In the 1990s the backbone of the sector, namely the cooperative banks, underwent major transformations. This article looks at the two most important cooperative financial holdings that were created to replace the stand-alone cooperative banks: the Cera and the ARCO-group. They follow a similar path but have opted for slightly different positions in the Belgian social and economic landscape. Both have sought a new identity by repositioning themselves vis-à-vis the market, civil society and the state. The consequences of the new “cooperative trilemma” are gradually becoming clear.
    Keywords: cooperative, social movement, sustainable development, corporate social responsibility, cooperative trilemma, agribusiness

    Discussion of "The power of monitoring"

    This is an invited comment on the discussion paper "The power of monitoring: how to make the most of a contaminated multivariate sample" by A. Cerioli, M. Riani, A. Atkinson and A. Corbellini that will appear in the journal Statistical Methods & Applications

    Finding Outliers in Surface Data and Video

    Surface, image and video data can be considered as functional data with a bivariate domain. To detect outlying surfaces or images, a new method is proposed based on the mean and the variability of the degree of outlyingness at each grid point. A rule is constructed to flag the outliers in the resulting functional outlier map. Heatmaps of their outlyingness indicate the regions which are most deviating from the regular surfaces. The method is applied to fluorescence excitation-emission spectra after fitting a PARAFAC model, to MRI image data which are augmented with their gradients, and to video surveillance data
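    A rough sketch of the two quantities behind such a functional outlier map is given below. The abstract does not specify the outlyingness measure, so this example assumes a simple stand-in: at each grid point, the absolute deviation from the median across the sample, divided by the MAD. Each surface is then summarized by the mean and the standard deviation of its outlyingness over the grid. The function names and the flattened-grid representation are hypothetical.

```python
import statistics


def pointwise_outlyingness(surfaces):
    """surfaces: list of equally long lists, one flattened grid per surface."""
    n_grid = len(surfaces[0])
    out = [[0.0] * n_grid for _ in surfaces]
    for g in range(n_grid):
        column = [s[g] for s in surfaces]
        med = statistics.median(column)
        mad = statistics.median([abs(v - med) for v in column])
        scale = 1.4826 * mad
        if scale == 0:
            raise ValueError(f"zero MAD at grid point {g}")
        for i, v in enumerate(column):
            # Assumed outlyingness: robustly standardized absolute deviation.
            out[i][g] = abs(v - med) / scale
    return out


def outlier_map_coordinates(surfaces):
    """Return (mean outlyingness, variability of outlyingness) per surface."""
    return [
        (statistics.mean(row), statistics.pstdev(row))
        for row in pointwise_outlyingness(surfaces)
    ]


surfaces = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 3.1, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [1.2, 2.2, 3.2, 4.2],
    [5.0, 2.0, 9.0, 4.0],  # deviates at grid points 0 and 2
]
for mean_out, sd_out in outlier_map_coordinates(surfaces):
    print(round(mean_out, 2), round(sd_out, 2))
```

    Plotting the two coordinates against each other gives one point per surface, and the rows of `pointwise_outlyingness` can be reshaped into a heatmap to see where a flagged surface deviates. The flagging rule itself is not reproduced here.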

    A generalized spatial sign covariance matrix

    The well-known spatial sign covariance matrix (SSCM) carries out a radial transform which moves all data points to a sphere, followed by computing the classical covariance matrix of the transformed data. Its popularity stems from its robustness to outliers, fast computation, and applications to correlation and principal component analysis. In this paper we study more general radial functions. It is shown that the eigenvectors of the generalized SSCM are still consistent and the ranks of the eigenvalues are preserved. The influence function of the resulting scatter matrix is derived, and it is shown that its breakdown value is as high as that of the original SSCM. A simulation study indicates that the best results are obtained when the inner half of the data points are not transformed and points lying far away are moved to the center
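    The radial transform described above can be sketched in a few lines. In this minimal example, each centered point is multiplied by a radial function `xi(r)` of its distance `r` to the center, and the scatter matrix is the average outer product of the transformed points. Choosing `xi(r) = 1/r` projects every point onto the unit sphere, which gives the classical SSCM. The coordinatewise median used as the center is an assumption made for brevity (the spatial median is a common choice in practice), and the function names are hypothetical.

```python
import math
import statistics


def sscm_xi(r):
    """Radial function of the classical SSCM: move every point to the unit sphere."""
    return 1.0 / r


def generalized_sscm(points, xi=sscm_xi):
    """Average outer product of radially transformed, centered points."""
    dim = len(points[0])
    # Assumed center: coordinatewise median (a simplification).
    center = [statistics.median([p[k] for p in points]) for k in range(dim)]
    scatter = [[0.0] * dim for _ in range(dim)]
    used = 0
    for p in points:
        diff = [p[k] - center[k] for k in range(dim)]
        r = math.sqrt(sum(d * d for d in diff))
        if r == 0.0:
            continue  # a point at the center has no direction; skip it
        g = [xi(r) * d for d in diff]
        used += 1
        for a in range(dim):
            for b in range(dim):
                scatter[a][b] += g[a] * g[b]
    if used == 0:
        raise ValueError("all points coincide with the center")
    return [[v / used for v in row] for row in scatter]


points = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (100.0, -50.0)]
for row in generalized_sscm(points):
    print(row)
```

    With `sscm_xi` every transformed point has unit length, so the trace of the result equals 1 and the far-away point (100, -50) contributes no more than any other. Other radial functions can be passed through the `xi` argument.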

    Equivariant Passing-Bablok regression in quasilinear time

    Passing-Bablok regression is a standard tool for method and assay comparison studies thanks to its place in industry guidelines such as CLSI. Unfortunately, its computational cost is high, since a naive approach requires O(n^2) time. This makes it impossible to compute the Passing-Bablok regression estimator on large datasets. Additionally, even on smaller datasets it can be difficult to perform bootstrap-based inference. We introduce the first quasilinear time algorithm for the equivariant Passing-Bablok estimator. In contrast to the naive algorithm, our algorithm runs in O(n log(n)) expected time using O(n) space, allowing for its application to much larger data sets. Additionally, we introduce a fast estimator for the variance of the Passing-Bablok slope and discuss statistical inference based on bootstrap and this variance estimate. Finally, we propose a diagnostic plot to identify influential points in Passing-Bablok regression. The superior performance of the proposed methods is illustrated on real data examples of clinical method comparison studies.
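    To see where the quadratic cost of the naive approach comes from, the sketch below enumerates all n(n-1)/2 pairwise slopes. It assumes, and this is not stated in the abstract, that the equivariant slope estimate is the median of the absolute pairwise slopes and that the intercept is the median of the residuals y - slope * x; both choices should be checked against the paper. This is the naive baseline only, not the quasilinear algorithm the authors introduce.

```python
import math
import statistics
from itertools import combinations


def naive_pairwise_slope_fit(x, y):
    """Naive O(n^2) fit from all pairwise slopes (assumed estimator, see lead-in)."""
    abs_slopes = []
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj and yi == yj:
            continue  # identical points define no slope
        if xi == xj:
            abs_slopes.append(math.inf)  # vertical pair
        else:
            abs_slopes.append(abs((yj - yi) / (xj - xi)))
    if not abs_slopes:
        raise ValueError("need at least two distinct points")
    slope = statistics.median(abs_slopes)
    # Assumed intercept: median residual after removing the fitted slope.
    intercept = statistics.median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * v + 1.0 for v in x]
print(naive_pairwise_slope_fit(x, y))
```

    For n = 100,000 observations the loop would visit roughly 5 billion pairs, which is what makes bootstrap inference with the naive approach impractical.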

    Nudging people to pay their parking fines on time. Evidence from a cluster-randomized field experiment

    The timely payment of municipal parking fines signifies people's acceptance of parking regulations, reduces administrative enforcement costs, and prevents additional late-payment fees for individuals. However, public administrations face challenges in enforcing the timely payment of parking fines. A large group of people fail to pay their fines on time, which requires additional enforcement actions that can result in extra late-payment costs and payment-related stress. In this study we collaborate with the Belgian city of Mechelen and the Behavioral Insights Team of the Flemish regional government to test the compounded effects of three communicative nudges, i.e., simplification, explicit penalty, and social norm, on the timely payment of parking fines. In a cluster-randomized field experiment, parking offenders received either the original notification letter, a simplified notification letter, a simplified notification letter accompanied by an explicit reference to the potential penalties, or a simplified notification letter accompanied by an explicit penalty and a social norm message. The results indicate that people can be nudged to pay their fines on time, but only when multiple nudges are combined and used simultaneously.