36 research outputs found

    Affine-Invariant Outlier Detection and Data Visualization

    Social media websites generate a wealth of data every day, an essential component of the Big Data Revolution. In many cases, the data is anonymized before being disseminated for research and analysis. This anonymization distorts the data, so some essential characteristics are lost, and methods that are not robust against such transformations may fail to capture them. In this paper we propose novel algorithms, for two-dimensional data, for a recently discovered statistical data analysis measure, the Ray Shooting Depth (RSD), which provides an affine-invariant ranking of data points. In addition, we prove some complexity results and illustrate some of the desirable properties of RSD via comparisons with other similar notions. We develop an open-source data visualization tool based on RSD, and show its applications in distribution estimation, outlier detection, and 2D tolerance-region construction.

    Assessing the effects of Multivariate Functional outlier identification and sample robustification on identifying critical PM2.5 air pollution episodes in Medellín, Colombia

    Identification of critical episodes of environmental pollution, both as an outlier identification problem and as a classification problem, is a common application of multivariate functional data analysis. This article addresses the effects of robustifying multivariate functional samples on the identification of critical pollution episodes in Medellín, Colombia. To do so, it compares 18 depth-based outlier identification methods and highlights the best options in terms of precision through simulation. It then applies the two best-performing methods to robustify a real dataset of air pollution (PM2.5 concentration) in the Metropolitan Area of Medellín, Colombia, and compares the effects of robustifying the samples on the accuracy of supervised classification through the multivariate functional DD-classifier. Our results show that 10 out of 20 methods reviewed perform better in at least one kind of outlier. Nevertheless, no clear positive effects of robustification were identified with the real dataset.

    Depthgram: Visualizing outliers in high-dimensional functional data with application to fMRI data exploration.

    Functional magnetic resonance imaging (fMRI) is a non-invasive technique that facilitates the study of brain activity by measuring changes in blood flow. Brain activity signals can be recorded during the alternate performance of given tasks, that is, task fMRI (tfMRI), or during resting-state, that is, resting-state fMRI (rsfMRI), as a measure of baseline brain activity. This contributes to the understanding of how the human brain is organized in functionally distinct subdivisions. fMRI experiments from high-resolution scans provide hundreds of thousands of longitudinal signals for each individual, corresponding to brain activity measurements over each voxel of the brain along the duration of the experiment. In this context, we propose novel visualization techniques for high-dimensional functional data relying on depth-based notions that enable computationally efficient two-dimensional representations of fMRI data, which elucidate sample composition, outlier presence, and individual variability. We believe that this preliminary step is crucial to any inferential approach aiming to identify neuroscientific patterns across individuals, tasks, and brain regions. We present the proposed technique via an extensive simulation study, and demonstrate its application on a motor and language tfMRI experiment. Funding: Agencia Estatal de Investigación, Spain, Grant/Award Number: PID2019-109196GB-I00; Ministerio de Economía y Competitividad, Spain, Grant/Award Numbers: ECO2015-66593-P, MTM2014-56535-R.

    Cluster analysis with cellwise trimming and applications to robust clustering of curves

    In this work, we propose a robust Cluster Analysis methodology based on cell trimming as an extension to a recently introduced robust version of Principal Component Analysis. This new approach allows for cellwise trimming in cluster analysis, which is more reasonable than traditional casewise trimming when the problem's dimension is large. This type of trimming avoids an unnecessary loss of information when only a few cells of the entirely trimmed observations are atypical. An algorithm is proposed to apply this approach. This algorithm is particularized to the interesting case of functional cluster analysis. Simulations and applications to real data sets are given to illustrate the proposed methods. This research was partially supported by Spanish Ministerio de Economía y Competitividad, Grant MTM2017-86061-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León and FEDER, Grants VA005P17 and VA002G18.
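
The information-loss argument in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: it only flags atypical cells with a simple per-column robust z-score (median and MAD), which is an assumption made here for illustration, and then counts how many cells each trimming style would discard. The function name `trimming_loss`, the threshold of 3.0, and the toy data are all hypothetical.

```python
import numpy as np


def trimming_loss(data, threshold=3.0):
    """Flag atypical cells and count cells lost under each trimming style.

    Illustrative only: cells are flagged with a per-column robust z-score
    (median / MAD), not with the method proposed in the paper.
    """
    data = np.asarray(data, dtype=float)
    median = np.median(data, axis=0)
    # 1.4826 makes the MAD consistent with the standard deviation
    # under normality.
    mad = 1.4826 * np.median(np.abs(data - median), axis=0)
    mad = np.where(mad == 0, 1.0, mad)  # avoid division by zero
    flagged = np.abs(data - median) / mad > threshold

    # Cellwise trimming discards only the flagged cells.
    cellwise_lost = int(flagged.sum())
    # Casewise trimming discards every cell of any row with a flagged cell.
    casewise_lost = int(flagged.any(axis=1).sum()) * data.shape[1]
    return flagged, cellwise_lost, casewise_lost


# Toy data: 6 observations, 4 variables, with two contaminated cells.
demo = np.array([[i % 3 + j for j in range(4)] for i in range(6)], dtype=float)
demo[0, 0] = 100.0
demo[1, 2] = -100.0

flagged, cell_lost, case_lost = trimming_loss(demo)
print("cells lost (cellwise):", cell_lost)
print("cells lost (casewise):", case_lost)
```

On this toy data, cellwise trimming discards the 2 contaminated cells, while casewise trimming discards all 8 cells of the two affected rows. The gap widens as the number of variables grows, which is the situation the abstract describes for high-dimensional and functional data.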