60 research outputs found

    Affine-Invariant Outlier Detection and Data Visualization

    Social media websites generate a wealth of data daily, and this data is an essential component of the Big Data Revolution. In many cases, the data is anonymized before being disseminated for research and analysis. This anonymization process distorts the data, so that some essential characteristics are lost or may not be captured by methods that are not robust against such transformations. In this paper we propose novel algorithms, for two-dimensional data, for a recently discovered statistical data analysis measure, the Ray Shooting Depth (RSD), which provides an affine-invariant ranking of data points. In addition, we prove some complexity results and illustrate some of the desirable properties of RSD via comparisons with other similar notions. We develop an open-source data visualization tool based on RSD, and show its applications in distribution estimation, outlier detection, and 2D tolerance-region construction.

    Directional outlyingness applied to distances between genomic words

    The detection of outlier curves/images is crucial in many areas, such as environmental, meteorological, medical, or economic contexts. In the functional framework, outlying observations are not only those that contain atypically high or low values, but also curves that present a different shape or pattern from the rest of the curves in the sample. In this short paper, we mention some recent methods for outlier detection in functional data and apply a recently proposed measure, the directional outlyingness, and the functional outlier map to detect words with outlying distance distribution in the human genome.

    Assessing the effects of Multivariate Functional outlier identification and sample robustification on identifying critical PM2.5 air pollution episodes in Medellín, Colombia

    Identification of critical episodes of environmental pollution, both as an outlier identification problem and as a classification problem, is a common application of multivariate functional data analysis. This article addresses the effects of robustifying multivariate functional samples on the identification of critical pollution episodes in Medellín, Colombia. To do so, it compares 18 depth-based outlier identification methods and highlights the best options in terms of precision through simulation. It then applies the two best-performing methods to robustify a real dataset of air pollution (PM2.5 concentration) in the Metropolitan Area of Medellín, Colombia, and compares the effects of robustifying the samples on the accuracy of supervised classification through the multivariate functional DD-classifier. Our results show that 10 out of 20 methods reviewed perform better for at least one kind of outlier. Nevertheless, no clear positive effects of robustification were identified with the real dataset.

    Depthgram: Visualizing outliers in high-dimensional functional data with application to fMRI data exploration.

    Functional magnetic resonance imaging (fMRI) is a non-invasive technique that facilitates the study of brain activity by measuring changes in blood flow. Brain activity signals can be recorded during the alternate performance of given tasks, that is, task fMRI (tfMRI), or during resting-state, that is, resting-state fMRI (rsfMRI), as a measure of baseline brain activity. This contributes to the understanding of how the human brain is organized in functionally distinct subdivisions. fMRI experiments from high-resolution scans provide hundreds of thousands of longitudinal signals for each individual, corresponding to brain activity measurements over each voxel of the brain along the duration of the experiment. In this context, we propose novel visualization techniques for high-dimensional functional data relying on depth-based notions that enable computationally efficient two-dimensional representations of fMRI data, which elucidate sample composition, outlier presence, and individual variability. We believe that this preliminary step is crucial to any inferential approach aiming to identify neuroscientific patterns across individuals, tasks, and brain regions. We present the proposed technique via an extensive simulation study, and demonstrate its application on a motor and language tfMRI experiment. Agencia Estatal de Investigación, Spain, Grant/Award Number: PID2019-109196GB-I00; Ministerio de Economía y Competitividad, Spain, Grant/Award Numbers: ECO2015-66593-P, MTM2014-56535-R.