    Finding Outliers in Surface Data and Video

    Surface, image and video data can be considered as functional data with a bivariate domain. To detect outlying surfaces or images, a new method is proposed based on the mean and the variability of the degree of outlyingness at each grid point. A rule is constructed to flag the outliers in the resulting functional outlier map. Heatmaps of their outlyingness indicate the regions which deviate most from the regular surfaces. The method is applied to fluorescence excitation-emission spectra after fitting a PARAFAC model, to MRI image data which are augmented with their gradients, and to video surveillance data.
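The idea of summarising each surface by the mean and the variability of its pointwise outlyingness can be sketched as follows. This is a minimal illustration only: the abstract does not specify the outlyingness measure, so a robust z-score (absolute deviation from the median divided by the MAD) is used here as a stand-in, and the function names `pointwise_outlyingness` and `outlier_map` are hypothetical.

```python
from statistics import mean, median, pstdev


def pointwise_outlyingness(surfaces):
    """Robust z-score |x - median| / MAD at every grid point.

    `surfaces` is a list of equally long lists, one flattened surface each.
    The robust z-score is an assumed stand-in, not the paper's measure.
    """
    n_grid = len(surfaces[0])
    scores = [[0.0] * n_grid for _ in surfaces]
    for g in range(n_grid):
        column = [s[g] for s in surfaces]
        med = median(column)
        mad = median([abs(v - med) for v in column])
        scale = mad if mad > 0 else 1.0  # assumption: guard against zero spread
        for i, v in enumerate(column):
            scores[i][g] = abs(v - med) / scale
    return scores


def outlier_map(surfaces):
    """Return one (mean, variability) pair of outlyingness per surface."""
    return [(mean(row), pstdev(row)) for row in pointwise_outlyingness(surfaces)]
```

Plotting the two coordinates of `outlier_map` against each other gives a simple analogue of a functional outlier map, and the rows of `pointwise_outlyingness` can be reshaped to the grid to draw a heatmap of where a surface deviates.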

    The Gaussian rank correlation estimator: Robustness properties.

    The Gaussian rank correlation equals the usual correlation coefficient computed from the normal scores of the data. Although its influence function is unbounded, it still has attractive robustness properties. In particular, its breakdown point is above 12%. Moreover, the estimator is consistent and asymptotically efficient at the normal distribution. The correlation matrix based on the Gaussian rank correlation is always positive semidefinite, and very easy to compute, also in high dimensions. A simulation study confirms the good efficiency and robustness properties of the proposed estimator with respect to the popular Kendall and Spearman correlation measures. In the empirical application, we show how it can be used for multivariate outlier detection based on robust principal component analysis.
    Keywords: Breakdown; Correlation; Efficiency; Robustness; Van der Waerden
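The definition in the first sentence of this abstract can be illustrated with a short sketch. It assumes the common normal-score convention of applying the standard normal quantile function to rank/(n + 1), and it assumes there are no ties; the helper names are hypothetical.

```python
from statistics import NormalDist


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def gaussian_rank_correlation(x, y):
    """Pearson correlation of the normal scores of x and y."""
    n = len(x)
    nd = NormalDist()  # standard normal
    # Assumed score convention: Phi^{-1}(rank / (n + 1)).
    sx = [nd.inv_cdf(r / (n + 1)) for r in ranks(x)]
    sy = [nd.inv_cdf(r / (n + 1)) for r in ranks(y)]
    return pearson(sx, sy)
```

Because only the ranks enter the computation, replacing a large observation by an even larger one leaves the estimate unchanged, which gives some intuition for the robustness the abstract describes.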

    Influence Functions of the Spearman and Kendall Correlation Measures

    Mathematics Subject Classification (2000) 62G35 · 62F99

    Spatial Sign Correlation

    A new robust correlation estimator based on the spatial sign covariance matrix (SSCM) is proposed. We derive its asymptotic distribution and influence function at elliptical distributions. Finite sample and robustness properties are studied and compared to other robust correlation estimators by means of numerical simulations.
    Comment: 20 pages, 7 figures, 2 tables
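For orientation, the spatial sign covariance matrix itself can be sketched for bivariate data as the average outer product of the centred observations projected onto the unit circle. The abstract does not state how the SSCM is turned into a correlation estimate, so that step is not shown. The centre used here (the coordinate-wise median) and the choice to skip points that coincide with the centre are assumptions of this sketch, and the function name is hypothetical.

```python
import math
from statistics import median


def spatial_sign_covariance(points, center=None):
    """2x2 spatial sign covariance matrix of a list of (x, y) points."""
    if center is None:
        # Assumption: coordinate-wise median as a simple robust centre.
        center = (median([p[0] for p in points]),
                  median([p[1] for p in points]))
    s11 = s12 = s22 = 0.0
    used = 0
    for px, py in points:
        dx, dy = px - center[0], py - center[1]
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # the spatial sign is undefined at the centre; skip
        ux, uy = dx / norm, dy / norm  # spatial sign: unit-length direction
        s11 += ux * ux
        s12 += ux * uy
        s22 += uy * uy
        used += 1
    return ((s11 / used, s12 / used), (s12 / used, s22 / used))
```

Since every spatial sign has unit length, the trace of the resulting matrix is 1, and the distance of an observation from the centre has no effect on its contribution.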

    Robustness versus efficiency for nonparametric correlation measures.

    Nonparametric correlation measures such as the Kendall and Spearman correlation are widely used in the behavioral sciences. These measures are often said to be robust, in the sense of being resistant to outlying observations. In this note we formally study their robustness by means of their influence functions. Since robustness of an estimator often comes at the price of a loss in precision, we compute efficiencies at the normal model. A comparison with robust correlation measures derived from robust covariance matrices is made. We conclude that both Spearman and Kendall correlation measures combine good robustness properties with high efficiency.
    Keywords: asymptotic variance; correlation; gross-error sensitivity; influence function; Kendall correlation; robustness; Spearman correlation
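As a reminder of what the two measures compute, here is a minimal sketch using the textbook definitions: Spearman's rho as the Pearson correlation of the ranks, and Kendall's tau (the tau-a variant) as the normalised difference between concordant and discordant pairs. It assumes no ties, and the function names are hypothetical.

```python
def _ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    m = (n + 1) / 2  # mean of the ranks 1..n
    sxy = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    sxx = sum((a - m) ** 2 for a in rx)
    syy = sum((b - m) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def kendall(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            total += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return total / (n * (n - 1) / 2)
```

For `x = [1, 2, 3, 4, 5]` and `y = [1, 2, 3, 4, -100]`, the single aberrant value changes one rank in `y` and four of the ten pairs, and its magnitude plays no role: replacing -100 by -1e9 gives the same result.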

    Outlier Detection for Multivariate Time Series: A Functional Data Approach

    Funded for open access publication: Universidade da Coruña/CISUG.
    [Abstract] A method for detecting outlier samples in a multivariate time series dataset is proposed. It is assumed that an outlying series is characterized by having been generated from a different process than those associated with the rest of the series. Each multivariate time series is described by means of an estimator of its quantile cross-spectral density, which is treated as a multivariate functional datum. Then an outlier score is assigned to each series by using functional depths. A broad simulation study shows that the proposed approach is superior to the alternatives suggested in the literature and demonstrates that the consideration of functional data constitutes a critical step. The procedure runs in linear time with respect to both the series length and the number of series, and in quadratic time with respect to the number of dimensions. Two applications concerning financial series and ECG signals highlight the usefulness of the technique.
    This research has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG.

    Independent components techniques based on kurtosis for functional data analysis

    The motivation for this paper arises from an article written by Peña et al. [40] in 2010, where they propose the eigenvectors associated with the extreme values of a kurtosis matrix as interesting directions to reveal the possible cluster structure of a dataset. In recent years many research papers have proposed generalizations of multivariate techniques to the functional data case. In this paper we introduce an extension of the multivariate kurtosis for functional data, and we analyze some of its properties. In particular, we explore if our proposal preserves some of the properties of the kurtosis procedures applied to the multivariate case, regarding the identification of outliers and cluster structures. This analysis is conducted considering both theoretical and experimental properties of our proposal.

    On Language Clustering: A Non-parametric Statistical Approach

    Any approach aimed at pasteurizing and quantifying a particular phenomenon must include the use of robust statistical methodologies for data analysis. With this in mind, the purpose of this study is to present statistical approaches that may be employed in nonparametric nonhomogeneous data frameworks, as well as to examine their application in the field of natural language processing and language clustering. Furthermore, this paper discusses the many uses of nonparametric approaches in linguistic data mining and processing. The data depth idea allows for the centre-outward ordering of points in any dimension, resulting in a new nonparametric multivariate statistical analysis that does not require any distributional assumptions. The concept of hierarchy is used in historical language categorisation and structuring, and it aims to organise and cluster languages into subfamilies using the same premise. In this regard, the current study presents a novel approach to language family structuring based on non-parametric approaches produced from a typological structure of words in various languages, which is then converted into a Cartesian framework using MDS. This statistical-depth-based architecture allows for the use of data-depth-based methodologies for robust outlier detection, which is extremely useful in understanding the categorization of diverse borderline languages and allows for the re-evaluation of existing classification systems. Other depth-based approaches are also applied to processes such as unsupervised and supervised clustering. This paper therefore provides an overview of procedures that can be applied to nonhomogeneous language classification systems in a nonparametric framework.
    Comment: 18 pages
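The centre-outward ordering that data depth provides can be illustrated with one concrete depth function. The abstract does not say which depth is used, so the spatial depth below is chosen purely as an example: it equals one minus the length of the average unit vector pointing from the data points to the query point. The function name is hypothetical.

```python
import math


def spatial_depth(x, data):
    """Spatial depth of point x (a tuple) with respect to `data`.

    Values near 1 indicate a central point; values near 0 indicate an
    outlying one. Data points equal to x contribute a zero vector.
    """
    dim = len(x)
    total = [0.0] * dim
    for p in data:
        diff = [x[k] - p[k] for k in range(dim)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:
            for k in range(dim):
                total[k] += diff[k] / norm
    mean_norm = math.sqrt(sum(t * t for t in total)) / len(data)
    return 1.0 - mean_norm
```

Sorting points by decreasing depth gives a centre-outward ordering, and points with the lowest depth are natural candidates for outliers or borderline cases. For the four points (1, 0), (-1, 0), (0, 1), (0, -1), the origin has depth 1, while a distant point such as (100, 0) has depth close to 0.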