Finding Outliers in Surface Data and Video
Surface, image and video data can be considered as functional data with a
bivariate domain. To detect outlying surfaces or images, a new method is
proposed based on the mean and the variability of the degree of outlyingness at
each grid point. A rule is constructed to flag the outliers in the resulting
functional outlier map. Heatmaps of their outlyingness indicate the regions
which are most deviating from the regular surfaces. The method is applied to
fluorescence excitation-emission spectra after fitting a PARAFAC model, to MRI
image data which are augmented with their gradients, and to video surveillance
data.
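The idea of summarising each surface by the mean and the variability of its pointwise outlyingness can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which outlyingness measure is used, so the sketch substitutes the simple robust z-score |x - median| / MAD at each grid point, and the function name `pointwise_outlyingness_summary` is hypothetical.

```python
import numpy as np

def pointwise_outlyingness_summary(surfaces):
    """Summarise each surface by the mean and spread of its pointwise outlyingness.

    surfaces: array of shape (n_samples, rows, cols), all on the same grid.
    Assumption: outlyingness at a grid point is |x - median| / MAD, a stand-in
    for whatever measure the paper actually uses.
    """
    X = np.asarray(surfaces, dtype=float)
    med = np.median(X, axis=0)                    # pointwise median surface
    mad = np.median(np.abs(X - med), axis=0)      # pointwise median absolute deviation
    mad = np.where(mad > 0, mad, np.nan)          # ignore grid points with no spread
    out = np.abs(X - med) / mad                   # outlyingness at every grid point
    mean_out = np.nanmean(out, axis=(1, 2))       # average outlyingness per surface
    sd_out = np.nanstd(out, axis=(1, 2))          # variability of outlyingness per surface
    return mean_out, sd_out, out

# Toy data: 30 noise surfaces on an 8 x 8 grid, one of them shifted upwards.
rng = np.random.default_rng(0)
demo = rng.normal(size=(30, 8, 8))
demo[5] += 10.0
mean_out, sd_out, heat = pointwise_outlyingness_summary(demo)
```

Plotting `sd_out` against `mean_out` gives a rough analogue of an outlier map, and `heat[i]` can be drawn as a heatmap to see which regions of surface `i` deviate most.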
The Gaussian rank correlation estimator: Robustness properties.
The Gaussian rank correlation equals the usual correlation coefficient computed from the normal scores of the data. Although its influence function is unbounded, it still has attractive robustness properties. In particular, its breakdown point is above 12%. Moreover, the estimator is consistent and asymptotically efficient at the normal distribution. The correlation matrix based on the Gaussian rank correlation is always positive semidefinite, and very easy to compute, also in high dimensions. A simulation study confirms the good efficiency and robustness properties of the proposed estimator with respect to the popular Kendall and Spearman correlation measures. In the empirical application, we show how it can be used for multivariate outlier detection based on robust principal component analysis.
Keywords: Breakdown; Correlation; Efficiency; Robustness; Van der Waerden
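A minimal sketch of the definition in the first sentence: rank each variable, map the ranks to normal scores, and take the ordinary correlation of the scores. The scaling of ranks by n + 1 and the use of average ranks for ties are assumptions made here for illustration, and the helper names are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def normal_scores(values):
    """Map ranks to standard normal quantiles (assumed scaling: rank / (n + 1))."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]

def pearson(a, b):
    """Ordinary product-moment correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / (s_aa * s_bb) ** 0.5

def gaussian_rank_correlation(x, y):
    """Correlation coefficient computed from the normal scores of x and y."""
    return pearson(normal_scores(x), normal_scores(y))
```

Because only ranks enter the computation, the value is unchanged by any strictly increasing transformation of either variable, for example `gaussian_rank_correlation(x, [v ** 3 for v in x])` is 1 up to rounding.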
Influence Functions of the Spearman and Kendall Correlation Measures
Mathematics Subject Classification (2000) 62G35 · 62F99
Spatial Sign Correlation
A new robust correlation estimator based on the spatial sign covariance
matrix (SSCM) is proposed. We derive its asymptotic distribution and influence
function at elliptical distributions. Finite sample and robustness properties
are studied and compared to other robust correlation estimators by means of
numerical simulations.
Comment: 20 pages, 7 figures, 2 tables
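For orientation, the spatial sign covariance matrix itself can be sketched as the average outer product of the centred observations projected onto the unit sphere. The sketch below stops at the SSCM and does not reproduce the correlation estimator derived from it in the paper. The coordinate-wise median is used as the centre purely as a convenient assumption (a spatial median would be the more usual choice), and the function name is hypothetical.

```python
import numpy as np

def spatial_sign_covariance(X, center=None):
    """Average outer product of the spatial signs of the rows of X.

    X: array of shape (n_samples, n_features).
    center: location to subtract; defaults to the coordinate-wise median,
            an assumption made for simplicity.
    """
    X = np.asarray(X, dtype=float)
    if center is None:
        center = np.median(X, axis=0)
    Z = X - center
    norms = np.linalg.norm(Z, axis=1)
    keep = norms > 0                      # a point at the centre has no direction
    signs = Z[keep] / norms[keep, None]   # project each observation onto the unit sphere
    return signs.T @ signs / signs.shape[0]

# Toy data: correlated bivariate normal sample.
rng = np.random.default_rng(1)
sample = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
sscm = spatial_sign_covariance(sample)
```

Every spatial sign has unit length, so the matrix always has trace 1 and no single observation can dominate it, which is the source of its robustness.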
Robustness versus efficiency for nonparametric correlation measures.
Nonparametric correlation measures such as the Kendall and Spearman correlations are widely used in the behavioral sciences. These measures are often said to be robust, in the sense of being resistant to outlying observations. In this note we formally study their robustness by means of their influence functions. Since robustness of an estimator often comes at the price of a loss in precision, we compute efficiencies at the normal model. A comparison with robust correlation measures derived from robust covariance matrices is made. We conclude that both Spearman and Kendall correlation measures combine good robustness properties with high efficiency.
Keywords: asymptotic variance; correlation; gross-error sensitivity; influence function; Kendall correlation; robustness; Spearman correlation
Outlier Detection for Multivariate Time Series: A Functional Data Approach
Funded for open access publication: Universidade da Coruña/CISUG.
[Abstract] A method for detecting outlier samples in a multivariate time series dataset is proposed. It is assumed that an outlying series is characterized by having been generated from a different process than those associated with the rest of the series. Each multivariate time series is described by means of an estimator of its quantile cross-spectral density, which is treated as a multivariate functional datum. Then an outlier score is assigned to each series by using functional depths. A broad simulation study shows that the proposed approach is superior to the alternatives suggested in the literature and demonstrates that the consideration of functional data constitutes a critical step. The procedure runs in linear time with respect to both the series length and the number of series, and in quadratic time with respect to the number of dimensions. Two applications concerning financial series and ECG signals highlight the usefulness of the technique.
This research has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG.
Xunta de Galicia; ED431C-2020-14
Xunta de Galicia; ED431G 2019/0
Independent components techniques based on kurtosis for functional data analysis
The motivation for this paper arises from an article written by Peña et al. [40] in 2010, where they propose the eigenvectors associated with the extreme values of a kurtosis matrix as interesting directions to reveal the possible cluster structure of a dataset. In recent years many research papers have proposed generalizations of multivariate techniques to the functional data case. In this paper we introduce an extension of the multivariate kurtosis for functional data, and we analyze some of its properties. In particular, we explore if our proposal preserves some of the properties of the kurtosis procedures applied to the multivariate case, regarding the identification of outliers and cluster structures. This analysis is conducted considering both theoretical and experimental properties of our proposal.
On Language Clustering: A Non-parametric Statistical Approach
Any approach aimed at pasteurizing and quantifying a particular phenomenon
must include the use of robust statistical methodologies for data analysis.
With this in mind, the purpose of this study is to present statistical
approaches that may be employed in nonparametric nonhomogeneous data
frameworks, as well as to examine their application in the field of natural
language processing and language clustering. Furthermore, this paper discusses
the many uses of nonparametric approaches in linguistic data mining and
processing. The data depth idea allows for the centre-outward ordering of
points in any dimension, resulting in a new nonparametric multivariate
statistical analysis that does not require any distributional assumptions. The
concept of hierarchy is used in historical language categorisation and
structuring, and it aims to organise and cluster languages into subfamilies
using the same premise. In this regard, the current study presents a novel
approach to language family structuring based on non-parametric approaches
produced from a typological structure of words in various languages, which is
then converted into a Cartesian framework using MDS. This
statistical-depth-based architecture allows for the use of data-depth-based
methodologies for robust outlier detection, which is extremely useful in
understanding the categorization of diverse borderline languages and allows for
the re-evaluation of existing classification systems. Other depth-based
approaches are also applied to processes such as unsupervised and supervised
clustering. This paper therefore provides an overview of procedures that can be
applied to nonhomogeneous language classification systems in a nonparametric
framework.
Comment: 18 pages