62,150 research outputs found
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review
Kernel Ellipsoidal Trimming
Ellipsoid estimation is an issue of primary importance in many practical areas such as control, system identification, visual/audio tracking, experimental design, data mining, robust statistics and novelty/outlier detection. This paper presents a new method of kernel information matrix ellipsoid estimation (KIMEE) that finds an ellipsoid in a kernel defined feature space based on a centered information matrix. Although the method is very general and can be applied to many of the aforementioned problems, the main focus in this paper is the problem of novelty or outlier detection associated with fault detection. A simple iterative algorithm based on Titterington's minimum volume ellipsoid method is proposed for practical implementation. The KIMEE method demonstrates very good performance on a set of real-life and simulated datasets compared with support vector machine methods
Robust Sparse Canonical Correlation Analysis
Canonical correlation analysis (CCA) is a multivariate statistical method
which describes the associations between two sets of variables. The objective
is to find linear combinations of the variables in each data set having maximal
correlation. This paper discusses a method for Robust Sparse CCA. Sparse
estimation produces canonical vectors with some of their elements estimated as
exactly zero. As such, their interpretability is improved. We also robustify
the method such that it can cope with outliers in the data. To estimate the
canonical vectors, we convert the CCA problem into an alternating regression
framework, and use the sparse Least Trimmed Squares estimator. We illustrate
the good performance of the Robust Sparse CCA method in several simulation
studies and two real data examples
Recommended from our members
Efficient smile detection by Extreme Learning Machine
Smile detection is a specialized task in facial expression analysis with applications such as photo selection, user experience analysis, and patient monitoring. As one of the most important and informative expressions, smile conveys the underlying emotion status such as joy, happiness, and satisfaction. In this paper, an efficient smile detection approach is proposed based on Extreme Learning Machine (ELM). The faces are first detected and a holistic flow-based face registration is applied which does not need any manual labeling or key point detection. Then ELM is used to train the classifier. The proposed smile detector is tested with different feature descriptors on publicly available databases including real-world face images. The comparisons against benchmark classifiers including Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA) suggest that the proposed ELM based smile detector in general performs better and is very efficient. Compared to state-of-the-art smile detector, the proposed method achieves competitive results without preprocessing and manual registration
- …