12,615 research outputs found
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale
datasets. Such datasets can be extremely noisy and often contain a significant
amount of outliers that result from sensor malfunction or human operation
faults. In order to utilize such data for real-world applications, it is
critical to detect outliers so that models built from these datasets will not
be skewed by outliers.
In this paper, we propose a new outlier detection method that utilizes the
correlations in the data (e.g., taxi trip distance vs. trip time). Different
from existing outlier detection methods, we build a robust regression model
that explicitly models the outliers and detects outliers simultaneously with
the model fitting.
We validate our approach on real-world datasets against methods specifically
designed for each dataset as well as the state of the art outlier detectors.
Our outlier detection method achieves better performances, demonstrating the
robustness and generality of our method. Last, we report interesting case
studies on some outliers that result from atypical events.Comment: 10 page
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Robust Linear Spectral Unmixing using Anomaly Detection
This paper presents a Bayesian algorithm for linear spectral unmixing of
hyperspectral images that accounts for anomalies present in the data. The model
proposed assumes that the pixel reflectances are linear mixtures of unknown
endmembers, corrupted by an additional nonlinear term modelling anomalies and
additive Gaussian noise. A Markov random field is used for anomaly detection
based on the spatial and spectral structures of the anomalies. This allows
outliers to be identified in particular regions and wavelengths of the data
cube. A Bayesian algorithm is proposed to estimate the parameters involved in
the model yielding a joint linear unmixing and anomaly detection algorithm.
Simulations conducted with synthetic and real hyperspectral images demonstrate
the accuracy of the proposed unmixing and outlier detection strategy for the
analysis of hyperspectral images
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
Discriminative Density-ratio Estimation
The covariate shift is a challenging problem in supervised learning that
results from the discrepancy between the training and test distributions. An
effective approach which recently drew a considerable attention in the research
community is to reweight the training samples to minimize that discrepancy. In
specific, many methods are based on developing Density-ratio (DR) estimation
techniques that apply to both regression and classification problems. Although
these methods work well for regression problems, their performance on
classification problems is not satisfactory. This is due to a key observation
that these methods focus on matching the sample marginal distributions without
paying attention to preserving the separation between classes in the reweighted
space. In this paper, we propose a novel method for Discriminative
Density-ratio (DDR) estimation that addresses the aforementioned problem and
aims at estimating the density-ratio of joint distributions in a class-wise
manner. The proposed algorithm is an iterative procedure that alternates
between estimating the class information for the test data and estimating new
density ratio for each class. To incorporate the estimated class information of
the test data, a soft matching technique is proposed. In addition, we employ an
effective criterion which adopts mutual information as an indicator to stop the
iterative procedure while resulting in a decision boundary that lies in a
sparse region. Experiments on synthetic and benchmark datasets demonstrate the
superiority of the proposed method in terms of both accuracy and robustness
Exploring Outliers in Crowdsourced Ranking for QoE
Outlier detection is a crucial part of robust evaluation for crowdsourceable
assessment of Quality of Experience (QoE) and has attracted much attention in
recent years. In this paper, we propose some simple and fast algorithms for
outlier detection and robust QoE evaluation based on the nonconvex optimization
principle. Several iterative procedures are designed with or without knowing
the number of outliers in samples. Theoretical analysis is given to show that
such procedures can reach statistically good estimates under mild conditions.
Finally, experimental results with simulated and real-world crowdsourcing
datasets show that the proposed algorithms could produce similar performance to
Huber-LASSO approach in robust ranking, yet with nearly 8 or 90 times speed-up,
without or with a prior knowledge on the sparsity size of outliers,
respectively. Therefore the proposed methodology provides us a set of helpful
tools for robust QoE evaluation with crowdsourcing data.Comment: accepted by ACM Multimedia 2017 (Oral presentation). arXiv admin
note: text overlap with arXiv:1407.763
- …