A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
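As a concrete illustration of the definition above (an observation that differs significantly from the other values), here is a minimal sketch of one common unsupervised detector for univariate numeric data: the modified z-score based on the median absolute deviation (MAD). It is not taken from the paper's taxonomy; the function name `mad_outliers`, the sample `readings`, and the conventional cutoff of 3.5 are assumptions chosen for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration; 3.5 is a commonly used cutoff,
    not a value prescribed by the abstract above.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined;
        # fall back to flagging anything that differs from the median.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the inliers are roughly normally distributed.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up sample data: six similar readings and one that stands apart.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
flagged = mad_outliers(readings)
print(flagged)
```

Because the median and the MAD are themselves insensitive to extreme values, the single distant reading does not distort the scale used to judge it, which is the usual argument for preferring this score over a mean and standard deviation rule.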
Exploring Outliers in Crowdsourced Ranking for QoE
Outlier detection is a crucial part of robust evaluation for crowdsourceable
assessment of Quality of Experience (QoE) and has attracted much attention in
recent years. In this paper, we propose some simple and fast algorithms for
outlier detection and robust QoE evaluation based on the nonconvex optimization
principle. Several iterative procedures are designed with or without knowing
the number of outliers in samples. Theoretical analysis is given to show that
such procedures can reach statistically good estimates under mild conditions.
Finally, experimental results with simulated and real-world crowdsourcing
datasets show that the proposed algorithms could produce similar performance to
the Huber-LASSO approach in robust ranking, yet with nearly 8 or 90 times speed-up,
without or with prior knowledge of the sparsity size of outliers,
respectively. Therefore, the proposed methodology provides a set of helpful
tools for robust QoE evaluation with crowdsourcing data.
Comment: accepted by ACM Multimedia 2017 (Oral presentation). arXiv admin
note: text overlap with arXiv:1407.763
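The abstract above mentions iterative procedures that assume the number of outliers is known. The following is a minimal sketch of that general idea, iterative hard thresholding, applied to a much simpler problem than the paper's: estimating a single location value `mu` from samples modelled as `y[i] = mu + gamma[i] + noise`, where at most `k` of the `gamma[i]` are nonzero. It is not the authors' algorithm, which targets ranking from crowdsourced comparisons; the function `iht_location`, its parameters, and the sample data are all assumptions made for illustration.

```python
def iht_location(y, k, n_iter=50):
    """Alternate a least-squares fit with hard thresholding of residuals.

    Hypothetical illustration: `k` is the assumed number of outliers.
    Returns the location estimate and the sorted indices flagged as outliers.
    """
    n = len(y)
    gamma = [0.0] * n  # per-sample outlier offsets, all zero initially
    mu = 0.0
    for _ in range(n_iter):
        # Step 1: least-squares estimate of mu on the outlier-corrected samples.
        mu = sum(yi - gi for yi, gi in zip(y, gamma)) / n
        residuals = [yi - mu for yi in y]
        # Step 2: keep only the k largest residuals (in magnitude) as offsets.
        top = sorted(range(n), key=lambda i: abs(residuals[i]), reverse=True)[:k]
        new_gamma = [0.0] * n
        for i in top:
            new_gamma[i] = residuals[i]
        if new_gamma == gamma:
            break  # offsets stopped changing, so mu is at a fixed point
        gamma = new_gamma
    outliers = sorted(i for i, g in enumerate(gamma) if g != 0.0)
    return mu, outliers


# Made-up samples: six values near 5 and two gross errors.
samples = [5.0, 5.2, 4.9, 5.1, 12.0, 5.0, -3.0, 4.8]
mu, outliers = iht_location(samples, k=2)
print(round(mu, 3), outliers)
```

In this toy setting the fixed point of the iteration is the mean of the samples not flagged as outliers, so the estimate is unaffected by the two gross errors once they are identified. When `k` is unknown, it would have to be chosen by some other criterion, which this sketch does not attempt.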
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers has not received much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
by providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of the
contrastive contexts in which outliers are located, as well as the relation between
outliers and contexts, is usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.