9,840 research outputs found
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
Provable Self-Representation Based Outlier Detection in a Union of Subspaces
Many computer vision tasks involve processing large amounts of data
contaminated by outliers, which need to be detected and rejected. While outlier
detection methods based on robust statistics have existed for decades, only
recently have methods based on sparse and low-rank representation been
developed along with guarantees of correct outlier detection when the inliers
lie in one or more low-dimensional subspaces. This paper proposes a new outlier
detection method that combines tools from sparse representation with random
walks on a graph. By exploiting the property that data points can be expressed
as sparse linear combinations of each other, we obtain an asymmetric affinity
matrix among data points, which we use to construct a weighted directed graph.
By defining a suitable Markov Chain from this graph, we establish a connection
between inliers/outliers and essential/inessential states of the Markov chain,
which allows us to detect outliers by using random walks. We provide a
theoretical analysis that justifies the correctness of our method under
geometric and connectivity assumptions. Experimental results on image databases
demonstrate its superiority with respect to state-of-the-art sparse and
low-rank outlier detection methods.Comment: 16 pages. CVPR 2017 spotlight oral presentatio
- …