3,964 research outputs found
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. The outliers may be instances of error or indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events within numerous
applications domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework
Non-Negative Local Sparse Coding for Subspace Clustering
Subspace sparse coding (SSC) algorithms have proven to be beneficial to
clustering problems. They provide an alternative data representation in which
the underlying structure of the clusters can be better captured. However, most
of the research in this area is mainly focused on enhancing the sparse coding
part of the problem. In contrast, we introduce a novel objective term in our
proposed SSC framework which focuses on the separability of data points in the
coding space. We also provide mathematical insights into how this
local-separability term improves the clustering result of the SSC framework.
Our proposed non-linear local SSC algorithm (NLSSC) also benefits from the
efficient choice of its sparsity terms and constraints. The NLSSC algorithm is
also formulated in the kernel-based framework (NLKSSC) which can represent the
nonlinear structure of data. In addition, we address the possibility of having
redundancies in sparse coding results and its negative effect on graph-based
clustering problems. We introduce the link-restore post-processing step to
improve the representation graph of non-negative SSC algorithms such as ours.
Empirical evaluations on well-known clustering benchmarks show that our
proposed NLSSC framework results in better clusterings compared to the
state-of-the-art baselines and demonstrate the effectiveness of the
link-restore post-processing in improving the clustering accuracy via
correcting the broken links of the representation graph.Comment: 15 pages, IDA 2018 conferenc
- …