Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance
[Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features)
in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this
new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of
high-dimensional data efficiently. The intuitive idea of HighDOD is that we measure the OD of a point using the sum of distances between this point and its
k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search, and an efficient dynamic subspace search
method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other search
alternatives such as the naive top-down, bottom-up and random search methods, and that existing outlier detection methods cannot fulfill this new task
effectively.
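The outlying degree described in this abstract (sum of distances from a point to its k nearest neighbors, measured inside a chosen feature subset) can be illustrated with a minimal sketch. The function name `outlying_degree`, the Euclidean metric and the toy data are assumptions made for illustration; the pruning strategies and dynamic subspace search of HighDOD are not reproduced here.

```python
import numpy as np

def outlying_degree(data, point_idx, subspace, k):
    """Sum of Euclidean distances from one point to its k nearest
    neighbors, computed only on the features listed in `subspace`.
    (Hypothetical helper; the metric is an assumption.)"""
    sub = data[:, list(subspace)]                 # project onto the chosen features
    dists = np.linalg.norm(sub - sub[point_idx], axis=1)
    dists = np.delete(dists, point_idx)           # exclude the point itself
    return float(np.sort(dists)[:k].sum())        # k smallest distances

# Toy data: the last point is unusual only in feature 1.
data = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [0.1, 5.0],
])
od_f0 = outlying_degree(data, 3, (0,), k=2)   # subspace {feature 0}
od_f1 = outlying_degree(data, 3, (1,), k=2)   # subspace {feature 1}
```

In this toy case the point has a much larger OD in the subspace made of feature 1 than in the one made of feature 0, so feature 1 would be a candidate outlying subspace for it.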
Robust Recovery of Subspace Structures by Low-Rank Representation
In this work we address the subspace recovery problem. Given a set of data
samples (vectors) approximately drawn from a union of multiple subspaces, our
goal is to segment the samples into their respective subspaces and correct the
possible errors as well. To this end, we propose a novel method termed Low-Rank
Representation (LRR), which seeks the lowest-rank representation among all the
candidates that can represent the data samples as linear combinations of the
bases in a given dictionary. It is shown that LRR solves the subspace
recovery problem well: when the data is clean, we prove that LRR exactly captures
the true subspace structures; for the data contaminated by outliers, we prove
that under certain conditions LRR can exactly recover the row space of the
original data and detect the outliers as well; for the data corrupted by
arbitrary errors, LRR can also approximately recover the row space with
theoretical guarantees. Since the subspace membership is provably determined by
the row space, these further imply that LRR can perform robust subspace
segmentation and error correction, in an efficient way.
Comment: IEEE Trans. Pattern Analysis and Machine Intelligence
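For the clean-data case mentioned in this abstract, a small sketch can show why the row space carries the subspace membership. It assumes the data matrix itself serves as the dictionary and uses the closed form known for that special case, where the minimizer of the nuclear norm subject to X = XZ is the projector onto the row space of X. The abstract does not state this formula; the function name `lrr_clean` and the toy data are illustrative, and the robust variants for outliers and arbitrary errors are not covered.

```python
import numpy as np

def lrr_clean(X, tol=1e-10):
    """Noise-free sketch: lowest-rank Z with X = X @ Z when the
    dictionary is X itself, i.e. the projector onto the row space of X.
    (Assumed special case, not the full robust method.)"""
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))   # numerical rank
    v = vt[:r].T                      # orthonormal basis of the row space
    return v @ v.T

rng = np.random.default_rng(0)
# Two independent 1-D subspaces in R^3, three samples each (columns are samples).
b1 = np.array([[1.0], [0.0], [0.0]])
b2 = np.array([[0.0], [1.0], [1.0]])
X = np.hstack([b1 @ rng.uniform(1, 2, (1, 3)),
               b2 @ rng.uniform(1, 2, (1, 3))])
Z = lrr_clean(X)
```

Here `Z` is block-diagonal: entries linking samples from different subspaces are zero, so grouping samples by the nonzero pattern of `Z` recovers the segmentation in this idealized setting.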
Outlier Detection from Network Data with Subnetwork Interpretation
Detecting a small number of outliers from a set of data observations is
always challenging. This problem is more difficult in the setting of multiple
network samples, where computing the anomalous degree of a network sample is
generally not sufficient. In fact, explaining why the network is exceptional,
expressed in the form of a subnetwork, is equally important. In this paper,
we develop a novel algorithm to address these two key problems. We treat each
network sample as a potential outlier and identify subnetworks that mostly
discriminate it from nearby regular samples. The algorithm is developed in the
framework of network regression combined with the constraints on both network
topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus
goes beyond subspace/subgraph discovery and we show that it converges to a
global optimum. Evaluation on various real-world network datasets demonstrates
that our algorithm not only outperforms baselines in both network and
high-dimensional settings, but also discovers highly relevant and interpretable local
subnetworks, further enhancing our understanding of anomalous networks.
SURGE: Continuous Detection of Bursty Regions Over a Stream of Spatial Objects
With the proliferation of mobile devices and location-based services,
the continuous generation of a massive volume of streaming spatial objects (i.e.,
geo-tagged data) opens up new opportunities to address real-world problems by
analyzing them. In this paper, we present a novel continuous bursty region
detection problem that aims to continuously detect a bursty region of a given
size in a specified geographical area from a stream of spatial objects.
Specifically, a bursty region shows the maximum spike in the number of spatial
objects in a given time window. The problem is useful in addressing several
real-world challenges such as the surge pricing problem in online transportation
and disease outbreak detection. To solve the problem, we propose an exact
solution and two approximate solutions, whose approximation ratio is stated
in terms of the burst score and a parameter that controls the burst score.
We further extend these solutions to support
detection of top-k bursty regions. Extensive experiments with real-world data
are conducted to demonstrate the efficiency and effectiveness of our solutions.
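The notion of a region with the largest spike in object count between time windows can be illustrated with a deliberately simplified sketch. It assumes a fixed grid of square cells and scores each cell by the plain difference between its current and previous counts. Both choices are assumptions for illustration: the paper's burst score, its freely positioned regions and its exact and approximate algorithms are not reproduced, and `burstiest_cell` is a hypothetical name.

```python
from collections import Counter

def burstiest_cell(current, past, cell_size):
    """Return (cell, score) for the grid cell whose object count grew
    the most from the past window to the current one, or None if the
    current window is empty. The score is a simple count difference
    (an assumption, not the paper's burst score)."""
    def counts(points):
        return Counter((int(x // cell_size), int(y // cell_size))
                       for x, y in points)
    cur, old = counts(current), counts(past)
    scored = ((cell, n - old.get(cell, 0)) for cell, n in cur.items())
    return max(scored, key=lambda item: item[1], default=None)

# Toy windows of (x, y) locations.
past_window = [(0.5, 0.5), (1.5, 1.5)]
current_window = [(0.2, 0.3), (1.1, 1.2), (1.4, 1.9), (1.8, 1.0)]
result = burstiest_cell(current_window, past_window, cell_size=1.0)
```

In this toy input, cell (1, 1) goes from one object to three while cell (0, 0) stays at one, so the sketch reports cell (1, 1) with a score of 2.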