Spectral Embedding Norm: Looking Deep into the Spectrum of the Graph Laplacian
The extraction of clusters from a dataset which includes multiple clusters
and a significant background component is a non-trivial task of practical
importance. In image analysis, this arises, for example, in anomaly detection
and target detection. The traditional spectral clustering algorithm, which
relies on the leading eigenvectors to detect clusters, fails in such
cases. In this paper we propose the spectral embedding norm, which sums
the squared values of the leading normalized eigenvectors, where the number of
eigenvectors used can be significantly larger than the number of clusters. We
prove that this quantity can be used to separate clusters from the background in
unbalanced settings, including extreme cases such as outlier detection. The
performance of the algorithm is not sensitive to the choice of the number of
eigenvectors, and we demonstrate its application on synthetic and real-world
remote sensing and neuroimaging datasets.
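To make the definition concrete, here is a minimal NumPy sketch of a spectral-embedding-norm score. It assumes a Gaussian affinity with a hand-picked bandwidth `sigma` and takes the leading eigenvectors of the symmetrically normalized affinity D^(-1/2) W D^(-1/2). The function name `spectral_embedding_norm`, the kernel, and this particular normalization are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np


def spectral_embedding_norm(X, n_eigvecs, sigma=1.0):
    """Per-point sum of squared entries of the leading eigenvectors of the
    symmetrically normalized Gaussian affinity matrix (illustrative sketch)."""
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinity; the bandwidth sigma is an assumed, tunable choice.
    W = np.exp(-d2 / (2.0 * sigma**2))
    # A = D^(-1/2) W D^(-1/2). Its largest eigenvalues correspond to the
    # smallest eigenvalues of the normalized Laplacian I - A.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # np.linalg.eigh returns eigenvalues in ascending order, so the leading
    # eigenvectors are the last columns.
    _, vecs = np.linalg.eigh(A)
    leading = vecs[:, -n_eigvecs:]
    # Score of each point: squared row norm of its spectral embedding.
    return np.sum(leading**2, axis=1)


# Usage: two tight clusters embedded in a uniform background.
rng = np.random.default_rng(0)
clusters = np.vstack([rng.normal(c, 0.05, size=(20, 2)) for c in ([-1, 0], [1, 0])])
background = rng.uniform(-3, 3, size=(200, 2))
X = np.vstack([clusters, background])
scores = spectral_embedding_norm(X, n_eigvecs=10, sigma=0.3)
print(scores[:40].mean(), scores[40:].mean())
```

Because the eigenvectors are orthonormal, the scores always sum to `n_eigvecs` and each lies in [0, 1]; points that carry weight in many of the leading eigenvectors receive larger values.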
MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams
Given a stream of graph edges from a dynamic graph, how can we assign anomaly
scores to edges in an online manner, for the purpose of detecting unusual
behavior, using constant time and memory? Existing approaches aim to detect
individually surprising edges. In this work, we propose MIDAS, which focuses on
detecting microcluster anomalies, or suddenly arriving groups of suspiciously
similar edges, such as lockstep behavior, including denial-of-service attacks
in network traffic data. MIDAS has the following properties: (a) it detects
microcluster anomalies while providing theoretical guarantees about its false
positive probability; (b) it is online, thus processing each edge in constant
time and constant memory, and also processes the data 162-644 times faster than
state-of-the-art approaches; (c) it provides 42%-48% higher accuracy (in terms
of AUC) than state-of-the-art approaches.
Comment: 8 pages, accepted at the AAAI Conference on Artificial Intelligence
(AAAI), 2020 [oral paper]; minor fixes, updated experiments
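The following sketch illustrates the kind of constant-memory edge scoring the abstract describes. It keeps two count-min sketches, one for the current time tick and one for all ticks so far, and scores each edge with a MIDAS-style chi-squared statistic (a - s/t)^2 * t^2 / (s * (t - 1)), where a is the edge's estimated count in tick t and s its estimated total count up to t. The class names, the salted-hash scheme, and the sketch sizes are assumptions for illustration, not the reference implementation.

```python
import random


class CountMinSketch:
    """Minimal count-min sketch over hashable keys (one salted hash per row)."""

    def __init__(self, n_rows, n_buckets, seed=0):
        rng = random.Random(seed)
        self.n_buckets = n_buckets
        self.salts = [rng.getrandbits(32) for _ in range(n_rows)]
        self.table = [[0.0] * n_buckets for _ in range(n_rows)]

    def _index(self, salt, key):
        return hash((salt, key)) % self.n_buckets

    def add(self, key, amount=1.0):
        for row, salt in zip(self.table, self.salts):
            row[self._index(salt, key)] += amount

    def estimate(self, key):
        # Count-min never underestimates: take the minimum over rows.
        return min(row[self._index(salt, key)] for row, salt in zip(self.table, self.salts))

    def clear(self):
        for row in self.table:
            row[:] = [0.0] * self.n_buckets


class MidasScorer:
    """Hypothetical MIDAS-style scorer: a per-tick sketch and a running-total
    sketch feed a chi-squared-style burst statistic."""

    def __init__(self, n_rows=2, n_buckets=1024, seed=0):
        self.current = CountMinSketch(n_rows, n_buckets, seed)  # counts in tick t
        self.total = CountMinSketch(n_rows, n_buckets, seed)    # counts up to tick t
        self.tick = 1

    def score(self, u, v, t):
        if t > self.tick:  # a new time tick starts: reset the per-tick counts
            self.current.clear()
            self.tick = t
        self.current.add((u, v))
        self.total.add((u, v))
        a = self.current.estimate((u, v))
        s = self.total.estimate((u, v))
        if t <= 1:
            return 0.0  # no history to compare against yet
        return (a - s / t) ** 2 * t * t / (s * (t - 1))


# Usage: edge (1, 2) arrives once per tick; edge (5, 6) bursts 20 times at t = 11.
midas = MidasScorer()
steady = [midas.score(1, 2, t) for t in range(1, 11)]
burst = [midas.score(5, 6, 11) for _ in range(20)]
print(max(steady), burst[-1])
```

An edge that keeps arriving at its historical rate scores zero, while the burst of (5, 6) at t = 11 scores about 200 when no hash collisions occur. Memory is fixed by `n_rows * n_buckets`, independent of the stream length.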
Outlier Detection from Network Data with Subnetwork Interpretation
Detecting a small number of outliers from a set of data observations is
always challenging. This problem is more difficult in the setting of multiple
network samples, where computing the anomalous degree of a network sample is
generally not sufficient. In fact, explaining why a network is exceptional,
expressed in the form of a subnetwork, is equally important. In this paper,
we develop a novel algorithm to address these two key problems. We treat each
network sample as a potential outlier and identify subnetworks that best
discriminate it from nearby regular samples. The algorithm is developed in the
framework of network regression combined with constraints on both network
topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus
goes beyond subspace/subgraph discovery, and we show that it converges to a
global optimum. Evaluation on various real-world network datasets demonstrates
that our algorithm not only outperforms baselines in both network and
high-dimensional settings, but also discovers highly relevant and interpretable
local subnetworks, further enhancing our understanding of anomalous networks.
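The abstract combines network regression with topology constraints and L1-norm shrinkage. As a much simpler, hypothetical illustration of why the L1 part yields an interpretable subnetwork, the sketch below applies soft-thresholding (the closed-form solution of an L1-penalized least-squares problem) to the edge-wise difference between a candidate network and the mean of its regular neighbors. It omits the topology constraints and the regression model entirely; `discriminative_edges` and `lam` are names introduced here.

```python
import numpy as np


def soft_threshold(z, lam):
    """Proximal operator of lam * ||z||_1: shrinks toward 0, zeroing small entries."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def discriminative_edges(sample, neighbors, lam):
    """Hypothetical illustration (not the paper's algorithm): solve
        min_beta 0.5 * ||d - beta||^2 + lam * ||beta||_1
    where d is the edge-wise difference between a candidate network and the
    mean of its nearby regular networks. The solution is soft-thresholding,
    so the surviving nonzero edges form a sparse subnetwork."""
    diff = sample - neighbors.mean(axis=0)
    # Undirected networks: use each edge once (strict upper triangle).
    iu, ju = np.triu_indices(sample.shape[0], k=1)
    beta = soft_threshold(diff[iu, ju], lam)
    return [(int(i), int(j), float(b)) for i, j, b in zip(iu, ju, beta) if b != 0.0]


# Usage: 4-node networks; the candidate has one strongly deviating edge.
neighbors = np.zeros((3, 4, 4))
sample = np.zeros((4, 4))
sample[0, 1] = sample[1, 0] = 2.0   # large deviation, kept (shrunk by lam)
sample[2, 3] = sample[3, 2] = 0.1   # small deviation, removed by shrinkage
print(discriminative_edges(sample, neighbors, lam=0.5))  # → [(0, 1, 1.5)]
```

Raising `lam` trades fidelity for sparsity: fewer edges survive, and each surviving deviation is shrunk by exactly `lam`.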
Sketch-Based Streaming Anomaly Detection in Dynamic Graphs
Given a stream of graph edges from a dynamic graph, how can we assign anomaly
scores to edges and subgraphs in an online manner, for the purpose of detecting
unusual behavior, using constant time and memory? For example, in intrusion
detection, existing work seeks to detect either anomalous edges or anomalous
subgraphs, but not both. In this paper, we first extend the count-min sketch
data structure to a higher-order sketch. This higher-order sketch has the
useful property of preserving the dense subgraph structure (dense subgraphs in
the input turn into dense submatrices in the data structure). We then propose
four online algorithms that utilize this enhanced data structure, which (a)
detect both edge and graph anomalies; (b) process each edge and graph in
constant memory and constant update time per newly arriving edge; and (c)
outperform state-of-the-art baselines on four real-world datasets. Our method
is the first streaming approach that incorporates dense subgraph search to
detect graph anomalies in constant memory and time.
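A minimal sketch of why a two-dimensional (higher-order) count-min sketch preserves dense subgraphs: hashing the source node to a row and the destination node to a column sends a dense block of edges into a dense block of cells, which a dense-submatrix search can then detect. The class `HigherOrderCMS`, its linear hash family, and the greedy peeling routine are assumptions for illustration and are not claimed to match the paper's four algorithms.

```python
import numpy as np


class HigherOrderCMS:
    """Hypothetical higher-order count-min sketch: each layer is an
    n_buckets x n_buckets matrix, and edge (u, v) updates cell (h(u), h(v))."""

    PRIME = 2**31 - 1

    def __init__(self, n_layers=2, n_buckets=32, seed=0):
        rng = np.random.default_rng(seed)
        self.n_buckets = n_buckets
        # Assumed hash family: h(x) = ((a*x + b) mod p) mod n_buckets.
        self.a = [int(x) for x in rng.integers(1, self.PRIME, size=n_layers)]
        self.b = [int(x) for x in rng.integers(0, self.PRIME, size=n_layers)]
        self.counts = np.zeros((n_layers, n_buckets, n_buckets))

    def _h(self, layer, node):
        return (self.a[layer] * node + self.b[layer]) % self.PRIME % self.n_buckets

    def add(self, u, v, weight=1.0):
        for layer in range(len(self.a)):
            self.counts[layer, self._h(layer, u), self._h(layer, v)] += weight

    def edge_estimate(self, u, v):
        # As in a standard count-min sketch, the minimum never underestimates.
        return min(self.counts[layer, self._h(layer, u), self._h(layer, v)]
                   for layer in range(len(self.a)))


def greedy_dense_submatrix(M):
    """Greedy peeling heuristic (an assumed stand-in for the paper's dense-subgraph
    search): repeatedly drop the row or column with the smallest sum and keep the
    best density sum / sqrt(#rows * #cols) seen along the way."""
    rows, cols = list(range(M.shape[0])), list(range(M.shape[1]))
    best = 0.0
    while rows and cols:
        sub = M[np.ix_(rows, cols)]
        best = max(best, sub.sum() / np.sqrt(len(rows) * len(cols)))
        row_sums, col_sums = sub.sum(axis=1), sub.sum(axis=0)
        if row_sums.min() <= col_sums.min():
            rows.pop(int(row_sums.argmin()))
        else:
            cols.pop(int(col_sums.argmin()))
    return best


# Usage: a burst of edges between nodes 0..4 and 10..14 (a dense subgraph).
sketch = HigherOrderCMS()
for u in range(5):
    for v in range(10, 15):
        sketch.add(u, v)
print(sketch.edge_estimate(0, 10), greedy_dense_submatrix(sketch.counts[0]))
```

Since all 25 burst edges hash into at most 5 rows and 5 columns of each layer, the peeling search finds a block whose density is at least 25 / sqrt(5 * 5) = 5, while the sketch itself stays at a fixed size.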
- …