5,174 research outputs found
Provable Self-Representation Based Outlier Detection in a Union of Subspaces
Many computer vision tasks involve processing large amounts of data
contaminated by outliers, which need to be detected and rejected. While outlier
detection methods based on robust statistics have existed for decades, only
recently have methods based on sparse and low-rank representation been
developed along with guarantees of correct outlier detection when the inliers
lie in one or more low-dimensional subspaces. This paper proposes a new outlier
detection method that combines tools from sparse representation with random
walks on a graph. By exploiting the property that data points can be expressed
as sparse linear combinations of each other, we obtain an asymmetric affinity
matrix among data points, which we use to construct a weighted directed graph.
By defining a suitable Markov Chain from this graph, we establish a connection
between inliers/outliers and essential/inessential states of the Markov chain,
which allows us to detect outliers by using random walks. We provide a
theoretical analysis that justifies the correctness of our method under
geometric and connectivity assumptions. Experimental results on image databases
demonstrate its superiority with respect to state-of-the-art sparse and
low-rank outlier detection methods.Comment: 16 pages. CVPR 2017 spotlight oral presentatio
Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has merely focused
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experiment results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Further more, the
proposed algorithms are not limited in the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.Comment: 14 pages, 5 figures, journal pape
STWalk: Learning Trajectory Representations in Temporal Graphs
Analyzing the temporal behavior of nodes in time-varying graphs is useful for
many applications such as targeted advertising, community evolution and outlier
detection. In this paper, we present a novel approach, STWalk, for learning
trajectory representations of nodes in temporal graphs. The proposed framework
makes use of structural properties of graphs at current and previous time-steps
to learn effective node trajectory representations. STWalk performs random
walks on a graph at a given time step (called space-walk) as well as on graphs
from past time-steps (called time-walk) to capture the spatio-temporal behavior
of nodes. We propose two variants of STWalk to learn trajectory
representations. In one algorithm, we perform space-walk and time-walk as part
of a single step. In the other variant, we perform space-walk and time-walk
separately and combine the learned representations to get the final trajectory
embedding. Extensive experiments on three real-world temporal graph datasets
validate the effectiveness of the learned representations when compared to
three baseline methods. We also show the goodness of the learned trajectory
embeddings for change point detection, as well as demonstrate that arithmetic
operations on these trajectory representations yield interesting and
interpretable results.Comment: 10 pages, 5 figures, 2 table
Semi-supervised Embedding in Attributed Networks with Outliers
In this paper, we propose a novel framework, called Semi-supervised Embedding
in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector
representation that systematically captures the topological proximity,
attribute affinity and label similarity of vertices in a partially labeled
attributed network (PLAN). Our method is designed to work in both transductive
and inductive settings while explicitly alleviating noise effects from
outliers. Experimental results on various datasets drawn from the web, text and
image domains demonstrate the advantages of SEANO over state-of-the-art methods
in semi-supervised classification under transductive as well as inductive
settings. We also show that a subset of parameters in SEANO is interpretable as
outlier score and can significantly outperform baseline methods when applied
for detecting network outliers. Finally, we present the use of SEANO in a
challenging real-world setting -- flood mapping of satellite images and show
that it is able to outperform modern remote sensing algorithms for this task.Comment: in Proceedings of SIAM International Conference on Data Mining
(SDM'18
The impact of outliers on transitory and permanent components in macroeconomic time series
In this paper we investigate the effect of the outliers on the decomposition of Nelson-Plosser macroeconomic data set into permanent and transitory components from structural time series models. We show that the outliers can disturb the unobserved-components decomposition, especially the variance of trend and cycle innovations, sometimes dramatically.
- …