175,400 research outputs found
Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has merely focused
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experiment results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Further more, the
proposed algorithms are not limited in the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.Comment: 14 pages, 5 figures, journal pape
Graph measures and network robustness
Network robustness research aims at finding a measure to quantify network
robustness. Once such a measure has been established, we will be able to
compare networks, to improve existing networks and to design new networks that
are able to continue to perform well when it is subject to failures or attacks.
In this paper we survey a large amount of robustness measures on simple,
undirected and unweighted graphs, in order to offer a tool for network
administrators to evaluate and improve the robustness of their network. The
measures discussed in this paper are based on the concepts of connectivity
(including reliability polynomials), distance, betweenness and clustering. Some
other measures are notions from spectral graph theory, more precisely, they are
functions of the Laplacian eigenvalues. In addition to surveying these graph
measures, the paper also contains a discussion of their functionality as a
measure for topological network robustness
An Empirical Study on Budget-Aware Online Kernel Algorithms for Streams of Graphs
Kernel methods are considered an effective technique for on-line learning.
Many approaches have been developed for compactly representing the dual
solution of a kernel method when the problem imposes memory constraints.
However, in literature no work is specifically tailored to streams of graphs.
Motivated by the fact that the size of the feature space representation of many
state-of-the-art graph kernels is relatively small and thus it is explicitly
computable, we study whether executing kernel algorithms in the feature space
can be more effective than the classical dual approach. We study three
different algorithms and various strategies for managing the budget. Efficiency
and efficacy of the proposed approaches are experimentally assessed on
relatively large graph streams exhibiting concept drift. It turns out that,
when strict memory budget constraints have to be enforced, working in feature
space, given the current state of the art on graph kernels, is more than a
viable alternative to dual approaches, both in terms of speed and
classification performance.Comment: Author's version of the manuscript, to appear in Neurocomputing
(ELSEVIER
TopCom: Index for Shortest Distance Query in Directed Graph
Finding shortest distance between two vertices in a graph is an important
problem due to its numerous applications in diverse domains, including
geo-spatial databases, social network analysis, and information retrieval.
Classical algorithms (such as, Dijkstra) solve this problem in polynomial time,
but these algorithms cannot provide real-time response for a large number of
bursty queries on a large graph. So, indexing based solutions that pre-process
the graph for efficiently answering (exactly or approximately) a large number
of distance queries in real-time is becoming increasingly popular. Existing
solutions have varying performance in terms of index size, index building time,
query time, and accuracy. In this work, we propose T OP C OM , a novel
indexing-based solution for exactly answering distance queries. Our experiments
with two of the existing state-of-the-art methods (IS-Label and TreeMap) show
the superiority of T OP C OM over these two methods considering scalability and
query time. Besides, indexing of T OP C OM exploits the DAG (directed acyclic
graph) structure in the graph, which makes it significantly faster than the
existing methods if the SCCs (strongly connected component) of the input graph
are relatively small
- …