On Spectral Graph Embedding: A Non-Backtracking Perspective and Graph Approximation
Graph embedding has been proven to be efficient and effective in facilitating
graph analysis. In this paper, we present a novel spectral framework called
NOn-Backtracking Embedding (NOBE), which offers a new perspective that
organizes graph data at a deep level by tracking the flow traversing on the
edges with backtracking prohibited. Further, by analyzing the non-backtracking
process, a technique called graph approximation is devised, which provides a
channel to transform the spectral decomposition on an edge-to-edge matrix to
that on a node-to-node matrix. Theoretical guarantees are provided by bounding
the difference between the corresponding eigenvalues of the original graph and
its graph approximation. Extensive experiments conducted on various real-world
networks demonstrate the efficacy of our methods on both macroscopic and
microscopic levels, including clustering and structural hole spanner detection.
Comment: SDM 2018 (Full version including all proofs
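The abstract above does not define its edge-to-edge matrix. As a minimal sketch, assuming it is the standard non-backtracking (Hashimoto) matrix over directed edges, the construction can be illustrated as follows. The function name `non_backtracking_matrix` and the edge-list input format are hypothetical and not taken from the paper.

```python
def non_backtracking_matrix(edges):
    """Return (directed_edges, B) for an undirected simple graph.

    B[i][j] = 1 when directed edge i = (u, v) can be followed by
    directed edge j = (v, w) with w != u, i.e. without backtracking.
    This is an illustrative dense construction, not the paper's code.
    """
    # Each undirected edge contributes two directed edges.
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))

    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for i, (u, v) in enumerate(directed):
        for j, (x, w) in enumerate(directed):
            # Edge j must start where edge i ends and must not return to u.
            if x == v and w != u:
                B[i][j] = 1
    return directed, B


# Example: a triangle has 6 directed edges, and every walk along an edge
# has exactly one non-backtracking continuation.
triangle = [(0, 1), (1, 2), (2, 0)]
directed, B = non_backtracking_matrix(triangle)
row_sums = [sum(row) for row in B]
```

The row sum for a directed edge (u, v) is the degree of v minus one, so the matrix has 2m rows for a graph with m edges. That size is what makes a reduction to a node-to-node matrix attractive.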
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
NetLSD: Hearing the Shape of a Graph
Comparison among graphs is ubiquitous in graph analytics. However, it is a
hard task in terms of the expressiveness of the employed similarity measure and
the efficiency of its computation. Ideally, graph comparison should be
invariant to the order of nodes and the sizes of compared graphs, adaptive to
the scale of graph patterns, and scalable. Unfortunately, these properties have
not been addressed together. Graph comparisons still rely on direct approaches,
graph kernels, or representation-based methods, which are all inefficient and
impractical for large graph collections.
In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD):
the first, to our knowledge, permutation- and size-invariant, scale-adaptive,
and efficiently computable graph representation method that allows for
straightforward comparisons of large graphs. NetLSD extracts a compact
signature that inherits the formal properties of the Laplacian spectrum,
specifically its heat or wave kernel; thus, it hears the shape of a graph. Our
evaluation on a variety of real-world graphs demonstrates that it outperforms
previous works in both expressiveness and efficiency.
Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, August 19--23, 2018, London, United Kingdo
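A heat-kernel signature derived from the Laplacian spectrum can be sketched as the trace of exp(-tL) sampled at several timescales t. The example below is only an illustration under stated assumptions: it uses the normalized Laplacian, an arbitrary choice of timescales, and a dense pure-Python matrix exponential that is practical only for tiny graphs. All function names are hypothetical and none of this is the authors' implementation.

```python
import math


def normalized_laplacian(n, edges):
    """Dense L = I - D^{-1/2} A D^{-1/2}; isolated nodes get a zero row (assumed convention)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if deg[i] > 0:
            L[i][i] = 1.0
    for u, v in edges:
        w = 1.0 / math.sqrt(deg[u] * deg[v])
        L[u][v] -= w
        L[v][u] -= w
    return L


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_trace(L, t, terms=20):
    """Trace of exp(-t L) via a truncated Taylor series with scaling and squaring."""
    n = len(L)
    # Eigenvalues of the normalized Laplacian lie in [0, 2], so halving t
    # this many times keeps the scaled matrix norm at or below 1.
    squarings = max(0, math.ceil(math.log2(max(2.0 * t, 1.0))))
    step = t / (2 ** squarings)
    M = [[-step * x for x in row] for row in L]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return sum(result[i][i] for i in range(n))


def heat_signature(n, edges, timescales):
    L = normalized_laplacian(n, edges)
    return [heat_trace(L, t) for t in timescales]


timescales = [0.1, 1.0, 10.0]
path = heat_signature(3, [(0, 1), (1, 2)], timescales)
relabeled = heat_signature(3, [(1, 0), (0, 2)], timescales)  # same path, nodes renamed
```

Because the trace depends only on the eigenvalues, `path` and `relabeled` agree up to floating-point error, which illustrates the permutation invariance the abstract refers to. For graphs of realistic size one would compute or approximate the spectrum with a numerical eigensolver instead of a dense exponential.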
Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch
Graph-based semi-supervised learning (SSL) algorithms have been successfully
used in a large number of applications. These methods classify initially
unlabeled nodes by propagating label information over the structure of the graph
starting from seed nodes. Graph-based SSL algorithms usually scale linearly
with the number of distinct labels (m), and require O(m) space on each node.
Unfortunately, there exist many applications of practical significance with
very large m over large graphs, demanding better space and time complexity. In
this paper, we propose MAD-SKETCH, a novel graph-based SSL algorithm which
compactly stores label distribution on each node using Count-min Sketch, a
randomized data structure. We present theoretical analysis showing that under
mild conditions, MAD-SKETCH can reduce space complexity at each node from O(m)
to O(log m), and achieve similar savings in time complexity as well. We support
our analysis through experiments on multiple real-world datasets. We observe
that MAD-SKETCH achieves performance comparable to existing state-of-the-art
graph-based SSL algorithms, while requiring a smaller memory footprint and
achieving up to a 10x speedup. We find that MAD-SKETCH is able to
scale to datasets with one million labels, which is beyond the scope of
existing graph-based SSL algorithms.
Comment: 9 page
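To make the space saving concrete, here is a minimal sketch of a generic Count-Min Sketch holding per-label scores for a single node. It shows the standard data structure only. How MAD-SKETCH updates and propagates these sketches is not described in the abstract, so the class name, the hashing scheme, the `width` and `depth` values, and the example labels are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Generic Count-Min Sketch over string keys with non-negative weights."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        # depth rows of width counters, instead of one counter per label.
        self.table = [[0.0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One deterministic hash per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, amount=1.0):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += amount

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available upper estimate.
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))


# Hypothetical label distribution for one node.
label_scores = {"sports": 0.6, "politics": 0.3, "music": 0.1}
sketch = CountMinSketch(width=16, depth=4)
for label, score in label_scores.items():
    sketch.add(label, score)
```

With non-negative updates, `estimate` never underestimates the stored score, and the memory per node is `width * depth` counters regardless of how many distinct labels exist.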