A Family of Tractable Graph Distances
Important data mining problems such as nearest-neighbor search and clustering
admit theoretical guarantees when restricted to objects embedded in a metric
space. Graphs are ubiquitous, and clustering and classification over graphs
arise in diverse areas, including, e.g., image processing and social networks.
Unfortunately, popular distance scores that are used in these applications and
scale to large graphs are not metrics, and thus come with no guarantees. Classic
graph distances such as the chemical and the CKS distance are arguably
natural and intuitive, and are indeed also metrics, but they are intractable:
their computation does not scale to large graphs. We define a broad
family of graph distances that includes both the chemical and the CKS
distance, and prove that these are all metrics. Crucially, we show that our
family includes metrics that are tractable. Moreover, we extend these distances
by incorporating auxiliary node attributes, which is important in practice,
while maintaining both the metric property and tractability.
Comment: Extended version of paper appearing in SDM 201
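To make the intractability concrete, here is a brute-force sketch of the chemical distance. It assumes one common formulation (the minimum, over all node bijections between two graphs of equal size, of the number of mismatched node pairs). As a hypothetical way to fold in node attributes, it also adds a weighted per-node attribute mismatch under the same bijection. Neither is claimed to be the paper's exact definition, and the name `chemical_distance` and its parameters `xa`, `xb`, `weight` are illustrative only.

```python
from itertools import permutations


def chemical_distance(a, b, xa=None, xb=None, weight=1.0):
    """Brute-force edge-mismatch distance (assumed formulation, illustration only).

    a, b:   adjacency matrices of undirected simple graphs (lists of 0/1 lists),
            both with the same number of nodes n.
    xa, xb: optional per-node numeric attributes (hypothetical extension).
    Returns the minimum, over all bijections pi, of the number of node pairs
    {u, v} with a[u][v] != b[pi[u]][pi[v]], plus weight * sum |xa[u] - xb[pi[u]]|
    when attributes are given.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("this sketch assumes graphs with the same number of nodes")
    best = float("inf")
    # n! candidate bijections: this enumeration is what makes the exact distance intractable.
    for pi in permutations(range(n)):
        cost = sum(
            a[u][v] != b[pi[u]][pi[v]]
            for u in range(n)
            for v in range(u + 1, n)
        )
        if xa is not None and xb is not None:
            # Hypothetical attribute term, evaluated under the same alignment pi.
            cost += weight * sum(abs(xa[u] - xb[pi[u]]) for u in range(n))
        best = min(best, cost)
    return best


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # edges 0-1, 1-2
relabeled = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]  # edges 0-2, 1-2 (same path, new labels)
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(chemical_distance(path, relabeled), chemical_distance(path, triangle))
```

The loop visits all n! bijections, so even two 12-node graphs require 479,001,600 alignments, which is why tractable members of such a family matter in practice.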
NetLSD: Hearing the Shape of a Graph
Comparison among graphs is ubiquitous in graph analytics. However, it is a
hard task in terms of the expressiveness of the employed similarity measure and
the efficiency of its computation. Ideally, graph comparison should be
invariant to the order of nodes and the sizes of compared graphs, adaptive to
the scale of graph patterns, and scalable. Unfortunately, these properties have
not been addressed together. Graph comparisons still rely on direct approaches,
graph kernels, or representation-based methods, which are all inefficient and
impractical for large graph collections.
In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD):
the first, to our knowledge, permutation- and size-invariant, scale-adaptive,
and efficiently computable graph representation method that allows for
straightforward comparisons of large graphs. NetLSD extracts a compact
signature that inherits the formal properties of the Laplacian spectrum,
specifically its heat or wave kernel; thus, it hears the shape of a graph. Our
evaluation on a variety of real-world graphs demonstrates that it outperforms
previous works in both expressiveness and efficiency.
Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, August 19--23, 2018, London, United Kingdom
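A minimal sketch of a heat-trace signature in the spirit of NetLSD. It assumes the normalized Laplacian, the heat trace h(t) = sum_j exp(-t * lambda_j) sampled on a log-spaced time grid (10^-2 to 10^2 with 250 points here, an assumed default), and division by n (the heat trace of the empty graph) for size normalization. Eigenvalues come from a plain Jacobi iteration so the block runs without third-party packages; all helper names are hypothetical, and a practical implementation for large graphs would use a library eigensolver or a spectral approximation.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes get an all-zero row and column."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if deg[i] > 0:
            lap[i][i] = 1.0
        for j in range(n):
            if i != j and adj[i][j] and deg[i] > 0 and deg[j] > 0:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def symmetric_eigenvalues(m, tol=1e-22, max_sweeps=50):
    """Cyclic Jacobi iteration: rotate away off-diagonal entries of a symmetric matrix."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def log_time_grid(lo=-2.0, hi=2.0, num=250):
    """num times evenly spaced in log10(t), from 10**lo to 10**hi."""
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]


def heat_trace_signature(adj, times=None, normalize=True):
    """Vector of h(t) = sum_j exp(-t * lambda_j) over the time grid."""
    if times is None:
        times = log_time_grid()
    eigs = symmetric_eigenvalues(normalized_laplacian(adj))
    sig = [sum(math.exp(-t * lam) for lam in eigs) for t in times]
    if normalize:  # divide by n, the heat trace of the empty graph on n nodes
        sig = [h / len(adj) for h in sig]
    return sig


def signature_distance(s1, s2):
    """Euclidean distance between two signature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)))
```

Because the heat trace depends only on the Laplacian spectrum, relabeling the nodes leaves the signature unchanged, and graphs of different sizes are compared simply as fixed-length vectors.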
Computing Graph Descriptors on Edge Streams
Feature extraction is an essential task in graph analytics. The extracted feature
vectors, called graph descriptors, are used in downstream vector-space-based
graph analysis models. This idea has proved fruitful in the past, with
spectral-based graph descriptors providing state-of-the-art classification
accuracy. However, known algorithms to compute meaningful descriptors do not
scale to large graphs since: (1) they require storing the entire graph in
memory, and (2) the end-user has no control over the algorithm's runtime. In
this paper, we present streaming algorithms to approximately compute three
different graph descriptors capturing the essential structure of graphs.
Operating on edge streams allows us to avoid storing the entire graph in
memory, and controlling the sample size enables us to keep the runtime of our
algorithms within desired bounds. We demonstrate the efficacy of the proposed
descriptors by analyzing the approximation error and classification accuracy.
Our scalable algorithms compute descriptors of graphs with millions of edges
within minutes. Moreover, these descriptors yield predictive accuracy
comparable to the state-of-the-art methods but can be computed using only 25%
as much memory.
Comment: Extension of work accepted to PAKDD 202
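The bounded-memory idea can be illustrated with a generic sketch: keep a uniform reservoir of k edges from the stream (Algorithm R), compute a statistic on the sample, and rescale it. The triangle count is used here purely as an illustrative structural statistic; it is an assumption, not necessarily one of the three descriptors in the paper, and the function names are hypothetical. The sketch also assumes a simple graph whose stream contains each edge once.

```python
import random
from itertools import combinations


def reservoir_sample_edges(edge_stream, k, rng):
    """Algorithm R: a uniform sample of k items, using O(k) memory."""
    reservoir, m = [], 0
    for edge in edge_stream:
        m += 1
        if len(reservoir) < k:
            reservoir.append(edge)
        else:
            j = rng.randrange(m)  # uniform index in [0, m)
            if j < k:
                reservoir[j] = edge
    return reservoir, m


def count_triangles(edges):
    """Exact triangle count of a small edge list (each triangle is seen from 3 edges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def estimate_triangles(edge_stream, k, seed=0):
    """Unbiased triangle estimate from a k-edge reservoir (illustrative statistic)."""
    if k < 3:
        raise ValueError("need k >= 3 to observe any triangle")
    sample, m = reservoir_sample_edges(edge_stream, k, random.Random(seed))
    t_sample = count_triangles(sample)
    if m <= k:
        return float(t_sample)  # the whole stream fit in memory: exact count
    # A given triangle survives with probability k(k-1)(k-2) / (m(m-1)(m-2)).
    return t_sample * (m * (m - 1) * (m - 2)) / (k * (k - 1) * (k - 2))


k4_edges = list(combinations(range(4), 2))  # complete graph on 4 nodes: 4 triangles
print(estimate_triangles(k4_edges, k=10))
```

Memory stays at k edges regardless of stream length, and the work per edge is constant during the pass; raising k trades memory and post-pass runtime for lower variance in the estimate.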