Important data mining problems such as nearest-neighbor search and clustering
admit theoretical guarantees when restricted to objects embedded in a metric
space. Graphs are ubiquitous, and clustering and classification over graphs
arise in diverse areas, including image processing and social networks.
Unfortunately, popular distance scores that are used in these applications and
scale to large graphs are not metrics, and thus come with no such guarantees.
Classic graph distances, such as the chemical and the CKS distances, are
arguably natural and intuitive, and are indeed metrics, but they are
intractable: their computation does not scale to large graphs. We define a broad
family of graph distances that includes both the chemical and the CKS
distances, and prove that all of them are metrics. Crucially, we show that our
family includes metrics that are tractable. Moreover, we extend these distances
to incorporate auxiliary node attributes, which is important in practice,
while preserving both the metric property and tractability.

Comment: Extended version of the paper appearing in SDM 201
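To make the intractability claim concrete, here is a minimal brute-force sketch of one common formulation of the chemical distance: minimize a matrix norm of AP - PB over all permutation matrices P, where A and B are adjacency matrices of two graphs on the same number of nodes. Assumptions: the Frobenius norm is used here (for 0/1 adjacency matrices the squared cost simply counts mismatched entries), and the function name `chemical_distance` and the example graphs are hypothetical illustrations, not the paper's definitions or code. Because P is orthogonal, ||AP - PB|| equals ||A - P B P^T||, so each permutation amounts to relabeling the nodes of B. The loop visits all n! relabelings, which is why this approach does not scale. The tractable members of the family described above are not reproduced here.

```python
import itertools
import math


def chemical_distance(A, B):
    """Brute-force chemical distance (sketch, Frobenius-norm variant).

    A and B are n x n adjacency matrices given as lists of lists.
    Returns min over node relabelings perm of ||A - P B P^T||_F,
    which equals min over permutation matrices P of ||A P - P B||_F.
    Enumerates all n! relabelings, so it is only usable for tiny graphs.
    """
    n = len(A)
    if len(B) != n:
        raise ValueError("graphs must have the same number of nodes")
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Squared Frobenius norm of A - P B P^T under this relabeling:
        # entry (i, j) of P B P^T is B[perm[i]][perm[j]].
        cost = sum(
            (A[i][j] - B[perm[i]][perm[j]]) ** 2
            for i in range(n)
            for j in range(n)
        )
        best = min(best, cost)
    return math.sqrt(best)


# Hypothetical example graphs on 3 nodes.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # edges 0-1, 1-2
path2 = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]     # edges 0-2, 1-2 (isomorphic to path)
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # edges 0-1, 0-2, 1-2
empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]     # no edges

print(chemical_distance(path, path2))     # 0.0, since the graphs are isomorphic
print(chemical_distance(path, triangle))  # sqrt(2): one missing edge, counted twice
```

The distance is zero exactly when the two graphs are isomorphic, so it behaves as a metric on graphs up to relabeling; the sketch can be used to spot-check symmetry and the triangle inequality on small examples such as `path`, `triangle`, and `empty`.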