23,325 research outputs found

    Developments in the theory of randomized shortest paths with a comparison of graph node distances

    Get PDF
    There have lately been several suggestions for parametrized distances on a graph that generalize the shortest path distance and the commute time or resistance distance. The need for developing such distances has risen from the observation that the above-mentioned common distances in many situations fail to take into account the global structure of the graph. In this article, we develop the theory of one family of graph node distances, known as the randomized shortest path dissimilarity, which has its foundation in statistical physics. We show that the randomized shortest path dissimilarity can be easily computed in closed form for all pairs of nodes of a graph. Moreover, we come up with a new definition of a distance measure that we call the free energy distance. The free energy distance can be seen as an upgrade of the randomized shortest path dissimilarity as it defines a metric, in addition to which it satisfies the graph-geodetic property. The derivation and computation of the free energy distance are also straightforward. We then make a comparison between a set of generalized distances that interpolate between the shortest path distance and the commute time, or resistance distance. This comparison focuses on the applicability of the distances in graph node clustering and classification. The comparison, in general, shows that the parametrized distances perform well in the tasks. In particular, we see that the results obtained with the free energy distance are among the best in all the experiments.Comment: 30 pages, 4 figures, 3 table

    Do logarithmic proximity measures outperform plain ones in graph clustering?

    Full text link
    We consider a number of graph kernels and proximity measures including commute time kernel, regularized Laplacian kernel, heat kernel, exponential diffusion kernel (also called "communicability"), etc., and the corresponding distances as applied to clustering nodes in random graphs and several well-known datasets. The model of generating random graphs involves edge probabilities for the pairs of nodes that belong to the same class or different predefined classes of nodes. It turns out that in most cases, logarithmic measures (i.e., measures resulting after taking logarithm of the proximities) perform better while distinguishing underlying classes than the "plain" measures. A comparison in terms of reject curves of inter-class and intra-class distances confirms this conclusion. A similar conclusion can be made for several well-known datasets. A possible origin of this effect is that most kernels have a multiplicative nature, while the nature of distances used in cluster algorithms is an additive one (cf. the triangle inequality). The logarithmic transformation is a tool to transform the first nature to the second one. Moreover, some distances corresponding to the logarithmic measures possess a meaningful cutpoint additivity property. In our experiments, the leader is usually the logarithmic Communicability measure. However, we indicate some more complicated cases in which other measures, typically, Communicability and plain Walk, can be the winners.Comment: 11 pages, 5 tables, 9 figures. Accepted for publication in the Proceedings of 6th International Conference on Network Analysis, May 26-28, 2016, Nizhny Novgorod, Russi

    Nonparametric Feature Extraction from Dendrograms

    Full text link
    We propose feature extraction from dendrograms in a nonparametric way. The Minimax distance measures correspond to building a dendrogram with single linkage criterion, with defining specific forms of a level function and a distance function over that. Therefore, we extend this method to arbitrary dendrograms. We develop a generalized framework wherein different distance measures can be inferred from different types of dendrograms, level functions and distance functions. Via an appropriate embedding, we compute a vector-based representation of the inferred distances, in order to enable many numerical machine learning algorithms to employ such distances. Then, to address the model selection problem, we study the aggregation of different dendrogram-based distances respectively in solution space and in representation space in the spirit of deep representations. In the first approach, for example for the clustering problem, we build a graph with positive and negative edge weights according to the consistency of the clustering labels of different objects among different solutions, in the context of ensemble methods. Then, we use an efficient variant of correlation clustering to produce the final clusters. In the second approach, we investigate the sequential combination of different distances and features sequentially in the spirit of multi-layered architectures to obtain the final features. Finally, we demonstrate the effectiveness of our approach via several numerical studies

    NetLSD: Hearing the Shape of a Graph

    Full text link
    Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD): the first, to our knowledge, permutation- and size-invariant, scale-adaptive, and efficiently computable graph representation method that allows for straightforward comparisons of large graphs. NetLSD extracts a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel; thus, it hears the shape of a graph. Our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency.Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19--23, 2018, London, United Kingdo
    • …
    corecore