Euclidean Distances, soft and spectral Clustering on Weighted Graphs
We define a class of Euclidean distances on weighted graphs, which enables
thermodynamic soft graph clustering. The class can be constructed from
the "raw coordinates" encountered in spectral clustering, and can be extended
by means of higher-dimensional embeddings (Schoenberg transformations).
Geographical flow data, properly conditioned, illustrate the procedure as well
as visualization aspects.
Comment: accepted for presentation (and further publication) at the ECML PKDD
2010 conference
Developments in the theory of randomized shortest paths with a comparison of graph node distances
There have lately been several suggestions for parametrized distances on a
graph that generalize the shortest path distance and the commute time or
resistance distance. The need for developing such distances has risen from the
observation that the above-mentioned common distances in many situations fail
to take into account the global structure of the graph. In this article, we
develop the theory of one family of graph node distances, known as the
randomized shortest path dissimilarity, which has its foundation in statistical
physics. We show that the randomized shortest path dissimilarity can be easily
computed in closed form for all pairs of nodes of a graph. Moreover, we come up
with a new definition of a distance measure that we call the free energy
distance. The free energy distance can be seen as an upgrade of the randomized
shortest path dissimilarity: it defines a metric and also
satisfies the graph-geodetic property. The derivation and computation of the
free energy distance are also straightforward. We then make a comparison
between a set of generalized distances that interpolate between the shortest
path distance and the commute time, or resistance distance. This comparison
focuses on the applicability of the distances in graph node clustering and
classification. The comparison, in general, shows that the parametrized
distances perform well in the tasks. In particular, we see that the results
obtained with the free energy distance are among the best in all the
experiments.
Comment: 30 pages, 4 figures, 3 tables
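As an illustration of the two endpoints that such generalized distances interpolate between, the following minimal sketch computes the shortest path distance and the resistance distance on a small graph. It is not the authors' code: the function names, the `None` convention for missing edges, and the choice to read edge costs as resistances are assumptions made for this example. The commute time is proportional to the resistance distance, so only the latter is computed.

```python
from fractions import Fraction


def shortest_path_distances(costs):
    """All-pairs shortest path lengths (Floyd-Warshall).

    costs[a][b] is the cost of edge (a, b), or None if the edge is absent
    (hypothetical convention chosen for this sketch).
    """
    n = len(costs)
    inf = float("inf")
    d = [[0 if a == b else (costs[a][b] if costs[a][b] is not None else inf)
          for b in range(n)] for a in range(n)]
    for k in range(n):
        for a in range(n):
            for b in range(n):
                if d[a][k] + d[k][b] < d[a][b]:
                    d[a][b] = d[a][k] + d[k][b]
    return d


def resistance_distance(costs, i, j):
    """Effective resistance between nodes i and j of a connected, undirected graph.

    Assumption: each edge cost is read as a resistance, so its conductance
    is 1 / cost. Node j is grounded, a unit current is injected at node i,
    and the resulting potential at i is the resistance distance.
    """
    if i == j:
        return Fraction(0)
    n = len(costs)
    # Build the graph Laplacian from the conductances.
    lap = [[Fraction(0)] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b and costs[a][b] is not None:
                g = Fraction(1) / Fraction(costs[a][b])
                lap[a][b] -= g
                lap[a][a] += g
    # Drop the row and column of the grounded node j; append the right-hand side e_i.
    keep = [k for k in range(n) if k != j]
    aug = [[lap[r][c] for c in keep] + [Fraction(1 if r == i else 0)] for r in keep]
    m = len(keep)
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return aug[keep.index(i)][m]


if __name__ == "__main__":
    # Toy example: a triangle with unit edge costs.
    triangle = [[None, 1, 1], [1, None, 1], [1, 1, None]]
    print(shortest_path_distances(triangle)[0][1])
    print(resistance_distance(triangle, 0, 1))
```

On the triangle, the shortest path distance between two adjacent nodes is 1, whereas the resistance distance is 2/3, because the two-edge detour through the third node also carries current. On a tree the two distances coincide, since only one path connects each pair of nodes.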
Nonparametric Feature Extraction from Dendrograms
We propose feature extraction from dendrograms in a nonparametric way. The
Minimax distance measures correspond to building a dendrogram with the single
linkage criterion and defining specific forms of a level function and a
distance function over it. Therefore, we extend this method to arbitrary
dendrograms. We develop a generalized framework wherein different distance
measures can be inferred from different types of dendrograms, level functions
and distance functions. Via an appropriate embedding, we compute a vector-based
representation of the inferred distances, in order to enable many numerical
machine learning algorithms to employ such distances. Then, to address the
model selection problem, we study the aggregation of different dendrogram-based
distances in solution space and in representation space, respectively, in the
spirit of deep representations. In the first approach, for example for the
clustering problem, we build a graph with positive and negative edge weights
according to the consistency of the clustering labels of different objects
among different solutions, in the context of ensemble methods. Then, we use an
efficient variant of correlation clustering to produce the final clusters. In
the second approach, we investigate the sequential combination of different
distances and features, in the spirit of multi-layered
architectures, to obtain the final features. Finally, we demonstrate the
effectiveness of our approach via several numerical studies.
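The correspondence between Minimax distances and single linkage stated in the abstract can be illustrated with a minimal sketch. The Minimax distance between two objects is the smallest achievable value of the largest edge along any path connecting them, which coincides with the height at which single linkage first merges the two objects. The function name and the toy data are assumptions for illustration, not the paper's implementation; the Floyd-Warshall-style update is one simple way to compute these distances, chosen here for brevity rather than efficiency.

```python
def minimax_distances(dist):
    """Pairwise Minimax distances from a symmetric pairwise distance matrix.

    mm[i][j] is the minimum, over all paths from i to j in the complete
    graph, of the largest edge weight on the path.
    """
    n = len(dist)
    mm = [row[:] for row in dist]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Best path through k: its bottleneck is the larger of the two legs.
                via_k = max(mm[i][k], mm[k][j])
                if via_k < mm[i][j]:
                    mm[i][j] = via_k
    return mm


if __name__ == "__main__":
    # Toy data: four points on a line (hypothetical example).
    points = [0, 1, 2, 10]
    dist = [[abs(a - b) for b in points] for a in points]
    for row in minimax_distances(dist):
        print(row)
```

For the four points above, the Minimax distance between 0 and 2 is 1 (the path through 1 uses only unit steps), and the distance from any of the first three points to 10 is 8, the size of the single gap that must be crossed. These are the heights at which single linkage merges the corresponding groups.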