4,330 research outputs found
Euclidean Distances, soft and spectral Clustering on Weighted Graphs
We define a class of Euclidean distances on weighted graphs, enabling to
perform thermodynamic soft graph clustering. The class can be constructed form
the "raw coordinates" encountered in spectral clustering, and can be extended
by means of higher-dimensional embeddings (Schoenberg transformations).
Geographical flow data, properly conditioned, illustrate the procedure as well
as visualization aspects.Comment: accepted for presentation (and further publication) at the ECML PKDD
2010 conferenc
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
A survey of kernel and spectral methods for clustering
Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel version of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arise from concepts in spectral graph theory and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof of the fact that these two paradigms have the same objective is reported since it has been proven that these two seemingly different approaches have the same mathematical foundation. Besides, fuzzy kernel clustering methods are presented as extensions of kernel K-means clustering algorithm. (C) 2007 Pattem Recognition Society. Published by Elsevier Ltd. All rights reserved
Developments in the theory of randomized shortest paths with a comparison of graph node distances
There have lately been several suggestions for parametrized distances on a
graph that generalize the shortest path distance and the commute time or
resistance distance. The need for developing such distances has risen from the
observation that the above-mentioned common distances in many situations fail
to take into account the global structure of the graph. In this article, we
develop the theory of one family of graph node distances, known as the
randomized shortest path dissimilarity, which has its foundation in statistical
physics. We show that the randomized shortest path dissimilarity can be easily
computed in closed form for all pairs of nodes of a graph. Moreover, we come up
with a new definition of a distance measure that we call the free energy
distance. The free energy distance can be seen as an upgrade of the randomized
shortest path dissimilarity as it defines a metric, in addition to which it
satisfies the graph-geodetic property. The derivation and computation of the
free energy distance are also straightforward. We then make a comparison
between a set of generalized distances that interpolate between the shortest
path distance and the commute time, or resistance distance. This comparison
focuses on the applicability of the distances in graph node clustering and
classification. The comparison, in general, shows that the parametrized
distances perform well in the tasks. In particular, we see that the results
obtained with the free energy distance are among the best in all the
experiments.Comment: 30 pages, 4 figures, 3 table
Computing communities in large networks using random walks
Dense subgraphs of sparse graphs (communities), which appear in most
real-world complex networks, play an important role in many contexts. Computing
them however is generally expensive. We propose here a measure of similarities
between vertices based on random walks which has several important advantages:
it captures well the community structure in a network, it can be computed
efficiently, it works at various scales, and it can be used in an agglomerative
algorithm to compute efficiently the community structure of a network. We
propose such an algorithm which runs in time O(mn^2) and space O(n^2) in the
worst case, and in time O(n^2log n) and space O(n^2) in most real-world cases
(n and m are respectively the number of vertices and edges in the input graph).
Experimental evaluation shows that our algorithm surpasses previously proposed
ones concerning the quality of the obtained community structures and that it
stands among the best ones concerning the running time. This is very promising
because our algorithm can be improved in several ways, which we sketch at the
end of the paper.Comment: 15 pages, 4 figure
- ā¦