Euclidean Distances, soft and spectral Clustering on Weighted Graphs
We define a class of Euclidean distances on weighted graphs that enables
thermodynamic soft graph clustering. The class can be constructed from
the "raw coordinates" encountered in spectral clustering, and can be extended
by means of higher-dimensional embeddings (Schoenberg transformations).
Geographical flow data, properly conditioned, illustrate the procedure as well
as visualization aspects.
Comment: accepted for presentation (and further publication) at the ECML PKDD 2010 conference
Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More
Algorithms for node clustering typically focus on finding homophilous
structure in graphs. That is, they find sets of similar nodes with many edges
within, rather than across, the clusters. However, graphs often also exhibit
heterophilous structure, as exemplified by (nearly) bipartite and tripartite
graphs, where most edges occur across the clusters. Grappling with such
structure is typically left to the task of graph simplification. We present a
probabilistic model based on non-negative matrix factorization which unifies
clustering and simplification, and provides a framework for modeling arbitrary
graph structure. Our model is based on factorizing the process of taking a
random walk on the graph. It permits an unconstrained parametrization, allowing
for optimization via simple gradient descent. By relaxing the hard clustering
to a soft clustering, our algorithm relaxes potentially hard clustering
problems to tractable ones. We illustrate our algorithm's capabilities on a
synthetic graph, as well as simple unsupervised learning tasks involving
bipartite and tripartite clustering of orthographic and phonological data.
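The random-walk view in the abstract above can be illustrated with a minimal sketch. It is not the authors' model: it only builds the one-step transition matrix of a small bipartite graph, to show why heterophilous structure sends each step across the clusters. The function name `transition_matrix` and the example graph are assumptions made for illustration.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix into one-step random-walk probabilities."""
    rows = []
    for row in adj:
        degree = sum(row)
        # An isolated node has no outgoing step; keep its row at zero.
        rows.append([w / degree if degree else 0.0 for w in row])
    return rows


# Hypothetical example: complete bipartite graph with parts {0, 1} and {2, 3}.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
P = transition_matrix(adj)

# Probability that one step from node 0 stays inside its own part {0, 1}.
stay = P[0][0] + P[0][1]
print(P[0], stay)
```

In a homophilous graph most of the probability mass would stay within a node's own cluster; here none of it does, which is the kind of structure a purely homophily-seeking clustering would miss.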
Graph ambiguity
In this paper, we propose a rigorous way to define the concept of ambiguity in the domain of graphs. In past studies, the classical definition of ambiguity has been derived from fuzzy set and fuzzy information theories. Our aim is to show that in the domain of graphs, too, it is possible to derive a formulation that captures the same semantic and mathematical concept. To strengthen the theoretical results, we discuss the application of the graph ambiguity concept to the graph classification setting, conceiving a new kind of inexact graph matching procedure. The results show that the graph ambiguity concept is a characterizing and discriminative property of graphs. (C) 2013 Elsevier B.V. All rights reserved.
The most persistent soft-clique in a set of sampled graphs
When searching for characteristic subpatterns in potentially noisy graph data, it appears self-evident that having multiple observations would be better than having just one. However, it turns out that the inconsistencies introduced when different graph instances have different edge sets pose a serious challenge. In this work we address this challenge for the problem of finding maximum weighted cliques. We introduce the concept of the most persistent soft-clique. This is a subset of vertices that 1) is almost fully or at least densely connected, 2) occurs in all or almost all graph instances, and 3) has the maximum weight. We present a measure of clique-ness that essentially counts the number of edges missing to make a subset of vertices into a clique. With this measure, we show that the problem of finding the most persistent soft-clique can be cast either as: a) a max-min two-person game optimization problem, or b) a min-min soft margin optimization problem. Both formulations lead to the same solution when using a partial Lagrangian method to solve the optimization problems. By experiments on synthetic data and on real social network data, we show that the proposed method is able to reliably find soft cliques in graph data, even when the data are distorted by random noise or unreliable observations.
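The clique-ness measure described above, the number of edges missing for a vertex subset to form a clique, can be sketched as follows. This is a minimal illustration under the assumption of an undirected, unweighted graph given as an edge list; the function name `missing_edges` is hypothetical, and the paper's actual measure and optimization are not reproduced here.

```python
from itertools import combinations


def missing_edges(vertices, edges):
    """Count the vertex pairs in `vertices` that are not joined by an edge."""
    # Store edges as unordered pairs so (u, v) and (v, u) are the same edge.
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for u, v in combinations(sorted(set(vertices)), 2)
        if frozenset((u, v)) not in edge_set
    )


# Hypothetical example graph: a triangle {1, 2, 3} plus a pendant vertex 4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(missing_edges([1, 2, 3], edges))     # 0: the subset is a clique
print(missing_edges([1, 2, 3, 4], edges))  # 2: pairs (1, 4) and (2, 4) are absent
```

A value of zero means the subset is an exact clique; small positive values correspond to the "almost fully connected" subsets the abstract calls soft cliques.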