15 research outputs found
Axioms for graph clustering quality functions
We investigate properties that intuitively ought to be satisfied by graph
clustering quality functions, that is, functions that assign a score to a
clustering of a graph. Graph clustering, also known as network community
detection, is often performed by optimizing such a function. Two axioms
tailored for graph clustering quality functions are introduced, and the four
axioms introduced in previous work on distance based clustering are
reformulated and generalized for the graph setting. We show that modularity, a
standard quality function for graph clustering, does not satisfy all of these
six properties. This motivates the derivation of a new family of quality
functions, adaptive scale modularity, which does satisfy the proposed axioms.
Adaptive scale modularity has two parameters, which give greater flexibility in
the kinds of clusterings that can be found. Standard graph clustering quality
functions, such as normalized cut and unnormalized cut, are obtained as special
cases of adaptive scale modularity.
In general, the results of our investigation indicate that the considered
axiomatic framework covers existing `good' quality functions for graph
clustering, and can be used to derive an interesting new family of quality
functions. Comment: 23 pages. Full text and sources available on:
http://www.cs.ru.nl/~T.vanLaarhoven/graph-clustering-axioms-2014
Incompatibility boundaries for properties of community partitions
We prove the incompatibility of certain desirable properties of community
partition quality functions. Our results generalize the impossibility result of
[Kleinberg 2003] by considering sets of weaker properties. In particular, we
use an alternative notion to solve the central issue of the consistency
property. (The latter means that modifying the graph in a way consistent with a
partition should not have counterintuitive effects). Our results clearly show
that community partition methods should not be expected to perfectly satisfy
all ideally desired properties.
We then proceed to show that this incompatibility no longer holds when
slightly relaxed versions of the properties are considered, and we provide
examples of simple quality functions satisfying these relaxed properties.
An experimental study of these quality functions shows a behavior comparable to
established methods in some situations, but more debatable results in others.
This suggests that defining a notion of good partition in communities probably
requires imposing additional properties. Comment: 17 pages, 3 figures
Leiden algoritmasında kalite faktörünün etkisi (The effect of the quality factor in the Leiden algorithm)
The Leiden algorithm is widely used to cluster network graphs. It divides a given network into
smaller clusters, each of which is a relatively dense subnetwork of vertices. The division is guided by
a quality factor. In this study, we compare the results of the Leiden algorithm under two quality
factors, Modularity and the Constant Potts Model (CPM). For our analysis, we used the 3×3 knight
graph. We examined resolutions from 0.1 to 4.0 for Modularity and from 0.1 to 1.0 for CPM. The
maximum quality scores are 0.9 and 0.59375 for Modularity and CPM, respectively. In both cases the
quality decreased continuously as the resolution increased. Both quality factors followed similar
trends, but CPM divided the graph relatively more rapidly.
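To make the two quality factors concrete, here is a minimal pure-Python sketch that builds the 3×3 knight graph and scores a partition under Modularity and CPM with a resolution parameter. The formulas are the commonly stated ones (normalized modularity with a resolution multiplier on the null-model term, and unnormalized CPM over vertex pairs); the abstract does not say which normalization the study used, so treat them as assumptions. The function names `knight_graph`, `modularity`, and `cpm` are hypothetical helpers, not part of any Leiden implementation, and the sketch does not run the Leiden algorithm itself.

```python
from itertools import combinations


def knight_graph(rows=3, cols=3):
    """Return (nodes, edges) of the rows x cols knight graph."""
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    edges = [
        (u, v)
        for u, v in combinations(nodes, 2)
        # A knight move changes one coordinate by 1 and the other by 2.
        if {abs(u[0] - v[0]), abs(u[1] - v[1])} == {1, 2}
    ]
    return nodes, edges


def modularity(edges, membership, resolution=1.0):
    """Assumed form: sum over clusters of e_c/m - resolution * (K_c / 2m)^2."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - resolution * (k / (2 * m)) ** 2
        for c, k in degree_sum.items()
    )


def cpm(edges, membership, resolution=1.0):
    """Assumed form: sum over clusters of e_c - resolution * n_c*(n_c-1)/2."""
    sizes, internal = {}, {}
    for c in membership.values():
        sizes[c] = sizes.get(c, 0) + 1
    for u, v in edges:
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(
        internal.get(c, 0) - resolution * n * (n - 1) / 2
        for c, n in sizes.items()
    )


nodes, edges = knight_graph()
one_cluster = {v: 0 for v in nodes}               # every vertex together
singletons = {v: i for i, v in enumerate(nodes)}  # every vertex alone

for gamma in (0.1, 0.5, 1.0):
    print(gamma, modularity(edges, one_cluster, gamma), cpm(edges, one_cluster, gamma))
```

Under these assumed formulas, raising the resolution lowers the score of any fixed partition that contains a multi-vertex cluster, which is why higher resolutions favor finer divisions. The sketch only evaluates hand-picked partitions and is not meant to reproduce the scores reported in the abstract.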
Systematic Analysis of Cluster Similarity Indices: How to Validate Validation Measures
Many cluster similarity indices are used to evaluate clustering algorithms,
and choosing the best one for a particular task remains an open problem. We
demonstrate that this problem is crucial: there are many disagreements among
the indices, these disagreements do affect which algorithms are preferred in
applications, and this can lead to degraded performance in real-world systems.
We propose a theoretical framework to tackle this problem: we develop a list of
desirable properties and conduct an extensive theoretical analysis to verify
which indices satisfy them. This allows for making an informed choice: given a
particular application, one can first select properties that are desirable for
the task and then identify indices satisfying these. Our work unifies and
considerably extends existing attempts at analyzing cluster similarity indices:
we introduce new properties, formalize existing ones, and mathematically prove
or disprove each property for an extensive list of validation indices. This
broader and more rigorous approach leads to recommendations that considerably
differ from how validation indices are currently being chosen by practitioners.
Some of the most popular indices are even shown to be dominated by previously
overlooked ones.
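As a small illustration of how similarity indices can disagree, the following sketch compares two candidate clusterings against a reference using the Rand and Jaccard indices, both computed from pair counts. These two indices and the toy labelings are chosen here purely for illustration; the abstract does not say which indices or datasets the paper analyzes, and the helper names are hypothetical.

```python
from itertools import combinations


def pair_counts(a, b):
    """Count element pairs by whether each labeling puts them together."""
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(a, b):
    """Fraction of pairs on which the two labelings agree."""
    n11, n10, n01, n00 = pair_counts(a, b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(a, b):
    """Agreement on co-clustered pairs only; pairs split in both are ignored."""
    n11, n10, n01, _ = pair_counts(a, b)
    denom = n11 + n10 + n01
    return n11 / denom if denom else 1.0


reference = [0, 0, 0, 1, 1, 1]
all_singletons = [0, 1, 2, 3, 4, 5]
one_cluster = [0, 0, 0, 0, 0, 0]

for name, cand in (("singletons", all_singletons), ("one cluster", one_cluster)):
    print(name, rand_index(reference, cand), jaccard_index(reference, cand))
```

On this toy input the Rand index ranks the all-singletons clustering above the single-cluster one, while the Jaccard index ranks them the other way round, so the preferred candidate depends on the index chosen.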
Indices de qualité en clustering (Quality indices in clustering)
National audience. The absence of ground truth, among other factors, makes evaluating a clustering a non-trivial problem, for which it is necessary to use quality indices suited to the intended goal and to the data. The talk will present the key elements for characterizing a quality index, the main internal and external indices, and an axiomatic approach to choosing an index.