2,674 research outputs found
Spectral Normalized-Cut Graph Partitioning with Fairness Constraints
Normalized-cut graph partitioning aims to divide the set of nodes in a graph
into disjoint clusters to minimize the fraction of the total edges between
any cluster and all other clusters. In this paper, we consider a fair variant
of the partitioning problem wherein nodes are characterized by a categorical
sensitive attribute (e.g., gender or race) indicating membership to different
demographic groups. Our goal is to ensure that each group is approximately
proportionally represented in each cluster while minimizing the normalized cut
value. To resolve this problem, we propose a two-phase spectral algorithm
called FNM. In the first phase, we add an augmented Lagrangian term based on
our fairness criteria to the objective function for obtaining a fairer spectral
node embedding. Then, in the second phase, we design a rounding scheme to
produce clusters from the fair embedding that effectively trades off
fairness and partition quality. Through comprehensive experiments on nine
benchmark datasets, we demonstrate the superior performance of FNM compared
with three baseline methods.Comment: 17 pages, 7 figures, accepted to the 26th European Conference on
Artificial Intelligence (ECAI 2023
An MBO scheme for clustering and semi-supervised clustering of signed networks
We introduce a principled method for the signed clustering problem, where the goal is to partition a weighted undirected graph whose edge weights take both positive and negative values, such that edges within the same cluster are mostly positive, while edges spanning across clusters are
mostly negative. Our method relies on a graph-based diffuse interface model formulation utilizing the Ginzburg–Landau functional, based on an adaptation of the classic numerical Merriman–Bence–Osher (MBO) scheme for minimizing such graph-based functionals. The proposed objective function aims to minimize the total weight of inter-cluster positively-weighted edges, while maximizing the total weight of the inter-cluster negatively-weighted edges. Our method scales to large sparse networks, and can be easily adjusted to incorporate labelled data information, as is often the case in the context of semisupervised learning. We tested our method on a number of both synthetic stochastic block models and real-world data sets (including financial correlation matrices), and obtained promising results that compare favourably against a number of state-of-the-art approaches from the recent literature
Beyond the arithmetic mean : extensions of spectral clustering and semi-supervised learning for signed and multilayer graphs via matrix power means
In this thesis we present extensions of spectral clustering and semi-supervised learning to signed and multilayer graphs. These extensions are based on a one-parameter family of matrix functions called Matrix Power Means. In the scalar case, this family has the arithmetic, geometric and harmonic means as particular cases. We study the effectivity of this family of matrix functions through suitable versions of the stochastic block model to signed and multilayer graphs. We provide provable properties in expectation and further identify regimes where the state of the art fails whereas our approach provably performs well. Some of the settings that we analyze are as follows: first, the case where each layer presents a reliable approximation to the overall clustering; second, the case when one single layer has information about the clusters whereas the remaining layers are potentially just noise; third, the case when each layer has only partial information but all together show global information about the underlying clustering structure. We present extensive numerical verifications of all our results and provide matrix-free numerical schemes. With these numerical schemes we are able to show that our proposed approach based on matrix power means is scalable to large sparse signed and multilayer graphs. Finally, we evaluate our methods in real world datasets. For instance, we show that our approach consistently identifies clustering structure in a real signed network where previous approaches failed. This further verifies that our methods are competitive to the state of the art.In dieser Arbeit stellen wir Erweiterungen von spektralem Clustering und teilüberwachtem Lernen auf signierte und mehrschichtige Graphen vor. Diese Erweiterungen basieren auf einer einparametrischen Familie von Matrixfunktionen, die Potenzmittel genannt werden. Im skalaren Fall hat diese Familie die arithmetischen, geometrischen und harmonischen Mittel als Spezialfälle. Wir untersuchen die Effektivität dieser Familie von Matrixfunktionen durch Versionen des stochastischen Blockmodells, die für signierte und mehrschichtige Graphen geeignet sind. Wir stellen beweisbare Eigenschaften vor und identifizieren darüber hinaus Situationen in denen neueste, gegenwärtig verwendete Methoden versagen, während unser Ansatz nachweislich gut abschneidet. Wir untersuchen unter anderem folgende Situationen: erstens den Fall, dass jede Schicht eine zuverlässige Approximation an die Gesamtclusterung darstellt; zweitens den Fall, dass eine einzelne Schicht Informationen über die Cluster hat, während die übrigen Schichten möglicherweise nur Rauschen sind; drittens den Fall, dass jede Schicht nur partielle Informationen hat, aber alle zusammen globale Informationen über die zugrunde liegende Clusterstruktur liefern. Wir präsentieren umfangreiche numerische Verifizierungen aller unserer Ergebnisse und stellen matrixfreie numerische Verfahren zur Verfügung. Mit diesen numerischen Methoden sind wir in der Lage zu zeigen, dass unser vorgeschlagener Ansatz, der auf Potenzmitteln basiert, auf große, dünnbesetzte signierte und mehrschichtige Graphen skalierbar ist. Schließlich evaluieren wir unsere Methoden an realen Datensätzen. Zum Beispiel zeigen wir, dass unser Ansatz konsistent Clustering-Strukturen in einem realen signierten Netzwerk identifiziert, wo frühere Ansätze versagten. Dies ist ein weiterer Nachweis, dass unsere Methoden konkurrenzfähig zu den aktuell verwendeten Methoden sind
- …