76 research outputs found

Clustering and Community Detection in Directed Networks: A Survey

Author: Malliaros Fragkiskos D.
Vazirgiannis Michalis
Publication venue: 'Elsevier BV'
Publication date: 05/08/2013

Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed, in the sense that there is directionality on the edges, making the semantics of the edges non-symmetric. An interesting feature that real networks present is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar while, on the contrary, nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Therefore, naturally there is a recent wealth of research production in the area of mining directed graphs, with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks along with the relevant necessary methodological background and also related applications. The survey commences by offering a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize. Then we present the relevant work along two orthogonal classifications. The first one is mostly concerned with the methodological principles of the clustering algorithms, while the second one approaches the methods from the viewpoint regarding the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions. Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear

arXiv.org e-Print Archive

CiteSeerX

Higher-order Clustering and Pooling for Graph Neural Networks

Author: Duval Alexandre
Malliaros Fragkiskos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/09/2022

Graph Neural Networks achieve state-of-the-art performance on a plethora of graph classification tasks, especially due to pooling operators, which aggregate learned node embeddings hierarchically into a final graph representation. However, they are not only questioned by recent work showing on-par performance with random pooling, but also completely ignore higher-order connectivity patterns. To tackle this issue, we propose HoscPool, a clustering-based graph pooling operator that captures higher-order information hierarchically, leading to richer graph representations. In fact, we learn a probabilistic cluster assignment matrix end-to-end by minimising relaxed formulations of motif spectral clustering in our objective function, and we then extend it to a pooling operator. We evaluate HoscPool on graph classification tasks and its clustering component on graphs with ground-truth community structure, achieving best performance. Lastly, we provide a deep empirical analysis of pooling operators' inner functioning. Comment: CIKM 202

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

NodeSig: Random Walk Diffusion meets Hashing for Scalable Graph Embeddings

Author: Malliaros Fragkiskos D.
Papadopoulos Apostolos N.
Çelikkanat Abdulkadir
Publication date: 01/10/2020

Learning node representations is a crucial task with a plethora of interdisciplinary applications. Nevertheless, as the size of the networks increases, most widely used models face computational challenges to scale to large networks. While there is a recent effort towards designing algorithms that solely deal with scalability issues, most of them behave poorly in terms of accuracy on downstream tasks. In this paper, we aim at studying models that balance the trade-off between efficiency and accuracy. In particular, we propose NodeSig, a scalable embedding model that computes binary node representations. NodeSig exploits random walk diffusion probabilities via stable random projection hashing, towards efficiently computing embeddings in the Hamming space. Our extensive experimental evaluation on various graphs has demonstrated that the proposed model achieves a good balance between accuracy and efficiency compared to well-known baseline models on two downstream tasks.

arXiv.org e-Print Archive

CORECLUSTER: A Degeneracy Based Graph Clustering Framework

Author: Giatsidis Christos
Malliaros Fragkiskos
Thilikos Dimitrios M.
Vazirgiannis Michalis
Publication venue: HAL CCSD
Publication date: 29/07/2014

Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional tools for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the concept of graph degeneracy, that can be used along with any known graph clustering algorithm. Our approach capitalizes on processing the graph in a hierarchical manner provided by its core expansion sequence, an ordered partition of the graph into different levels according to the k-core decomposition. Such a partition provides a way to process the graph in an incremental manner that preserves its clustering structure, while making the execution of the chosen clustering algorithm much faster due to the smaller size of the graph's partitions onto which the algorithm operates.
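
As a rough illustration of the k-core decomposition mentioned in the abstract above, the sketch below computes core numbers by repeatedly peeling the node of minimum remaining degree, then groups nodes into levels by core number. It is not the CoreCluster implementation: the function names `core_numbers` and `core_levels` are hypothetical, the graph is assumed to be undirected and simple, and listing the levels from the densest core outwards is an assumption about how such a sequence might be consumed.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Simple peeling: repeatedly remove the node with the smallest
    remaining degree. Quadratic in the number of nodes, which is
    fine for a small illustration but not for large graphs.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def core_levels(core):
    """Group nodes by core number, densest level first (assumed ordering)."""
    levels = defaultdict(list)
    for node, k in core.items():
        levels[k].append(node)
    return [sorted(levels[k]) for k in sorted(levels, reverse=True)]


if __name__ == "__main__":
    # A triangle (1, 2, 3) with one pendant node (4) attached to node 3.
    toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
    numbers = core_numbers(toy_edges)
    print(numbers)
    print(core_levels(numbers))
```

On the toy graph the triangle forms the 2-core and the pendant node falls in the 1-core, so a clustering algorithm could be run on the small dense level first and then extended to the outer level.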

HAL-Polytechnique

Learning Graph Representations for Influence Maximization

Author: Malliaros Fragkiskos D.
Panagopoulos George
Tziortziotis Nikolaos
Vazirgiannis Michalis
Publication date: 10/12/2021

As the field of machine learning for combinatorial optimization advances, traditional problems are resurfaced and readdressed through this new perspective. The overwhelming majority of the literature focuses on small graph problems, while several real-world problems involve large graphs. Here, we focus on two such problems: influence estimation, a #P-hard counting problem, and influence maximization, an NP-hard problem. We develop GLIE, a Graph Neural Network (GNN) that inherently parameterizes an upper bound of influence estimation, and train it on small simulated graphs. Experiments show that GLIE provides accurate influence estimation for real graphs up to 10 times larger than the training set. More importantly, it can be used for influence maximization on considerably larger graphs, as the ranking of predictions is not affected by the drop in accuracy. We develop a version of CELF optimization with GLIE instead of simulated influence estimation, surpassing the benchmark for influence maximization, although with a computational overhead. To balance the time complexity and quality of influence, we propose two different approaches. The first is a Q-network that learns to choose seeds sequentially using GLIE's predictions. The second defines a provably submodular function based on GLIE's representations to rank nodes fast while building the seed set. The latter provides the best combination of time efficiency and influence spread, outperforming SOTA benchmarks. Comment: 2

arXiv.org e-Print Archive

Core Decomposition of Uncertain Graphs Using Representative Instances

Author: Malliaros Fragkiskos
Papadopoulos Apostolos
Seux Damien
Vazirgiannis Michalis
Publication venue: HAL CCSD
Publication date: 29/11/2017

INRIA a CCSD electronic archive server

Boosting Tricks for Word Mover's Distance

Author: Malliaros Fragkiskos
Skianis Konstantinos
Tziortziotis Nikolaos
Vazirgiannis Michalis
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 15/09/2020

Due to the COVID-19 pandemic, the physical meeting of ICANN 2020 has been postponed. The event is rescheduled to next year's ICANN, in September 2021 in Bratislava, Slovakia.

Word embeddings have opened a new path in creating novel approaches for addressing traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover's Distance (WMD) being the prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopwords, cross document-topic comparison, deep contextualized word vectors and convex metric learning constitute powerful tools that can boost WMD.

INRIA a CCSD electronic archive server

Time-varying Signals Recovery via Graph Neural Networks

Author: Badiey Mohsen
Bouwmans Thierry
Castro-Correa Jhon A.
Giraldo Jhony H.
Malliaros Fragkiskos D.
Mondal Anindya
Publication date: 12/08/2023

The recovery of time-varying graph signals is a fundamental problem with numerous applications in sensor networks and forecasting in time series. Effectively capturing the spatio-temporal information in these signals is essential for the downstream tasks. Previous studies have used the smoothness of the temporal differences of such graph signals as an initial assumption. Nevertheless, this smoothness assumption could result in a degradation of performance in the corresponding application when the prior does not hold. In this work, we relax the requirement of this hypothesis by including a learning module. We propose a Time Graph Neural Network (TimeGNN) for the recovery of time-varying graph signals. Our algorithm uses an encoder-decoder architecture with a specialized loss composed of a mean squared error function and a Sobolev smoothness operator. TimeGNN shows competitive performance against previous methods on real datasets. Comment: Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, Greec

arXiv.org e-Print Archive

Fouille de Données dans les Réseaux Sociaux et d’Information : Dynamiques et Applications

Author: Malliaros Fragkiskos
Publication venue: HAL CCSD
Publication date: 16/09/2015

Networks (or graphs) have become ubiquitous as data from diverse disciplines can naturally be mapped to graph structures. The problem of extracting meaningful information from large-scale graph data in an efficient and effective way has become crucial and challenging with several important applications and, towards this end, graph mining and analysis methods constitute prominent tools. This dissertation contributes models, tools and observations to problems that arise in the area of mining social and information networks. We build upon computationally efficient graph mining methods in order to: (i) design models for analyzing the structure and dynamics of real-world networks towards unraveling properties that can further be used in practical applications; (ii) develop algorithmic tools for large-scale analytics on data with inherent (e.g., social networks) or without inherent (e.g., text) graph structure. In particular, for the former point we show how to model the engagement dynamics of large social networks and how to assess their vulnerability with respect to user departures from the network. In both cases, by unraveling the dynamics of real social networks, regularities and patterns about their structure and formation can be identified; such knowledge can further be used in various applications including churn prediction, anomaly detection and building robust social networking systems. For the latter, we examine how to identify influential users in complex networks, having direct applications to epidemic control and viral marketing, and how to utilize graph mining techniques in order to enhance text analytics tasks, in particular text categorization.

Thèses en Ligne

thèses en ligne de ParisTech

HAL-Polytechnique