76 research outputs found

    Clustering and Community Detection in Directed Networks: A Survey

    Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of these cases the graphs are directed, in the sense that the edges carry a direction, which makes their semantics asymmetric. An interesting feature of real networks is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence is that nodes of the same community are highly similar, while nodes across communities show low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Consequently, there is a recent wealth of research on mining directed graphs, with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks, along with the necessary methodological background and related applications. The survey commences with a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize. Then we present the relevant work along two orthogonal classifications. The first is mostly concerned with the methodological principles of the clustering algorithms, while the second approaches the methods from the viewpoint of the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions. Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)

    Higher-order Clustering and Pooling for Graph Neural Networks

    Graph Neural Networks achieve state-of-the-art performance on a plethora of graph classification tasks, especially due to pooling operators, which aggregate learned node embeddings hierarchically into a final graph representation. However, these operators are questioned by recent work showing on-par performance with random pooling, and they completely ignore higher-order connectivity patterns. To tackle this issue, we propose HoscPool, a clustering-based graph pooling operator that captures higher-order information hierarchically, leading to richer graph representations. Specifically, we learn a probabilistic cluster assignment matrix end-to-end by minimising relaxed formulations of motif spectral clustering in our objective function, and we then extend it to a pooling operator. We evaluate HoscPool on graph classification tasks and its clustering component on graphs with ground-truth community structure, achieving the best performance. Lastly, we provide a deep empirical analysis of the inner functioning of pooling operators. Comment: CIKM 202

    NodeSig: Random Walk Diffusion meets Hashing for Scalable Graph Embeddings

    Learning node representations is a crucial task with a plethora of interdisciplinary applications. Nevertheless, as the size of the networks increases, most widely used models face computational challenges in scaling to large networks. While there is a recent effort towards designing algorithms that deal solely with scalability issues, most of them perform poorly in terms of accuracy on downstream tasks. In this paper, we study models that balance the trade-off between efficiency and accuracy. In particular, we propose NodeSig, a scalable embedding model that computes binary node representations. NodeSig exploits random walk diffusion probabilities via stable random projection hashing to efficiently compute embeddings in the Hamming space. Our extensive experimental evaluation on various graphs demonstrates that the proposed model achieves a good balance between accuracy and efficiency compared to well-known baseline models on two downstream tasks.

    CORECLUSTER: A Degeneracy Based Graph Clustering Framework

    Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional tools for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the concept of graph degeneracy, that can be used along with any known graph clustering algorithm. Our approach capitalizes on processing the graph in a hierarchical manner provided by its core expansion sequence, an ordered partition of the graph into different levels according to the k-core decomposition. Such a partition provides a way to process the graph in an incremental manner that preserves its clustering structure, while making the execution of the chosen clustering algorithm much faster due to the smaller size of the graph's partitions onto which the algorithm operates.
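
The k-core decomposition mentioned in this abstract can be illustrated with a minimal sketch. It is not the CoreCluster implementation: it only computes core numbers by repeatedly peeling a minimum-degree node and groups nodes into levels from the densest core outward. The function names `core_numbers` and `core_levels` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Simple peeling: repeatedly remove a node of minimum remaining degree.
    Quadratic in the number of nodes; written for clarity, not speed.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neighbours) for node, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adj[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


def core_levels(edges):
    """Group nodes by core number, highest core first (hypothetical helper)."""
    levels = defaultdict(set)
    for node, k in core_numbers(edges).items():
        levels[k].add(node)
    return [levels[k] for k in sorted(levels, reverse=True)]


# A triangle a-b-c with a pendant node d attached to a.
example_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(core_numbers(example_edges))
print(core_levels(example_edges))
```

In this example the triangle forms the 2-core and the pendant node belongs only to the 1-core, so a framework in the spirit of the abstract could cluster the small dense level first and then expand outward level by level.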

    Learning Graph Representations for Influence Maximization

    As the field of machine learning for combinatorial optimization advances, traditional problems resurface and are readdressed through this new perspective. The overwhelming majority of the literature focuses on small graph problems, while several real-world problems involve large graphs. Here, we focus on two such problems: influence estimation, a #P-hard counting problem, and influence maximization, an NP-hard problem. We develop GLIE, a Graph Neural Network (GNN) that inherently parameterizes an upper bound of influence estimation, and train it on small simulated graphs. Experiments show that GLIE provides accurate influence estimation for real graphs up to 10 times larger than those in the training set. More importantly, it can be used for influence maximization on considerably larger graphs, as the ranking of the predictions is not affected by the drop in accuracy. We develop a version of CELF optimization with GLIE instead of simulated influence estimation, surpassing the benchmark for influence maximization, although with a computational overhead. To balance time complexity and quality of influence, we propose two different approaches. The first is a Q-network that learns to choose seeds sequentially using GLIE's predictions. The second defines a provably submodular function based on GLIE's representations to rank nodes quickly while building the seed set. The latter provides the best combination of time efficiency and influence spread, outperforming SOTA benchmarks. Comment: 2

    Boosting Tricks for Word Mover's Distance

    Due to the COVID-19 pandemic, the physical meeting of ICANN 2020 has been postponed. The event is scheduled for next year's ICANN in September 2021 in Bratislava, Slovakia. Word embeddings have opened a new path in creating novel approaches for addressing traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover's Distance (WMD) being the prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopwords, cross document-topic comparison, deep contextualized word vectors and convex metric learning constitute powerful tools that can boost WMD.

    Time-varying Signals Recovery via Graph Neural Networks

    The recovery of time-varying graph signals is a fundamental problem with numerous applications in sensor networks and time series forecasting. Effectively capturing the spatio-temporal information in these signals is essential for downstream tasks. Previous studies have used the smoothness of the temporal differences of such graph signals as an initial assumption. Nevertheless, this smoothness assumption can degrade performance in the corresponding application when the prior does not hold. In this work, we relax the requirement of this hypothesis by including a learning module. We propose a Time Graph Neural Network (TimeGNN) for the recovery of time-varying graph signals. Our algorithm uses an encoder-decoder architecture with a specialized loss composed of a mean squared error function and a Sobolev smoothness operator. TimeGNN shows competitive performance against previous methods on real datasets. Comment: Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, Greece

    Data Mining in Social and Information Networks: Dynamics and Applications (Fouille de Données dans les Réseaux Sociaux et d’Information : Dynamiques et Applications)

    Networks (or graphs) have become ubiquitous, as data from diverse disciplines can naturally be mapped to graph structures. Extracting meaningful information from large-scale graph data efficiently and effectively has become a crucial and challenging problem with several important applications, and graph mining and analysis methods constitute prominent tools towards this end. This dissertation contributes models, tools and observations to problems that arise in the area of mining social and information networks. We build upon computationally efficient graph mining methods in order to: (i) design models for analyzing the structure and dynamics of real-world networks, towards unraveling properties that can further be used in practical applications; (ii) develop algorithmic tools for large-scale analytics on data with inherent (e.g., social networks) or without inherent (e.g., text) graph structure. In particular, for the former point we show how to model the engagement dynamics of large social networks and how to assess their vulnerability with respect to user departures from the network. In both cases, by unraveling the dynamics of real social networks, regularities and patterns in their structure and formation can be identified; such knowledge can further be used in various applications, including churn prediction, anomaly detection and building robust social networking systems. For the latter, we examine how to identify influential users in complex networks, with direct applications to epidemic control and viral marketing, and how to utilize graph mining techniques to enhance text analytics tasks, in particular text categorization.