Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases, graphs are directed, in the sense that the edges have a
direction, which makes their semantics non-symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
in the same community are highly similar, whereas nodes in different
communities exhibit low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Naturally, there has
recently been a wealth of research on mining directed graphs, with clustering
being the primary method and tool for community detection and evaluation. The
goal of this paper is to offer an in-depth review of the methods presented so
far for clustering directed networks, along with the necessary methodological
background and related applications. The survey begins with a concise review
of the fundamental concepts and methodological foundations on which graph
clustering algorithms build. Then we present the relevant work along two
orthogonal classifications. The first is mostly concerned with the
methodological principles of the clustering algorithms, while the second
approaches the methods from the viewpoint of the properties of a good cluster
in a directed network. Further, we present methods and metrics for evaluating
graph clustering results, demonstrate interesting application domains, and
provide promising future research directions. Comment: 86 pages, 17 figures.
Physics Reports Journal (To Appear)
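The survey frames clustering in terms of what makes a good cluster in a directed network. As one illustration (not taken from the survey itself), the directed modularity of Leicht and Newman scores a partition by comparing within-cluster edges against a null model that preserves every node's in- and out-degree. A minimal NumPy sketch, assuming an unweighted adjacency matrix with `A[i, j] = 1` for an edge i -> j; the function name `directed_modularity` is hypothetical:

```python
import numpy as np

def directed_modularity(adj, labels):
    """Directed modularity (Leicht-Newman form) of a hard partition.

    Q = (1/m) * sum_ij [A_ij - k_out[i] * k_in[j] / m] * delta(c_i, c_j),
    with A_ij = 1 for an edge i -> j and m the total number of edges.
    """
    A = np.asarray(adj, dtype=float)
    c = np.asarray(labels)
    m = A.sum()
    if m == 0:
        raise ValueError("graph has no edges")
    k_out = A.sum(axis=1)  # row sums: out-degrees
    k_in = A.sum(axis=0)   # column sums: in-degrees
    same = c[:, None] == c[None, :]  # delta(c_i, c_j)
    expected = np.outer(k_out, k_in) / m  # degree-preserving null model
    return float(((A - expected) * same).sum() / m)
```

For two directed 3-cycles joined by a single edge, labeling each cycle as its own cluster gives Q = 18/49 (about 0.37), while placing all nodes in one cluster gives Q = 0.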
: Random Walk Diffusion meets Hashing for Scalable Graph Embeddings
Learning node representations is a crucial task with a plethora of
interdisciplinary applications. Nevertheless, as networks grow in size, most
widely used models face computational challenges in scaling to large networks.
While there has been a recent effort towards designing algorithms that deal
solely with scalability issues, most of them perform poorly in terms of
accuracy on downstream tasks. In this paper, we study models that balance the
trade-off between efficiency and accuracy. In particular, we
propose , a scalable embedding model that
computes binary node representations.
exploits random walk diffusion probabilities via stable random projection
hashing to efficiently compute embeddings in the Hamming space. Our
extensive experimental evaluation on various graphs demonstrates that the
proposed model achieves a good balance between accuracy and efficiency compared
to well-known baseline models on two downstream tasks.
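The abstract combines two ingredients: random-walk diffusion probabilities and hashing into binary codes. The sketch below illustrates only that general idea, not the proposed model: it averages the first few powers of the transition matrix P = D^-1 A and hashes each row with the sign of a Gaussian (2-stable) random projection, so that nodes with similar diffusion profiles tend to share bits. The number of steps, the code length, and all function names are assumptions.

```python
import numpy as np

def diffusion_matrix(adj, steps=3):
    """Average of P, P^2, ..., P^steps for the random-walk matrix P = D^-1 A.

    Assumes every node has at least one outgoing edge (no zero rows).
    Row i holds the probabilities of a walk from node i landing on each node.
    """
    A = np.asarray(adj, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)
    S = np.zeros_like(P)
    Pk = np.eye(P.shape[0])
    for _ in range(steps):
        Pk = Pk @ P
        S += Pk
    return S / steps

def binary_embeddings(S, dim=64, seed=0):
    """Hash each row of S to `dim` bits: sign of a Gaussian random projection."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((S.shape[1], dim))
    return (S @ R > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))
```

Comparing codes then reduces to a bit count. For example, in a 4-cycle, nodes 0 and 2 have identical neighborhoods, hence identical diffusion rows and identical codes.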
Learning Graph Representations for Influence Maximization
As the field of machine learning for combinatorial optimization advances,
traditional problems are resurfacing and being readdressed from this new
perspective. The overwhelming majority of the literature focuses on small graph
problems, while many real-world problems involve large graphs. Here,
we focus on two such problems: influence estimation, a #P-hard counting
problem, and influence maximization, an NP-hard problem. We develop GLIE, a
Graph Neural Network (GNN) that inherently parameterizes an upper bound of
influence estimation, and train it on small simulated graphs. Experiments show
that GLIE provides accurate influence estimation for real graphs up to 10 times
larger than the training set. More importantly, it can be used for influence
maximization on considerably larger graphs, as the ranking of its predictions is
not affected by the drop in accuracy. We develop a version of CELF optimization
with GLIE instead of simulated influence estimation, surpassing the benchmark
for influence maximization, albeit with a computational overhead. To balance
time complexity and influence quality, we propose two different
approaches. The first is a Q-network that learns to choose seeds sequentially
using GLIE's predictions. The second defines a provably submodular function
based on GLIE's representations to rank nodes quickly while building the seed
set. The latter provides the best combination of time efficiency and influence
spread, outperforming SOTA benchmarks. Comment: 2
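CELF (Cost-Effective Lazy Forward) speeds up greedy seed selection by exploiting submodularity: a node's marginal gain can only shrink as the seed set grows, so stale gains kept in a priority queue are upper bounds and only the top candidate needs re-evaluation. The sketch below uses Monte Carlo simulation of the independent cascade model as the influence estimator; the variant described above swaps that estimator for GLIE, which here would correspond to passing a different `spread` function. The propagation probability, number of runs, and all function names are assumptions.

```python
import heapq
import random

def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run; returns the number of activated nodes.

    graph: dict mapping a node to its out-neighbors. Each newly activated
    node gets one chance to activate each inactive out-neighbor with prob. p.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)

def mc_spread(graph, seeds, p=0.1, runs=200, seed=0):
    """Monte Carlo estimate of the expected spread of `seeds`."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs

def celf(graph, k, spread=mc_spread):
    """Lazy greedy (CELF) selection of k seeds under a spread estimator."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    # Min-heap on negated gain (= max-heap on gain); each entry also records
    # the seed-set size at which its gain was last computed.
    heap = [(-spread(graph, [u]), u, 0) for u in nodes]
    heapq.heapify(heap)
    seeds, total = [], 0.0
    while len(seeds) < k and heap:
        neg_gain, u, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is fresh; by submodularity no stale upper bound can beat it.
            seeds.append(u)
            total += -neg_gain
        else:
            gain = spread(graph, seeds + [u]) - total
            heapq.heappush(heap, (-gain, u, len(seeds)))
    return seeds, total
```

With p = 1.0 the cascade is deterministic, so on a graph with edges 0->1, 0->2, 0->3, 0->4 and 5->6, CELF picks seeds [0, 5] with a total spread of 7.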
Time-varying Signals Recovery via Graph Neural Networks
The recovery of time-varying graph signals is a fundamental problem with
numerous applications in sensor networks and time-series forecasting.
Effectively capturing the spatio-temporal information in these signals is
essential for downstream tasks. Previous studies have assumed, as a prior, that
the temporal differences of such graph signals are smooth.
Nevertheless, this smoothness assumption can degrade performance in the
corresponding application when the prior does not hold. In
this work, we relax this assumption by including a learning
module. We propose a Time Graph Neural Network (TimeGNN) for the recovery of
time-varying graph signals. Our algorithm uses an encoder-decoder architecture
with a specialized loss composed of a mean squared error function and a Sobolev
smoothness operator. TimeGNN shows competitive performance against previous
methods on real datasets. Comment: Published in IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP) 2023, Greece
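The loss described above pairs a data-fidelity term with a Sobolev smoothness penalty on temporal differences. A minimal NumPy sketch of such a loss, assuming an N x T signal matrix X, a combinatorial graph Laplacian L, a binary sampling mask, and the penalty tr(Z^T (L + eps I) Z), where Z holds the temporal differences x_t - x_{t-1} (a Sobolev exponent of 1). The weights `alpha` and `eps` and the function names are hypothetical; TimeGNN's exact loss and its encoder-decoder are not reproduced here.

```python
import numpy as np

def temporal_difference(X):
    """Columns x_t - x_{t-1} for t = 1..T-1 of an N x T signal matrix."""
    return X[:, 1:] - X[:, :-1]

def sobolev_smoothness(X, L, eps=0.05):
    """tr(Z^T (L + eps I) Z) with Z the temporal differences of X."""
    Z = temporal_difference(X)
    A = L + eps * np.eye(L.shape[0])  # eps > 0 makes the operator positive definite
    return float(np.trace(Z.T @ A @ Z))

def reconstruction_loss(X_hat, X_obs, mask, L, alpha=0.1, eps=0.05):
    """MSE on the sampled entries plus a weighted Sobolev smoothness penalty."""
    mse = float(((mask * (X_hat - X_obs)) ** 2).sum() / mask.sum())
    return mse + alpha * sobolev_smoothness(X_hat, L, eps)
```

Because the constant vector lies in the null space of L, a signal that shifts every node by the same amount between time steps is penalized only through the eps I term, while changes that vary sharply across neighboring nodes are penalized more heavily.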
A Machine Learning Tour in Network Science
Graphs, also known as networks, are widely used data structures for modeling complex systems in various fields, from the social sciences to biology and engineering. Their strength lies in their ability to represent relationships between entities, such as friendships in social networks or protein interactions in biological networks. In addition to their modeling capabilities, graphs offer a mathematical framework to analyze, understand, and make predictions from real-world datasets. This HDR manuscript presents part of my research contributions to the field of graph representation learning and its applications in network science, focusing on the work conducted after joining CentraleSupélec, Université Paris-Saclay, in 2017. The first part of the manuscript explores structure-preserving node embedding techniques that leverage random walks. The second part addresses the challenge of developing graph representation learning models for multilayer and heterogeneous graphs, with a specific focus on applications arising from computational biology. The third part delves into the design of expressive and explainable graph neural network models. Finally, the last part investigates the application of graph representation learning to the well-studied problems of social influence learning and maximization in complex networks.
Vulnerability assessment in social networks under cascade-based node departures
In social networks, new users decide to become members, but current users also depart from the network or stop participating in their community's activities. The departure of a user may affect the engagement of its neighbors in the graph, who in turn may also decide to leave, leading to a disengagement epidemic. We propose a model to capture this cascading effect, based on recent studies of the engagement dynamics of social networks. We introduce a new concept of vulnerability assessment under cascades triggered by the departure of nodes, based on their engagement level. Our results indicate that social networks are robust under cascades triggered by randomly selected nodes but highly vulnerable to cascades caused by targeted departures of nodes with high engagement levels.
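The abstract does not state the exact departure rule. A common choice in engagement studies is a k-core-style threshold: a user stays only while at least k of their neighbors remain. A minimal sketch under that assumption (the threshold k, the adjacency-dict input, and the function name are hypothetical), which also assumes the network starts in an engaged state, for example its k-core, so that every departure beyond the initial ones is caused by the cascade:

```python
from collections import deque

def departure_cascade(adj, initial, k):
    """Return the set of nodes that leave after the nodes in `initial` depart.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    A remaining node leaves once fewer than k of its neighbors remain.
    """
    departed = set(initial)
    # Engaged-neighbor counts start at the full degree; each departure is
    # subtracted exactly once, when the departed node is dequeued.
    remaining = {u: len(nbrs) for u, nbrs in adj.items()}
    queue = deque(departed)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in departed:
                continue
            remaining[v] -= 1
            if remaining[v] < k:
                departed.add(v)
                queue.append(v)
    return departed
```

Comparing the cascade size when `initial` is a random sample of nodes against the size when it contains the most engaged nodes (for instance, those with the highest core number) is one way to contrast random and targeted departures.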
Maximizing Influence with Graph Neural Networks
Finding the seed set that maximizes the influence spread over a network is a well-known NP-hard problem. Although a greedy algorithm can provide near-optimal solutions, the influence estimation subproblem makes these solutions inefficient to compute. In this work, we propose GLIE, a graph neural network that learns how to estimate the influence spread of the independent cascade model. GLIE relies on a theoretical upper bound that is tightened through supervised training. Experiments indicate that it provides accurate influence estimation for real graphs up to 10 times larger than the training set. Subsequently, we incorporate it into two influence maximization techniques. We first use Cost-Effective Lazy Forward (CELF) optimization, substituting GLIE for the Monte Carlo simulations and surpassing the benchmarks, albeit with a computational overhead. To improve computational efficiency, we develop a provably submodular influence spread based on GLIE's representations, which ranks nodes while building the seed set adaptively. The proposed algorithms are inductive: they are trained on graphs with fewer than 300 nodes and up to 5 seeds, and tested on graphs with millions of nodes and up to 200 seeds. The final method exhibits the most promising combination of time efficiency and influence quality, outperforming several baselines.
Fast Robustness Estimation in Large Social Graphs: Communities and Anomaly Detection
Given a large social graph, such as a scientific collaboration network, what can we say about its robustness? Can we estimate a robustness index for a graph quickly? If the graph evolves over time, how do these properties change? In this work, we try to answer these questions by studying the expansion properties of large social graphs. First, we present a measure that characterizes the robustness properties of a graph and serves as a global measure of the community structure (or lack thereof). We study how these properties change over time and show how to spot outliers and anomalies over time. We apply our method to several diverse real networks with millions of nodes. We also show how to compute our measure efficiently by exploiting the special spectral properties of real-world networks.
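The abstract does not define its robustness index. As an illustration of the underlying idea, a classic spectral indicator of expansion is the gap lambda_1 - lambda_2 between the two largest adjacency eigenvalues: a large gap suggests good expansion and no strong community bottleneck, while a gap near zero signals weakly connected groups. A minimal sketch using a dense eigendecomposition; the function name is hypothetical, and this is not claimed to be the paper's measure:

```python
import numpy as np

def spectral_gap(adj):
    """lambda_1 - lambda_2 of a symmetric (undirected) adjacency matrix."""
    A = np.asarray(adj, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("expects a symmetric adjacency matrix")
    eig = np.linalg.eigvalsh(A)  # eigenvalues in ascending order
    return float(eig[-1] - eig[-2])
```

A complete graph on 6 nodes has eigenvalues 5 and -1, so its gap is 6, whereas two disjoint triangles share the top eigenvalue 2 and have a gap of 0. For graphs with millions of nodes a dense decomposition is infeasible; `scipy.sparse.linalg.eigsh(A, k=2, which='LA')` computes only the two largest eigenvalues of a sparse symmetric matrix.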
Uplift Modeling Under Limited Supervision
Estimating causal effects in e-commerce tends to involve costly treatment assignments that can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to reduce this risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work, we propose a graph neural network that reduces the required training set size by relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test various message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with an extremely low experimental budget. The framework is flexible, since each step can be used separately with other models or treatment policies. Experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underlining the need for models that can generalize with limited supervision to reduce experimental risks.
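The two-model idea can be sketched as follows: a shared message-passing step encodes each node together with the mean of its neighbors' features, separate outcome models are fitted on the labeled treated and control nodes, and the predicted uplift is the difference of their predictions. This is a simplified stand-in for the paper's neural architecture, assuming linear least-squares outcome heads and a single mean-aggregation layer; the function names are hypothetical, and the acquisition-function step is not covered.

```python
import numpy as np

def propagate(X, adj):
    """One mean-aggregation step: concatenate [X, mean of neighbor features]."""
    A = np.asarray(adj, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    # Isolated nodes (deg == 0) get a zero neighbor mean instead of NaN.
    neigh_mean = np.divide(A @ X, deg, out=np.zeros_like(X, dtype=float), where=deg > 0)
    return np.hstack([X, neigh_mean])

def fit_linear(H, y):
    """Least-squares linear head with a bias term."""
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])
    w, *_ = np.linalg.lstsq(H1, y, rcond=None)
    return w

def predict_linear(H, w):
    return np.hstack([H, np.ones((H.shape[0], 1))]) @ w

def two_model_uplift(X, adj, y, treated, labeled):
    """Fit mu_1 on labeled treated nodes and mu_0 on labeled control nodes;
    return the predicted uplift mu_1(h) - mu_0(h) for every node."""
    H = propagate(X, adj)
    t_idx = labeled & treated
    c_idx = labeled & ~treated
    w1 = fit_linear(H[t_idx], y[t_idx])
    w0 = fit_linear(H[c_idx], y[c_idx])
    return predict_linear(H, w1) - predict_linear(H, w0)
```

On noise-free synthetic data where the outcome is a node feature plus a constant effect of 2 for treated nodes, this estimator recovers an uplift of 2 for every node, including unlabeled ones.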