ASYMP: Fault-tolerant Mining of Massive Graphs
We present ASYMP, a distributed graph processing system developed for the
timely analysis of graphs with trillions of edges. ASYMP has several
distinguishing features including a robust fault tolerance mechanism, a
lockless architecture which scales seamlessly to thousands of machines, and
efficient data access patterns to reduce per-machine overhead. ASYMP is used to
analyze the largest graphs at Google, and the graphs we consider in our
empirical evaluation here are, to the best of our knowledge, the largest
considered in the literature.
Our experimental results show that compared to previous graph processing
frameworks at Google, ASYMP can scale to larger graphs, operate on more crowded
clusters, and complete real-world graph mining analytic tasks faster. First, we
evaluate the speed of ASYMP, showing that across a diverse selection of
graphs it computes connected components 3-50x faster than state-of-the-art
implementations in MapReduce and Pregel. Then we demonstrate the scalability
and parallelism of the framework: first by showing that the running time
increases linearly with the size of the graph (for a fixed number of
machines), and then by showing the gains in running time as the number of
machines increases. Finally, we demonstrate the fault-tolerance properties
of the framework, showing that forcing 50% of our machines to fail
increases the running time by only 41%.
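The abstract does not describe ASYMP's internal algorithms, but connected components via iterative label propagation is the standard computation that such distributed systems parallelize across machines. A minimal single-machine sketch (illustrative only, not ASYMP's actual implementation):

```python
# Connected components via label propagation: each node repeatedly
# adopts the smallest label among itself and its neighbors until no
# label changes in a full pass (a fixed point).
def connected_components(num_nodes, edges):
    labels = list(range(num_nodes))  # every node starts in its own component
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            m = min(labels[u], labels[v])
            if labels[u] != m or labels[v] != m:
                labels[u] = labels[v] = m
                changed = True
    return labels

# Example: two components {0, 1, 2} and {3, 4}.
print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3]
```

In a distributed setting, each machine owns a shard of nodes and exchanges label updates with the machines owning its neighbors; the fixed-point structure is what makes the computation tolerant to restarts from checkpoints.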
Foundations of Temporal Text Networks
Three fundamental elements to understand human information networks are the
individuals (actors) in the network, the information they exchange, which is
often observable online as text content (emails, social media posts, etc.), and
the time when these exchanges happen. An extremely large amount of research has
addressed some of these aspects either in isolation or as combinations of two
of them. A growing number of works also study systems where all three
elements are present, but typically using ad hoc models and algorithms that
cannot be easily transferred to other contexts. To address this heterogeneity,
in this article we present a simple, expressive and extensible model for
temporal text networks, which we claim can be used as a common ground across
different types of networks and analysis tasks, and we show how simple
procedures to produce views of the model allow the direct application of
analysis methods already developed in other domains, from traditional data
mining to multilayer network mining.
Comment: 24 pages, 11 figures, 2 tables
Influence Maximization over Markovian Graphs: A Stochastic Optimization Approach
This paper considers the problem of randomized influence maximization over a
Markovian graph process: given a fixed set of nodes whose connectivity graph is
evolving as a Markov chain, estimate the probability distribution (over this
fixed set of nodes) that samples a node which will initiate the largest
information cascade (in expectation). Further, it is assumed that the sampling
process affects the evolution of the graph, i.e., the sampling distribution and
the transition probability matrix are functionally dependent. In this setup,
recursive stochastic optimization algorithms are presented to estimate the
optimal sampling distribution for two cases: 1) the transition probabilities of
the graph are unknown, but the graph can be observed perfectly; 2) the
transition probabilities are known, but the graph is observed in noise. These
algorithms consist of a neighborhood size estimation algorithm combined with a
variance reduction method, a Bayesian filter and a stochastic gradient
algorithm. Convergence of the algorithms is established theoretically, and
numerical results are provided to illustrate how the algorithms work.
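The core recursion in such methods is a projected stochastic gradient step that keeps the sampling distribution on the probability simplex. A toy sketch of that recursion (with a made-up reward objective, not the paper's exact algorithm or its Bayesian filter):

```python
import random

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def sgd_on_simplex(grad_sample, p, steps=2000, lr=0.05, seed=0):
    """Projected stochastic gradient ascent: p <- Proj(p + lr * noisy grad)."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = grad_sample(p, rng)
        p = project_to_simplex([pi + lr * gi for pi, gi in zip(p, g)])
    return p

# Toy objective: maximize expected reward sum_i p_i * r_i from noisy
# observations of r = (1, 2, 5); the optimum puts all mass on the best node.
r = [1.0, 2.0, 5.0]
noisy_grad = lambda p, rng: [ri + rng.gauss(0, 0.1) for ri in r]
p_opt = sgd_on_simplex(noisy_grad, [1 / 3, 1 / 3, 1 / 3])
```

The projection step is what makes the iterate remain a valid probability distribution despite the noisy gradients.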
Network interpolation
Given a set of snapshots from a temporal network, we develop, analyze, and
experimentally validate a so-called network interpolation scheme. Our method
allows us to build a plausible, albeit random, sequence of graphs that
transition between any two given graphs. Importantly, our model is well
characterized by a Markov chain, and we leverage this representation to
analytically estimate the hitting time (to a predefined distance to the target
graph) and long term behavior of our model. These observations also serve to
provide interpretation and justification for a rate parameter in our model.
Lastly, through a mix of synthetic and real-world data experiments we
demonstrate that our model builds reasonable graph trajectories between
snapshots, as measured through various graph statistics. In these experiments,
we find that our interpolation scheme compares favorably to common network
growth models, such as preferential attachment and triadic closure.
Comment: final preprint
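One simple Markov chain that transitions between two graphs is to repeatedly pick a random node pair and set its edge state to match the target graph; the hitting time to the target is then analyzable in closed form. This is a sketch of the general idea only, since the paper's chain and its rate parameter are not specified in the abstract:

```python
import random

def interpolate(g_start, g_target, n, steps, seed=0):
    """Random trajectory from g_start toward g_target on n nodes:
    each step resamples one uniformly random node pair to agree with
    the target, so the graph changes by at most one edge per step."""
    rng = random.Random(seed)
    g = set(g_start)
    trajectory = [frozenset(g)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(steps):
        e = rng.choice(pairs)
        if e in g_target:
            g.add(e)      # edge present in target: ensure it exists
        else:
            g.discard(e)  # edge absent in target: remove it if present
        trajectory.append(frozenset(g))
    return trajectory

start, target = {(0, 1), (1, 2)}, {(0, 2), (2, 3)}
traj = interpolate(start, target, 4, steps=200)
# Symmetric-difference distance to the target never increases.
print(len(traj[-1] ^ frozenset(target)))
```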
Deep Learning on Graphs: A Survey
Deep learning has been shown to be successful in a number of domains, ranging
from acoustics, images, to natural language processing. However, applying deep
learning to the ubiquitous graph data is non-trivial because of the unique
characteristics of graphs. Recently, substantial research efforts have been
devoted to applying deep learning methods to graphs, resulting in beneficial
advances in graph analysis techniques. In this survey, we comprehensively
review the different types of deep learning methods on graphs. We divide the
existing methods into five categories based on their model architectures and
training strategies: graph recurrent neural networks, graph convolutional
networks, graph autoencoders, graph reinforcement learning, and graph
adversarial methods. We then provide a comprehensive overview of these methods
in a systematic manner mainly by following their development history. We also
analyze the differences and compositions of different methods. Finally, we
briefly outline the applications in which they have been used and discuss
potential future research directions.Comment: Accepted by Transactions on Knowledge and Data Engineering. 24 pages,
11 figure
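Of the survey's five categories, graph convolutional networks are built on the simplest propagation rule: aggregate each node's neighborhood, then apply a shared linear map and nonlinearity. A pure-Python toy version of one such layer (illustrative, not any surveyed paper's code):

```python
def gcn_layer(adj, feats, weight):
    """One graph convolution: mean-aggregate each node's neighborhood
    (with a self-loop), then apply a linear map and ReLU -- a toy
    version of H' = ReLU(A_hat H W)."""
    n, d = len(adj), len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]] + [i]  # add self-loop
        agg = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(d)]
        out.append([max(0.0, sum(agg[k] * weight[k][c] for k in range(d)))
                    for c in range(len(weight[0]))])
    return out

# Path graph 0-1-2 with 2-d features and identity weights.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
h = gcn_layer(adj, feats, w)
print(h)
```

Stacking such layers lets information propagate over multi-hop neighborhoods, which is the shared mechanism behind most of the convolutional methods the survey covers.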
Machine Learning on Graphs: A Model and Comprehensive Taxonomy
There has been a surge of recent interest in learning representations for
graph-structured data. Graph representation learning methods have generally
fallen into three main categories, based on the availability of labeled data.
The first, network embedding (such as shallow graph embedding or graph
auto-encoders), focuses on learning unsupervised representations of relational
structure. The second, graph regularized neural networks, leverages graphs to
augment neural network losses with a regularization objective for
semi-supervised learning. The third, graph neural networks, aims to learn
differentiable functions over discrete topologies with arbitrary structure.
However, despite the popularity of these areas there has been surprisingly
little work on unifying the three paradigms. Here, we aim to bridge the gap
between graph neural networks, network embedding and graph regularization
models. We propose a comprehensive taxonomy of representation learning methods
for graph-structured data, aiming to unify several disparate bodies of work.
Specifically, we propose a Graph Encoder Decoder Model (GRAPHEDM), which
generalizes popular algorithms for semi-supervised learning on graphs (e.g.
GraphSage, Graph Convolutional Networks, Graph Attention Networks), and
unsupervised learning of graph representations (e.g. DeepWalk, node2vec, etc.)
into a single consistent approach. To illustrate the generality of this
approach, we fit over thirty existing methods into this framework. We believe
that this unifying view both provides a solid foundation for understanding the
intuition behind these methods and enables future research in the area.
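The encoder-decoder view the abstract describes can be made concrete with a deliberately trivial instance: an encoder mapping each node to an embedding, and an inner-product decoder scoring candidate edges. Everything here is a hypothetical stand-in chosen for brevity, not the GRAPHEDM formalism itself:

```python
def encode(adj):
    """Trivial encoder: use each node's adjacency row as its embedding
    (standing in for DeepWalk/GCN-style encoders in the
    encoder-decoder view; purely illustrative)."""
    return [list(map(float, row)) for row in adj]

def decode(z_u, z_v):
    """Inner-product decoder: score how likely an edge (u, v) is."""
    return sum(a * b for a, b in zip(z_u, z_v))

adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
z = encode(adj)
# Nodes 1 and 2 share neighbor 0, so their score is positive even
# though they are not directly linked (a link-prediction signal).
print(decode(z[1], z[2]))  # 1.0
```

Swapping in a different encoder (random-walk embeddings, a graph neural network) or decoder (distance-based, label-predicting) recovers the different method families the taxonomy unifies.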
Streaming Graph Neural Networks
Graphs are essential representations of many kinds of real-world data, such as
social networks. Recent years have witnessed increasing efforts to extend
neural network models to graph-structured data. These methods, which are
usually known as graph neural networks, have been applied to advance many
graph-related tasks such as reasoning about the dynamics of physical systems,
graph classification, and node classification. Most of the existing graph neural
network models have been designed for static graphs, while many real-world
graphs are inherently dynamic. For example, social networks naturally evolve
as new users join and new relations are created. Current graph neural network
models cannot utilize the dynamic information in dynamic graphs. However,
dynamic information has been shown to enhance the performance of many graph
analytic tasks such as community detection and link prediction.
Hence, it is necessary to design dedicated graph neural networks for dynamic
graphs. In this paper, we propose DGNN, a new Dynamic Graph Neural Network
model, which can model the dynamic information as the graph evolves. In
particular, the proposed framework can keep updating node
information by capturing the sequential information of edges (interactions),
the time intervals between edges and information propagation coherently.
Experimental results on various dynamic graphs demonstrate the effectiveness of
the proposed framework.
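The abstract's key ingredients (sequential edges, time intervals between them) can be illustrated with a scalar node state updated from a timestamped edge stream, where older activity is exponentially down-weighted by the elapsed interval. This is a sketch of the streaming-update idea only, not the DGNN architecture:

```python
import math

def process_stream(edges, num_nodes, decay=0.1):
    """Maintain one scalar state per node from a time-ordered edge
    stream: decay the old state by the interval since the node's last
    interaction, then add the new interaction's contribution."""
    state = [0.0] * num_nodes
    last = [0.0] * num_nodes
    for u, v, t in edges:  # edges must arrive in time order
        for node in (u, v):
            state[node] = state[node] * math.exp(-decay * (t - last[node])) + 1.0
            last[node] = t
    return state

s = process_stream([(0, 1, 0.0), (1, 2, 5.0), (0, 1, 10.0)], 3)
print([round(x, 3) for x in s])
```

Node 1 interacts twice at short intervals and so retains the highest state; a node that stays silent keeps only its decayed history.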
Extracting Hidden Groups and their Structure from Streaming Interaction Data
When actors in a social network interact, it usually means they have some
general goal towards which they are collaborating. This could be a research
collaboration in a company or a foursome planning a golf game. We call such
groups "planning groups". In many social contexts, it might be possible to
observe the "dyadic interactions" between actors, even if the actors do not
explicitly declare which groups they belong to. When groups are not explicitly
declared, we call them "hidden groups". Our particular focus is
hidden planning groups. By virtue of their need to further their goal, the
actors within such groups must interact in a manner which differentiates their
communications from random background communications. In such a case, one can
infer (from these interactions) the composition and structure of the hidden
planning groups. We formulate the problem of hidden group discovery from
streaming interaction data, and we propose efficient algorithms for identifying
the hidden group structures by isolating the hidden group's non-random,
planning-related, communications from the random background communications. We
validate our algorithms on real data (the Enron email corpus and Blog
communication data). Analysis of the results reveals that our algorithms
extract meaningful hidden group structures.
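The central premise is that planning-related communication deviates from random background traffic. A toy stand-in for the paper's algorithms flags actor pairs whose interaction counts exceed what a uniform random background would predict:

```python
from collections import Counter

def frequent_pairs(interactions, num_actors, factor=2.0):
    """Flag actor pairs that communicate far more often than a
    uniform random background would predict (`factor` times the
    background rate; a hypothetical threshold for illustration)."""
    counts = Counter(tuple(sorted(p)) for p in interactions)
    num_pairs = num_actors * (num_actors - 1) // 2
    expected = len(interactions) / num_pairs  # uniform background rate
    return {p for p, c in counts.items() if c > factor * expected}

# 4 actors: pair (0, 1) talks repeatedly; the rest is background noise.
log = [(0, 1)] * 6 + [(2, 3), (1, 3), (0, 2)]
print(frequent_pairs(log, 4))  # {(0, 1)}
```

The flagged pairs can then be assembled into connected clusters to recover candidate group compositions, which is the structural step the paper develops in earnest.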
Conversational Networks for Automatic Online Moderation
Moderation of user-generated content in an online community is a challenge
that has great socio-economic ramifications. However, the costs incurred by
delegating this work to human agents are high. For this reason, an automatic
system able to detect abuse in user-generated content is of great interest.
There are a number of ways to tackle this problem, but the most commonly seen
in practice are word filtering or regular expression matching. The main
limitations are their vulnerability to intentional obfuscation on the part of
the users, and their context-insensitive nature. Moreover, they are
language-dependent and may require appropriate corpora for training. In this
paper, we propose a system for automatic abuse detection that completely
disregards message content. We first extract a conversational network from raw
chat logs and characterize it through topological measures. We then use these
as features to train a classifier on our abuse detection task. We thoroughly
assess our system on a dataset of user comments originating from a French
Massively Multiplayer Online Game. We identify the most appropriate network
extraction parameters and discuss the discriminative power of our features
relative to their topological and temporal nature. Our method reaches an
F-measure of 83.89 when using the full feature set, improving on existing
approaches. With a selection of the most discriminative features, we
dramatically cut computing time while retaining most of the performance
(an F-measure of 82.65).
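The pipeline's first two steps (extract a conversational network from raw logs, then compute topological features per author) can be sketched as follows. The window-based extraction rule and the degree feature are simplifications; the paper tunes such extraction parameters and uses a richer feature set:

```python
from collections import defaultdict

def conversation_graph(messages, window=2):
    """Link each message's author to the authors of the `window`
    preceding messages, accumulating edge weights -- a simple
    conversational-network extraction from an ordered chat log."""
    edges = defaultdict(int)
    for i, author in enumerate(messages):
        for prev in messages[max(0, i - window):i]:
            if prev != author:
                edges[tuple(sorted((prev, author)))] += 1
    return dict(edges)

def degree_features(edges):
    """Weighted degree per author: one cheap topological feature a
    content-blind abuse classifier could consume."""
    deg = defaultdict(int)
    for (a, b), w in edges.items():
        deg[a] += w
        deg[b] += w
    return dict(deg)

chat = ["ann", "bob", "ann", "eve", "bob"]
g = conversation_graph(chat)
print(degree_features(g))  # {'ann': 4, 'bob': 5, 'eve': 3}
```

Because only message order and authorship are used, the features are immune to the content obfuscation that defeats word filters.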
Dual Convolutional Neural Network for Graph of Graphs Link Prediction
Graphs are general and powerful data representations which can model complex
real-world phenomena, ranging from chemical compounds to social networks;
however, effective feature extraction from graphs is not a trivial task, and
much work has been done in the field of machine learning and data mining. The
recent advances in graph neural networks have made automatic and flexible
feature extraction from graphs possible and have improved the predictive
performance significantly. In this paper, we go further with this line of
research and address a more general problem of learning with a graph of graphs
(GoG) consisting of an external graph and internal graphs, where each node in
the external graph has an internal graph structure. We propose a dual
convolutional neural network that extracts node representations by combining
the external and internal graph structures in an end-to-end manner. Experiments
on link prediction tasks using several chemical network datasets demonstrate
the effectiveness of the proposed method.
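The graph-of-graphs structure can be illustrated with two trivial stand-ins for the paper's dual convolutions: summarize each internal graph to a number, then let each external node combine its own summary with its external neighbors' summaries. Both aggregation choices here are hypothetical simplifications:

```python
def internal_summary(internal_adj):
    """Summarize an internal graph by its mean node degree (a
    stand-in for the internal graph convolution)."""
    n = len(internal_adj)
    return sum(sum(row) for row in internal_adj) / n

def gog_features(external_adj, internal_graphs):
    """For each external node, pair its own internal summary with the
    mean summary of its external neighbors (a stand-in for the
    external convolution)."""
    own = [internal_summary(g) for g in internal_graphs]
    feats = []
    for i, row in enumerate(external_adj):
        nbrs = [j for j, e in enumerate(row) if e]
        nbr_mean = sum(own[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        feats.append((own[i], nbr_mean))
    return feats

# Two linked external nodes, each holding a small internal graph
# (e.g. a molecule's bond structure).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # every node has degree 2
pair = [[0, 1], [1, 0]]                       # every node has degree 1
f = gog_features([[0, 1], [1, 0]], [triangle, pair])
print(f)  # [(2.0, 1.0), (1.0, 2.0)]
```

An end-to-end model replaces both hand-written summaries with learned convolutions, which is what the proposed dual network does.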