Representation Learning on Graphs: Methods and Applications
Machine learning on graphs is an important and ubiquitous task with
applications ranging from drug design to friendship recommendation in social
networks. The primary challenge in this domain is finding a way to represent,
or encode, graph structure so that it can be easily exploited by machine
learning models. Traditionally, machine learning approaches relied on
user-defined heuristics to extract features encoding structural information
about a graph (e.g., degree statistics or kernel functions). However, recent
years have seen a surge in approaches that automatically learn to encode graph
structure into low-dimensional embeddings, using techniques based on deep
learning and nonlinear dimensionality reduction. Here we provide a conceptual
review of key advancements in this area of representation learning on graphs,
including matrix factorization-based methods, random-walk based algorithms, and
graph neural networks. We review methods to embed individual nodes as well as
approaches to embed entire (sub)graphs. In doing so, we develop a unified
framework to describe these recent approaches, and we highlight a number of
important applications and directions for future work.
Comment: Published in the IEEE Data Engineering Bulletin, September 2017; version with minor corrections.
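The encoder-decoder view that unifies these methods can be sketched with the simplest "shallow" embedding: the encoder assigns each node a trainable vector, and a dot-product decoder reconstructs graph proximity. A minimal illustration with a toy graph and plain gradient descent (the graph, dimension, and learning rate are illustrative assumptions, not from the survey):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 4 nodes, adjacency matrix as the proximity we try to reconstruct.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = 2                                    # embedding dimension
Z = rng.normal(scale=0.1, size=(4, d))   # encoder: one learnable vector per node

def decode(Z):
    """Decoder: pairwise dot products approximate the adjacency matrix."""
    return Z @ Z.T

def loss(Z, A):
    """Reconstruction error between decoded similarities and true proximity."""
    return np.sum((decode(Z) - A) ** 2)

# Gradient descent on the reconstruction objective.
lr = 0.01
for _ in range(2000):
    grad = 4 * (decode(Z) - A) @ Z       # gradient of the squared error w.r.t. Z
    Z -= lr * grad

print(round(loss(Z, A), 3))
```

The same encode/decode/loss template covers matrix-factorization, random-walk, and GNN methods: they differ only in what the encoder computes and which proximity the decoder reconstructs.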
Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs
Knowledge graph reasoning is a critical task in natural language processing.
The task becomes more challenging on temporal knowledge graphs, where each fact
is associated with a timestamp. Most existing methods focus on reasoning at
past timestamps and they are not able to predict facts happening in the future.
This paper proposes Recurrent Event Network (RE-NET), a novel autoregressive
architecture for predicting future interactions. The occurrence of a fact
(event) is modeled as a probability distribution conditioned on temporal
sequences of past knowledge graphs. Specifically, our RE-NET employs a
recurrent event encoder to encode past facts and uses a neighborhood aggregator
to model the connection of facts at the same timestamp. Future facts can then
be inferred in a sequential manner based on the two modules. We evaluate our
proposed method via link prediction at future times on five public datasets.
Through extensive experiments, we demonstrate the strength of RE-NET, especially
on multi-step inference over future timestamps, and achieve state-of-the-art
performance on all five datasets. Code and data can be found at
https://github.com/INK-USC/RE-Net.
Comment: 15 pages, 8 figures, accepted as a full paper at EMNLP 202
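The two-module design described above, a recurrent event encoder over past snapshots plus a neighborhood aggregator per timestamp, can be sketched as follows. This is a simplified illustration, not RE-NET's actual architecture; the mean aggregator, the tanh recurrence, and all toy embeddings are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy temporal KG: at each timestep, a set of (subject, relation, object) facts.
snapshots = [
    [(0, 0, 1), (1, 0, 2)],   # facts at t=0
    [(0, 0, 2), (2, 0, 3)],   # facts at t=1
]
n_entities, d = 4, 8
E = rng.normal(size=(n_entities, d))   # entity embeddings (toy)
W = rng.normal(size=(d, d)) * 0.1      # recurrent weight (assumed simple cell)

def aggregate(facts):
    """Neighborhood aggregator: mean embedding of entities active at this t."""
    ids = {e for s, _, o in facts for e in (s, o)}
    return E[list(ids)].mean(axis=0)

# Recurrent event encoder: fold per-timestamp summaries into a history state.
h = np.zeros(d)
for facts in snapshots:
    h = np.tanh(W @ h + aggregate(facts))

def object_distribution(subject, h):
    """Condition on the history state; score every candidate object."""
    scores = E @ (E[subject] + h)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()             # probability distribution over objects

p = object_distribution(0, h)
```

Sequential multi-step inference then alternates sampling facts from this distribution and feeding the sampled snapshot back into the encoder.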
Capturing Edge Attributes via Network Embedding
Network embedding, which aims to learn low-dimensional representations of
nodes, has been used for various graph related tasks including visualization,
link prediction and node classification. Most existing embedding methods rely
solely on network structure. However, in practice we often have auxiliary
information about the nodes and/or their interactions, e.g., content of
scientific papers in co-authorship networks, or topics of communication in
Twitter mention networks. Here we propose a novel embedding method that uses
both network structure and edge attributes to learn better network
representations. Our method jointly minimizes the reconstruction error for
higher-order node neighborhoods, social roles, and edge attributes using a deep
architecture that can adequately capture highly non-linear interactions. We
demonstrate the efficacy of our model over existing state-of-the-art methods on
a variety of real-world networks, including collaboration networks and social
networks. We also observe that using edge attributes to inform network
embedding yields better performance in downstream tasks such as link prediction
and node classification.
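The joint objective described above, minimizing reconstruction error over structure and edge attributes together, can be sketched with a simple two-term loss. The bilinear attribute decoder and equal term weighting are assumptions for illustration, not the paper's deep architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy graph with one scalar attribute per edge (e.g., a topic weight).
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
edge_attr = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5}   # hypothetical attributes

d = 2
Z = rng.normal(scale=0.1, size=(3, d))   # node embeddings
w = rng.normal(scale=0.1, size=d)        # maps a pair embedding to an attribute

def joint_loss(Z, w):
    # Structure term: dot products reconstruct the adjacency matrix.
    structure = np.sum((Z @ Z.T - A) ** 2)
    # Attribute term: a function of the pair embedding reconstructs edge attributes.
    attribute = sum(((Z[i] * Z[j]) @ w - a) ** 2
                    for (i, j), a in edge_attr.items())
    return structure + attribute         # equal weighting assumed for the sketch

print(round(joint_loss(Z, w), 3))
```

Minimizing both terms over Z forces embeddings that explain the topology *and* predict the attributes on each edge, which is the intuition behind the reported link-prediction gains.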
Deep Representation Learning for Social Network Analysis
Social network analysis is an important problem in data mining. A fundamental
step for analyzing social networks is to encode network data into
low-dimensional representations, i.e., network embeddings, so that the network
topology structure and other attribute information can be effectively
preserved. Network representation learning facilitates further applications such
as classification, link prediction, anomaly detection and clustering. In
addition, techniques based on deep neural networks have attracted great
interest over the past few years. In this survey, we conduct a comprehensive
review of current literature in network representation learning utilizing
neural network models. First, we introduce the basic models for learning node
representations in homogeneous networks. We also introduce extensions of the
base models for tackling more complex scenarios, such as
analyzing attributed networks, heterogeneous networks and dynamic networks.
Then, we introduce techniques for embedding subgraphs. After that, we present
the applications of network representation learning. Finally, we discuss some
promising research directions for future work.
Machine Learning on Graphs: A Model and Comprehensive Taxonomy
There has been a surge of recent interest in learning representations for
graph-structured data. Graph representation learning methods have generally
fallen into three main categories, based on the availability of labeled data.
The first, network embedding (such as shallow graph embedding or graph
auto-encoders), focuses on learning unsupervised representations of relational
structure. The second, graph regularized neural networks, leverages graphs to
augment neural network losses with a regularization objective for
semi-supervised learning. The third, graph neural networks, aims to learn
differentiable functions over discrete topologies with arbitrary structure.
However, despite the popularity of these areas, there has been surprisingly
little work on unifying the three paradigms. Here, we aim to bridge the gap
between graph neural networks, network embedding and graph regularization
models. We propose a comprehensive taxonomy of representation learning methods
for graph-structured data, aiming to unify several disparate bodies of work.
Specifically, we propose a Graph Encoder Decoder Model (GRAPHEDM), which
generalizes popular algorithms for semi-supervised learning on graphs (e.g.
GraphSage, Graph Convolutional Networks, Graph Attention Networks), and
unsupervised learning of graph representations (e.g. DeepWalk, node2vec, etc)
into a single consistent approach. To illustrate the generality of this
approach, we fit over thirty existing methods into this framework. We believe
that this unifying view both provides a solid foundation for understanding the
intuition behind these methods, and enables future research in the area.
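As one concrete member of the graph neural network family that such a taxonomy covers, the propagation rule of a single Graph Convolutional Network layer fits in a few lines. The toy graph, one-hot features, and fixed weights below are illustrative assumptions:

```python
import numpy as np

# One GCN layer: H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W), the propagation rule
# of Graph Convolutional Networks on a toy 3-node path graph.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3)                         # one-hot node features
W = np.array([[0.5, -0.2],            # fixed toy weights; no training here
              [0.1,  0.3],
              [-0.4, 0.6]])

A_hat = A + np.eye(3)                 # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)
print(H.shape)                        # a 2-dimensional representation per node
```

In encoder-decoder terms, this layer is the encoder; pairing it with a dot-product decoder recovers an unsupervised method, while pairing it with a softmax classifier recovers semi-supervised node classification.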
Dynamic Graph Representation Learning via Self-Attention Networks
Learning latent representations of nodes in graphs is an important and
ubiquitous task with widespread applications such as link prediction, node
classification, and graph visualization. Previous methods on graph
representation learning mainly focus on static graphs; however, many real-world
graphs are dynamic and evolve over time. In this paper, we present Dynamic
Self-Attention Network (DySAT), a novel neural architecture that operates on
dynamic graphs and learns node representations that capture both structural
properties and temporal evolutionary patterns. Specifically, DySAT computes
node representations by jointly employing self-attention layers along two
dimensions: structural neighborhood and temporal dynamics. We conduct link
prediction experiments on two classes of graphs: communication networks and
bipartite rating networks. Our experimental results show that DySAT has a
significant performance gain over several different state-of-the-art graph
embedding baselines.
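The structural half of such a design, self-attention restricted to each node's graph neighborhood, can be sketched as a masked attention step. This is a simplified single-head illustration, not DySAT's exact architecture; the toy graph and random weights are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Each node attends only to its graph neighbors (self-loops included).
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
X = rng.normal(size=(4, 4))                       # input node features

Wq, Wk, Wv = (rng.normal(size=(4, 4)) * 0.5 for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(4)
scores[A == 0] = -np.inf                          # mask out non-neighbors
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)     # softmax over neighbors only
H = weights @ V                                   # attended node representations
```

The temporal dimension repeats the same attention, but each node attends over its own representations at past timestamps instead of over neighbors.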
Latent Network Summarization: Bridging Network Embedding and Summarization
Motivated by the computational and storage challenges that dense embeddings
pose, we introduce the problem of latent network summarization that aims to
learn a compact, latent representation of the graph structure with
dimensionality that is independent of the input graph size (i.e., #nodes and
#edges), while retaining the ability to derive node representations on the fly.
We propose Multi-LENS, an inductive multi-level latent network summarization
approach that leverages a set of relational operators and relational functions
(compositions of operators) to capture the structure of egonets and
higher-order subgraphs, respectively. The structure is stored in low-rank,
size-independent structural feature matrices, which along with the relational
functions comprise our latent network summary. Multi-LENS is general and
naturally supports both homogeneous and heterogeneous graphs with or without
directionality, weights, attributes or labels. Extensive experiments on real
graphs show 3.5-34.3% improvement in AUC for link prediction, while requiring
80-2152x less output storage space than baseline embedding methods on large
datasets. As application areas, we show the effectiveness of Multi-LENS in
detecting anomalies and events in the Enron email communication graph and
Twitter co-mention graph.
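The idea of relational operators and their compositions, summarizing each node's egonet with a small operator set and then re-applying the operators to capture higher-order structure, can be sketched as below. The degree base feature and the (mean, max, sum) operator set are illustrative assumptions, not Multi-LENS's exact choices:

```python
import numpy as np

# Toy 4-cycle; the base feature is node degree.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
base = A.sum(axis=1, keepdims=True)        # base feature: degree

def apply_operators(A, F):
    """One level: apply each relational operator over every node's neighbors."""
    neighbors = [np.flatnonzero(A[i]) for i in range(len(A))]
    ops = (np.mean, np.max, np.sum)
    return np.array([[op(F[nb], axis=0) for op in ops]
                     for nb in neighbors]).reshape(len(A), -1)

level1 = apply_operators(A, base)          # first-order egonet summaries
level2 = apply_operators(A, level1)        # compositions: higher-order structure
summary = np.hstack([base, level1, level2])
print(summary.shape)
```

Note the feature width depends only on the operator set and depth, never on the number of nodes or edges, which is what makes the summary size-independent and lets new nodes be embedded on the fly.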
Learning to Identify High Betweenness Centrality Nodes from Scratch: A Novel Graph Neural Network Approach
Betweenness centrality (BC) is one of the most used centrality measures for
network analysis, which seeks to describe the importance of nodes in a network
in terms of the fraction of shortest paths that pass through them. It is key to
many valuable applications, including community detection and network
dismantling. Computing BC scores on large networks is computationally
challenging due to high time complexity. Many approximation algorithms have
been proposed to speed up the estimation of BC, which are mainly
sampling-based. However, these methods still incur considerable execution time
on large-scale networks, and their results often degrade when small changes
occur in the network structure. In this paper, we focus on
identifying nodes with high BC in a graph, since many application scenarios are
built upon retrieving nodes with top-k BC. Different from previous heuristic
methods, we turn this task into a learning problem and design an
encoder-decoder based framework to resolve the problem. More specifically, the
encoder leverages the network structure to encode each node into an embedding
vector, which captures the important structural information of the node. The
decoder transforms the embedding vector for each node into a scalar, which
captures the relative rank of this node in terms of BC. We use the pairwise
ranking loss to train the model to identify the orders of nodes regarding their
BC. By training on small-scale networks, the learned model is capable of
assigning relative BC scores to nodes for any unseen networks, and thus
identifying the highly-ranked nodes. Comprehensive experiments on both
synthetic and real-world networks demonstrate that, compared to representative
baselines, our model drastically speeds up prediction without noticeable
sacrifice in accuracy, and outperforms the state-of-the-art in accuracy on
several large real-world networks.
Comment: 10 pages, 4 figures, 8 tables
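The pairwise ranking loss described above can be sketched as a binary cross-entropy over the predicted order of node pairs. The sigmoid-of-score-gap form is a common choice assumed here for illustration; the paper's exact loss may differ:

```python
import numpy as np

# The decoder emits one scalar per node; training pushes the score of the
# higher-BC node in each pair above the other's.
true_bc = np.array([0.8, 0.1, 0.5, 0.3])   # ground-truth betweenness (toy)
scores = np.array([0.2, 0.6, 0.4, 0.1])    # model's predicted scalars (toy)

def pairwise_ranking_loss(scores, true_bc):
    """Binary cross-entropy on the predicted order of every node pair."""
    loss, n = 0.0, len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            target = 1.0 if true_bc[i] > true_bc[j] else 0.0
            p = 1.0 / (1.0 + np.exp(-(scores[i] - scores[j])))  # sigmoid of gap
            loss -= target * np.log(p) + (1 - target) * np.log(1 - p)
    return loss / (n * (n - 1) / 2)

bad = pairwise_ranking_loss(scores, true_bc)
good = pairwise_ranking_loss(true_bc, true_bc)   # perfectly ordered scores
assert good < bad
```

Because only relative order matters, the model never needs calibrated BC values, which is what lets it train on small graphs and still rank nodes on much larger unseen ones.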
GESF: A Universal Discriminative Mapping Mechanism for Graph Representation Learning
Graph embedding is a central problem in social network analysis and many
other applications, aiming to learn the vector representation for each node.
While most existing approaches must specify the neighborhood and the form of
dependence on it, which may significantly degrade the flexibility of the
representation, we propose a novel graph node embedding method
(namely GESF) via the set function technique. Our method can 1) learn an
arbitrary form of representation function from neighborhood, 2) automatically
decide the significance of neighbors at different distances, and 3) be applied
to heterogeneous graph embedding, which may contain multiple types of nodes.
A theoretical guarantee for the representation capability of our method has been
proved for general homogeneous and heterogeneous graphs, and evaluation results
on benchmark datasets show that the proposed GESF outperforms the
state-of-the-art approaches in producing node vectors for classification tasks.
Comment: 18 pages
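The set-function idea, a neighborhood representation that cannot depend on any ordering of the neighbors, can be illustrated with a DeepSets-style construction: embed each neighbor, pool with a symmetric sum, then transform. GESF's construction differs in detail; the weights and shapes below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

# Neighbors of a node have no canonical order, so the representation function
# must be permutation-invariant: embed each neighbor, sum, then transform.
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

def set_function(neighbors):
    """neighbors: (k, 3) array of neighbor feature vectors, in any order."""
    pooled = np.tanh(neighbors @ W1).sum(axis=0)   # symmetric pooling
    return np.tanh(pooled @ W2)

X = rng.normal(size=(4, 3))
out1 = set_function(X)
out2 = set_function(X[::-1])           # same neighbors, reversed order
assert np.allclose(out1, out2)         # invariant to neighbor ordering
```

Any function of this embed-pool-transform shape is a valid neighborhood aggregator, which is the sense in which the learned representation function can take an arbitrary form.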
GrAMME: Semi-Supervised Learning using Multi-layered Graph Attention Models
Modern data analysis pipelines are becoming increasingly complex due to the
presence of multi-view information sources. While graphs are effective in
modeling complex relationships, in many scenarios a single graph is rarely
sufficient to succinctly represent all interactions, and hence multi-layered
graphs have become popular. Though this leads to richer representations,
extending solutions from the single-graph case is not straightforward.
Consequently, there is a strong need for novel solutions to solve classical
problems, such as node classification, in the multi-layered case. In this
paper, we consider the problem of semi-supervised learning with multi-layered
graphs. Though deep network embeddings, e.g. DeepWalk, are widely adopted for
community discovery, we argue that feature learning with random node
attributes, using graph neural networks, can be more effective. To this end, we
propose to use attention models for effective feature learning, and develop two
novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the inter-layer
dependencies for building multi-layered graph embeddings. Using empirical
studies on several benchmark datasets, we evaluate the proposed approaches and
demonstrate significant performance improvements in comparison to
state-of-the-art network embedding strategies. The results also show that using
simple random features is an effective choice, even in cases where explicit
node attributes are not available.