8 research outputs found
The Effects of Randomness on the Stability of Node Embeddings
We systematically evaluate the (in-)stability of state-of-the-art node
embedding algorithms due to randomness, i.e., the random variation of their
outcomes given identical algorithms and graphs. We apply five node embedding
algorithms---HOPE, LINE, node2vec, SDNE, and GraphSAGE---to synthetic and
empirical graphs and assess their stability under randomness with respect to
(i) the geometry of embedding spaces as well as (ii) their performance in
downstream tasks. We find significant instabilities in the geometry of
embedding spaces independent of the centrality of a node. In the evaluation of
downstream tasks, we find that the accuracy of node classification seems to be
unaffected by random seeding while the actual classification of nodes can vary
significantly. This suggests that instability effects need to be taken into
account when working with node embeddings. Our work is relevant for researchers
and engineers interested in the effectiveness, reliability, and reproducibility
of node embedding approaches
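To make the notion of geometric instability in the abstract above concrete, the following is a minimal sketch of one possible stability measure: the mean Jaccard overlap of each node's k nearest neighbours across two embedding runs. The measure, the function names, and the toy vectors are illustrative assumptions and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn(embedding, node, k):
    """Set of the k nodes most similar to `node` (cosine similarity)."""
    others = [n for n in embedding if n != node]
    others.sort(key=lambda n: cosine(embedding[node], embedding[n]), reverse=True)
    return set(others[:k])

def knn_jaccard(emb_a, emb_b, k):
    """Mean Jaccard overlap of k-NN sets over all nodes (requires k >= 1)."""
    scores = []
    for node in emb_a:
        a, b = knn(emb_a, node, k), knn(emb_b, node, k)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)

# Hypothetical toy data: three "runs" of a 2-D embedding of four nodes.
run_1 = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
# run_2 is run_1 rotated by 90 degrees: coordinates change, neighbourhoods do not.
run_2 = {n: [-y, x] for n, (x, y) in run_1.items()}
# run_3 places the nodes so that every nearest neighbour changes.
run_3 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "d": [0.1, 0.9]}

stable = knn_jaccard(run_1, run_2, k=1)
unstable = knn_jaccard(run_1, run_3, k=1)
```

A neighbourhood-based measure is used here because it ignores rotations of the embedding space, which change coordinates without changing which nodes are close to each other.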
Parallel Computation of Graph Embeddings
Graph embedding aims at learning a vector-based representation of vertices
that incorporates the structure of the graph. This representation then enables
inference of graph properties. Existing graph embedding techniques, however, do
not scale well to large graphs. We therefore propose a framework for parallel
computation of a graph embedding using a cluster of compute nodes with resource
constraints. We show how to distribute any existing embedding technique by
first splitting a graph for any given set of constrained compute nodes and then
reconciling the embedding spaces derived for these subgraphs. We also propose a
new way to evaluate the quality of graph embeddings that is independent of a
specific inference task. Based thereon, we give a formal bound on the
difference between the embeddings derived by centralised and parallel
computation. Experimental results illustrate that our approach for parallel
computation scales well, while largely maintaining the embedding quality
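The abstract above does not say how the embedding spaces of the subgraphs are reconciled. As a hedged illustration of the general idea only, the sketch below aligns two 2-D embeddings with a least-squares rotation fitted on nodes that both partitions share. The restriction to two dimensions, the rotation-only model, the presence of shared anchor nodes, and all names are assumptions made for this example.

```python
import math

def rotate(point, theta):
    """Rotate a 2-D point counter-clockwise by theta radians."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def fit_rotation(source_pts, target_pts):
    """Angle of the rotation that best maps source_pts onto target_pts
    in the least-squares sense (closed form for the 2-D case)."""
    pairs = list(zip(source_pts, target_pts))
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in pairs)  # dot products
    s = sum(ax * by - ay * bx for (ax, ay), (bx, by) in pairs)  # cross products
    return math.atan2(s, c)

def reconcile(emb_ref, emb_other):
    """Rotate emb_other into the frame of emb_ref using their shared nodes."""
    shared = sorted(set(emb_ref) & set(emb_other))
    theta = fit_rotation([emb_other[n] for n in shared],
                         [emb_ref[n] for n in shared])
    return {n: rotate(p, theta) for n, p in emb_other.items()}

# Hypothetical toy data: partition 2 sees the same geometry rotated by 30 degrees.
angle = math.radians(30)
part_1 = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 1.0)}
part_2 = {"b": rotate((0.0, 1.0), angle),
          "c": rotate((1.0, 1.0), angle),
          "d": rotate((2.0, 1.0), angle)}

aligned = reconcile(part_1, part_2)  # node "d" is mapped back close to (2.0, 1.0)
```

In higher dimensions the same idea is usually expressed as an orthogonal Procrustes problem solved with a singular value decomposition; the 2-D closed form is used here only to keep the example dependency-free.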
Graph Convolutional Networks for Graphs Containing Missing Features
Graph Convolutional Network (GCN) has experienced great success in graph
analysis tasks. It works by smoothing the node features across the graph. The
current GCN models overwhelmingly assume that the node feature information is
complete. However, real-world graph data are often incomplete and contain
missing features. Traditionally, people have to estimate and fill in the
unknown features based on imputation techniques and then apply GCN. However,
the processes of feature filling and graph learning are separated, resulting in
degraded and unstable performance. This problem becomes more serious when a
large number of features are missing. We propose an approach that adapts GCN to
graphs containing missing features. In contrast to the traditional strategy, our
approach integrates the processing of missing features and graph learning
within the same neural network architecture. Our idea is to represent the
missing data by a Gaussian Mixture Model (GMM) and calculate the expected
activation of neurons in the first hidden layer of GCN, while keeping the other
layers of the network unchanged. This enables us to learn the GMM parameters
and network weight parameters in an end-to-end manner. Notably, our approach
does not increase the computational complexity of GCN and it is consistent with
GCN when the features are complete. We demonstrate through extensive
experiments that our approach significantly outperforms the imputation-based
methods in node classification and link prediction tasks. We show that the
performance of our approach for the case with a low level of missing features
is even superior to GCN for the case with complete features
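The "expected activation" mentioned in the abstract above can be illustrated with a small sketch. It assumes a ReLU activation and a scalar pre-activation that follows a Gaussian mixture; both are assumptions for this example, since the abstract does not give the exact formulation. For x ~ N(mu, sigma^2), the closed form is E[max(0, x)] = mu * Phi(mu / sigma) + sigma * phi(mu / sigma), where Phi and phi are the standard normal CDF and PDF.

```python
import math

def expected_relu(mu, sigma):
    """E[max(0, x)] for x ~ N(mu, sigma^2); sigma = 0 means the value is known."""
    if sigma == 0.0:
        return max(0.0, mu)  # no uncertainty: reduces to the ordinary ReLU
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf

def expected_relu_mixture(weights, mus, sigmas):
    """Expected ReLU under a Gaussian mixture: weighted sum over components."""
    return sum(w * expected_relu(m, s) for w, m, s in zip(weights, mus, sigmas))

# Hypothetical values: a two-component mixture for one pre-activation.
value = expected_relu_mixture([0.5, 0.5], [0.0, 2.0], [1.0, 0.0])
```

With sigma set to zero the function returns the plain ReLU output, which mirrors the abstract's remark that the approach is consistent with GCN when the features are complete.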
A Survey on Dynamic Network Embedding
Real-world networks are composed of diverse interacting and evolving
entities, while most existing research simply characterizes them as
particular static networks, without considering the evolution trends of
dynamic networks. Recently, significant progress has been made in tracking the
properties of dynamic networks, exploiting changes of entities and links
in the network to devise network embedding techniques. Compared to widely
proposed static network embedding methods, dynamic network embedding endeavors
to encode nodes as low-dimensional dense representations that effectively
preserve the network structures and the temporal dynamics, which is beneficial
to multifarious downstream machine learning tasks. In this paper, we conduct a
systematic survey on dynamic network embedding. Specifically, we describe the
basic concepts of dynamic network embedding and propose a novel taxonomy of
existing dynamic network embedding techniques, including matrix factorization
based, Skip-Gram based, autoencoder based, neural network based, and other
embedding methods. Additionally, we carefully summarize the commonly used
datasets and a wide variety of downstream tasks that dynamic network embedding
can benefit. Finally, we suggest several challenges that existing algorithms
face and outline possible directions for future research, such as dynamic
embedding models, large-scale dynamic networks, heterogeneous dynamic networks,
dynamic attributed networks, task-oriented dynamic network embedding and more
embedding spaces.
Comment: 25 page
Multiplex Bipartite Network Embedding using Dual Hypergraph Convolutional Networks
A bipartite network is a graph structure where nodes are from two distinct
domains and only inter-domain interactions exist as edges. A large number of
network embedding methods exist to learn vectorial node representations from
general graphs with both homogeneous and heterogeneous node and edge types,
including some that can specifically model the distinct properties of bipartite
networks. However, these methods are inadequate for modeling multiplex bipartite
networks (e.g., in e-commerce), which have multiple types of interactions (e.g.,
click, inquiry, and buy) and node attributes. Most real-world multiplex
bipartite networks are also sparse and have imbalanced node distributions that
are challenging to model. In this paper, we develop an unsupervised Dual
HyperGraph Convolutional Network (DualHGCN) model that scalably transforms the
multiplex bipartite network into two sets of homogeneous hypergraphs and uses
spectral hypergraph convolutional operators, along with intra- and
inter-message passing strategies to promote information exchange within and
across domains, to learn effective node embeddings. We benchmark DualHGCN using
four real-world datasets on link prediction and node classification tasks. Our
extensive experiments demonstrate that DualHGCN significantly outperforms
state-of-the-art methods, and is robust to varying sparsity levels and
imbalanced node distributions.
Comment: The Web Conference (formerly WWW) 202
Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank
Given an input graph G and a node v in G, homogeneous network embedding (HNE)
maps the graph structure in the vicinity of v to a compact, fixed-dimensional
feature vector. This paper focuses on HNE for massive graphs, e.g., with
billions of edges. At this scale, most existing approaches fail, as they either
incur prohibitively high costs or yield severely compromised result utility. Our
proposed solution, called Node-Reweighted PageRank (NRP), is based on a classic
idea of deriving embedding vectors from pairwise personalized PageRank (PPR)
values. Our contributions are twofold: first, we design a simple and efficient
baseline HNE method based on PPR that is capable of handling billion-edge
graphs on commodity hardware; second and more importantly, we identify an
inherent drawback of vanilla PPR, and address it in our main proposal NRP.
Specifically, PPR was designed for a very different purpose, i.e., ranking
nodes in G based on their relative importance from a source node's perspective.
In contrast, HNE aims to build node embeddings considering the whole graph.
Consequently, node embeddings derived directly from PPR are of suboptimal
utility. The proposed NRP approach overcomes the above deficiency through an
effective and efficient node reweighting algorithm, which augments PPR values
with node degree information, and iteratively adjusts embedding vectors
accordingly. Overall, NRP takes O(m log n) time and O(m) space to compute all
node embeddings for a graph with m edges and n nodes. Our extensive experiments
that compare NRP against 18 existing solutions over 7 real graphs demonstrate
that NRP achieves higher result utility than all the solutions for link
prediction, graph reconstruction and node classification, while being up to
orders of magnitude faster. In particular, on a billion-edge Twitter graph, NRP
terminates within 4 hours, using a single CPU core.
Comment: full version of a paper published in PVLDB 2020, Volume 13, Number 5,
pages 670-683, https://doi.org/10.14778/3377369.3377376, 17 page
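For readers unfamiliar with the personalized PageRank (PPR) values that the abstract above builds on, here is a minimal power-iteration sketch of vanilla PPR from a single source node. It does not implement NRP or its node reweighting, which the abstract does not specify; the restart probability `alpha`, the iteration count, and the handling of dangling nodes are assumptions chosen for this example.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Vanilla PPR by power iteration.

    adj maps every node to a list of its out-neighbours (each neighbour must
    also be a key of adj). At each step the walk restarts at `source` with
    probability alpha and otherwise moves to a uniformly chosen out-neighbour.
    """
    nodes = list(adj)
    ppr = {n: 0.0 for n in nodes}
    ppr[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += alpha  # restart mass (total mass is always 1)
        for u in nodes:
            walk_mass = (1.0 - alpha) * ppr[u]
            if adj[u]:
                share = walk_mass / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # Assumption: a walk stuck at a dangling node returns to the source.
                nxt[source] += walk_mass
        ppr = nxt
    return ppr

# Hypothetical toy graph: an undirected triangle stored as a directed adjacency list.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
scores = personalized_pagerank(triangle, "a")
```

The resulting scores rank nodes from the source's point of view, which is the "very different purpose" the abstract contrasts with building embeddings for the whole graph.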
A Comparative Study for Unsupervised Network Representation Learning
There has been appreciable progress in unsupervised network representation
learning (UNRL) approaches over graphs recently with flexible random-walk
approaches, new optimization objectives and deep architectures. However, there
is no common ground for systematic comparison of embeddings to understand their
behavior for different graphs and tasks. In this paper we theoretically group
different approaches under a unifying framework and empirically investigate the
effectiveness of different network representation methods. In particular, we
argue that most of the UNRL approaches either explicitly or implicitly model and
exploit context information of a node. Consequently, we propose a framework
that casts a variety of approaches -- random walk based, matrix factorization
and deep learning based -- into a unified context-based optimization function.
We systematically group the methods based on their similarities and
differences. We study the differences among these methods in detail which we
later use to explain their performance differences (on downstream tasks). We
conduct a large-scale empirical study considering 9 popular and recent UNRL
techniques and 11 real-world datasets with varying structural properties and
two common tasks -- node classification and link prediction. We find that there
is no single method that is a clear winner and that the choice of a suitable
method is dictated by certain properties of the embedding methods, task and
structural properties of the underlying graph. In addition, we report
common pitfalls in the evaluation of UNRL methods and offer suggestions for
experimental design and interpretation of results.
Comment: Accepted for publication in IEEE TKD
A Survey on Embedding Dynamic Graphs
Embedding static graphs in low-dimensional vector spaces plays a key role in
network analytics and inference, supporting applications like node
classification, link prediction, and graph visualization. However, many
real-world networks present dynamic behavior, including topological evolution,
feature evolution, and diffusion. Therefore, several methods for embedding
dynamic graphs have been proposed to learn network representations over time,
facing novel challenges, such as time-domain modeling, temporal features to be
captured, and the temporal granularity to be embedded. In this survey, we
overview dynamic graph embedding, discussing its fundamentals and the recent
advances developed so far. We introduce the formal definition of dynamic graph
embedding, focusing on the problem setting and introducing a novel taxonomy for
dynamic graph embedding input and output. We further explore different dynamic
behaviors that may be encompassed by embeddings, classifying by topological
evolution, feature evolution, and processes on networks. Afterward, we describe
existing techniques and propose a taxonomy for dynamic graph embedding
techniques based on algorithmic approaches, from matrix and tensor
factorization to deep learning, random walks, and temporal point processes. We
also elucidate main applications, including dynamic link prediction, anomaly
detection, and diffusion prediction, and we further state some promising
research directions in the area.
Comment: 41 pages, 10 figure