Graph Embedding Techniques, Applications, and Performance: A Survey
Graphs, such as social networks, word co-occurrence networks, and
communication networks, occur naturally in various real-world applications.
Analyzing them yields insight into the structure of society, language, and
different patterns of communication. Many approaches have been proposed to
perform the analysis. Recently, methods which use the representation of graph
nodes in vector space have gained traction from the research community. In this
survey, we provide a comprehensive and structured analysis of various graph
embedding techniques proposed in the literature. We first introduce the
embedding task and its challenges such as scalability, choice of
dimensionality, and features to be preserved, and their possible solutions. We
then present three categories of approaches based on factorization methods,
random walks, and deep learning, with examples of representative algorithms in
each category and analysis of their performance on various tasks. We evaluate
these state-of-the-art methods on a few common datasets and compare their
performance against one another. Our analysis concludes by suggesting some
potential applications and future directions. We finally present the
open-source Python library we developed, named GEM (Graph Embedding Methods,
available at https://github.com/palash1992/GEM), which provides all presented
algorithms within a unified interface to foster and facilitate research on the
topic.
Comment: Submitted to Knowledge-Based Systems for review
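The random-walk category the survey describes (e.g. DeepWalk, node2vec) begins by sampling truncated random walks over the graph, which are then fed to a skip-gram model to learn node vectors. A minimal sketch of the sampling step in plain Python follows; the adjacency-list format and function name are illustrative, not GEM's actual API:

```python
import random

def sample_walks(adj, walks_per_node=10, walk_len=5, seed=0):
    """Sample truncated random walks from an adjacency list.

    adj maps each node to a list of its neighbors. In a DeepWalk-style
    pipeline, the returned walks would be treated as "sentences" and
    passed to a skip-gram model to learn node embeddings.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

Each step moves to a uniformly random neighbor; node2vec replaces this uniform choice with a biased second-order transition.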
Deep Learning on Graphs: A Survey
Deep learning has been shown to be successful in a number of domains, ranging
from acoustics, images, to natural language processing. However, applying deep
learning to the ubiquitous graph data is non-trivial because of the unique
characteristics of graphs. Recently, substantial research efforts have been
devoted to applying deep learning methods to graphs, resulting in beneficial
advances in graph analysis techniques. In this survey, we comprehensively
review the different types of deep learning methods on graphs. We divide the
existing methods into five categories based on their model architectures and
training strategies: graph recurrent neural networks, graph convolutional
networks, graph autoencoders, graph reinforcement learning, and graph
adversarial methods. We then provide a comprehensive overview of these methods
in a systematic manner mainly by following their development history. We also
analyze the differences and compositions of different methods. Finally, we
briefly outline the applications in which they have been used and discuss
potential future research directions.Comment: Accepted by Transactions on Knowledge and Data Engineering. 24 pages,
11 figure
Machine Learning on Graphs: A Model and Comprehensive Taxonomy
There has been a surge of recent interest in learning representations for
graph-structured data. Graph representation learning methods have generally
fallen into three main categories, based on the availability of labeled data.
The first, network embedding (such as shallow graph embedding or graph
auto-encoders), focuses on learning unsupervised representations of relational
structure. The second, graph regularized neural networks, leverages graphs to
augment neural network losses with a regularization objective for
semi-supervised learning. The third, graph neural networks, aims to learn
differentiable functions over discrete topologies with arbitrary structure.
However, despite the popularity of these areas, there has been surprisingly
little work on unifying the three paradigms. Here, we aim to bridge the gap
between graph neural networks, network embedding and graph regularization
models. We propose a comprehensive taxonomy of representation learning methods
for graph-structured data, aiming to unify several disparate bodies of work.
Specifically, we propose a Graph Encoder Decoder Model (GRAPHEDM), which
generalizes popular algorithms for semi-supervised learning on graphs (e.g.
GraphSage, Graph Convolutional Networks, Graph Attention Networks), and
unsupervised learning of graph representations (e.g. DeepWalk, node2vec, etc)
into a single consistent approach. To illustrate the generality of this
approach, we fit over thirty existing methods into this framework. We believe
that this unifying view both provides a solid foundation for understanding the
intuition behind these methods, and enables future research in the area.
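The encoder–decoder abstraction behind GRAPHEDM can be caricatured in a few lines: an encoder maps each node to a vector, and a decoder scores node pairs from those vectors. The sketch below uses an illustrative dot-product decoder over toy embeddings; it is a stand-in for the framework's idea, not the paper's full model:

```python
def make_encoder(embeddings):
    """Encoder: look up the learned vector for a node."""
    return lambda v: embeddings[v]

def dot_decoder(zu, zv):
    """Decoder: score a node pair by the inner product of their
    embeddings; a higher score suggests a more likely edge."""
    return sum(a * b for a, b in zip(zu, zv))

# Toy embeddings standing in for learned parameters.
emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
enc = make_encoder(emb)

score_ab = dot_decoder(enc("a"), enc("b"))  # similar nodes
score_ac = dot_decoder(enc("a"), enc("c"))  # dissimilar nodes
```

Shallow embedding methods make the encoder a plain lookup table as above, while graph neural networks replace it with a differentiable function of the node's neighborhood.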
A Tale of Two Bases: Local-Nonlocal Regularization on Image Patches with Convolution Framelets
We propose an image representation scheme combining the local and nonlocal
characterization of patches in an image. Our representation scheme can be shown
to be equivalent to a tight frame constructed from convolving local bases (e.g.
wavelet frames, discrete cosine transforms, etc.) with nonlocal bases (e.g.
spectral basis induced by nonlinear dimension reduction on patches), and we
call the resulting frame elements {\it convolution framelets}. Insight gained
from analyzing the proposed representation leads to a novel interpretation of a
recent high-performance patch-based image inpainting algorithm using Point
Integral Method (PIM) and Low Dimension Manifold Model (LDMM) [Osher, Shi and
Zhu, 2016]. In particular, we show that LDMM is a weighted
$\ell_2$-regularization on the coefficients obtained by decomposing images into
linear combinations of convolution framelets; based on this understanding, we
extend the original LDMM to a reweighted version that yields further improved
inpainting results. In addition, we establish the energy concentration property
of convolution framelet coefficients for the setting where the local basis is
constructed from a given nonlocal basis via a linear reconstruction framework;
a generalization of this framework to unions of local embeddings can provide a
natural setting for interpreting BM3D, one of the state-of-the-art image
denoising algorithms.
Benchmarks for Graph Embedding Evaluation
Graph embedding is the task of representing nodes of a graph in a
low-dimensional space and its applications for graph tasks have gained
significant traction in academia and industry. The primary difference among the
many recently proposed graph embedding methods is the way they preserve the
inherent properties of the graphs. However, in practice, comparing these
methods is very challenging. The majority of methods report performance boosts
on a few selected real graphs. Therefore, it is difficult to generalize these
performance improvements to other types of graphs. Given a graph, it is
currently impossible to quantify the advantages of one approach over another.
In this work, we introduce a principled framework to compare graph embedding
methods. Our goal is threefold: (i) provide a unifying framework for comparing
the performance of various graph embedding methods, (ii) establish a benchmark
with real-world graphs that exhibit different structural properties, and (iii)
provide users with a tool to identify the best graph embedding method for their
data. This paper evaluates 4 of the most influential graph embedding methods
and 4 traditional link prediction methods against a corpus of 100 real-world
networks with varying properties. We organize the 100 networks in terms of
their properties to get a better understanding of the embedding performance of
these popular methods. We use the comparisons on our 100 benchmark graphs to
define the GFS-score, which can be applied to any embedding method to quantify its
performance. We rank the state-of-the-art embedding approaches using the
GFS-score and show that it can be used to understand and evaluate novel
embedding approaches. We envision that the proposed framework
(https://www.github.com/palash1992/GEM-Benchmark) will serve the community as a
benchmarking platform to test and compare the performance of future graph
embedding techniques.
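The abstract does not name the four traditional link-prediction methods it evaluates, but baselines of this kind are typically neighborhood heuristics. Two standard examples, sketched in plain Python as an assumption about what such baselines look like:

```python
import math

def common_neighbors_score(adj, u, v):
    """Score a candidate edge (u, v) by the number of neighbors
    u and v share; a classic link-prediction heuristic."""
    return len(set(adj[u]) & set(adj[v]))

def adamic_adar_score(adj, u, v):
    """Adamic-Adar index: shared neighbors weighted by the inverse
    log of their degree, so low-degree shared neighbors count more."""
    shared = set(adj[u]) & set(adj[v])
    return sum(1.0 / math.log(len(adj[w]))
               for w in shared if len(adj[w]) > 1)
```

To use either as a link predictor, one scores all non-edges and ranks them; the top-ranked pairs are predicted as future or missing links.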
Capturing Edge Attributes via Network Embedding
Network embedding, which aims to learn low-dimensional representations of
nodes, has been used for various graph related tasks including visualization,
link prediction and node classification. Most existing embedding methods rely
solely on network structure. However, in practice we often have auxiliary
information about the nodes and/or their interactions, e.g., content of
scientific papers in co-authorship networks, or topics of communication in
Twitter mention networks. Here we propose a novel embedding method that uses
both network structure and edge attributes to learn better network
representations. Our method jointly minimizes the reconstruction error for
higher-order node neighborhood, social roles and edge attributes using a deep
architecture that can adequately capture highly non-linear interactions. We
demonstrate the efficacy of our model over existing state-of-the-art methods on
a variety of real-world networks including collaboration networks, and social
networks. We also observe that using edge attributes to inform network
embedding yields better performance in downstream tasks such as link prediction
and node classification.
Spectral Convergence Rate of Graph Laplacian
Laplacian Eigenvectors of the graph constructed from a data set are used in
many spectral manifold learning algorithms such as diffusion maps and spectral
clustering. Given a graph constructed from a random sample of a $d$-dimensional
compact submanifold embedded in Euclidean space, we establish the spectral
convergence rate of the graph Laplacian. It implies the consistency of the
spectral clustering algorithm via a standard perturbation argument. A simple
numerical study indicates the necessity of a denoising step before applying
spectral algorithms.
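For reference, the (unnormalized) graph Laplacian whose eigenvectors diffusion maps and spectral clustering rely on is L = D - A, where A is the adjacency matrix and D the diagonal degree matrix. A dependency-free sketch of its construction:

```python
def graph_laplacian(adj_matrix):
    """Unnormalized graph Laplacian L = D - A for a symmetric
    0/1 adjacency matrix given as a list of lists."""
    n = len(adj_matrix)
    deg = [sum(row) for row in adj_matrix]  # diagonal of D
    return [[(deg[i] if i == j else 0) - adj_matrix[i][j]
             for j in range(n)]
            for i in range(n)]
```

Every row of L sums to zero, so the constant vector is an eigenvector with eigenvalue 0; spectral clustering uses the eigenvectors of the next-smallest eigenvalues.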
Representation Learning on Graphs: Methods and Applications
Machine learning on graphs is an important and ubiquitous task with
applications ranging from drug design to friendship recommendation in social
networks. The primary challenge in this domain is finding a way to represent,
or encode, graph structure so that it can be easily exploited by machine
learning models. Traditionally, machine learning approaches relied on
user-defined heuristics to extract features encoding structural information
about a graph (e.g., degree statistics or kernel functions). However, recent
years have seen a surge in approaches that automatically learn to encode graph
structure into low-dimensional embeddings, using techniques based on deep
learning and nonlinear dimensionality reduction. Here we provide a conceptual
review of key advancements in this area of representation learning on graphs,
including matrix factorization-based methods, random-walk based algorithms, and
graph neural networks. We review methods to embed individual nodes as well as
approaches to embed entire (sub)graphs. In doing so, we develop a unified
framework to describe these recent approaches, and we highlight a number of
important applications and directions for future work.
Comment: Published in the IEEE Data Engineering Bulletin, September 2017;
version with minor corrections
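The "user-defined heuristics" this review contrasts with learned embeddings can be as simple as per-node degree statistics. A sketch of such a hand-crafted feature map, illustrative rather than taken from the paper:

```python
def degree_features(adj):
    """Hand-crafted structural features: for each node, its degree
    and the mean degree of its neighbors. Such vectors were the
    traditional input to downstream machine learning models."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    feats = {}
    for v, nbrs in adj.items():
        mean_nbr = sum(deg[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        feats[v] = (deg[v], mean_nbr)
    return feats
```

Representation learning replaces fixed feature maps like this with low-dimensional embeddings optimized end to end, which is the shift the review surveys.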
Deep Representation Learning for Social Network Analysis
Social network analysis is an important problem in data mining. A fundamental
step for analyzing social networks is to encode network data into
low-dimensional representations, i.e., network embeddings, so that the network
topology structure and other attribute information can be effectively
preserved. Network representation learning facilitates further applications such
as classification, link prediction, anomaly detection, and clustering. In
addition, techniques based on deep neural networks have attracted great
interest over the past few years. In this survey, we conduct a comprehensive
review of current literature in network representation learning utilizing
neural network models. First, we introduce the basic models for learning node
representations in homogeneous networks. We also introduce extensions of the
base models for tackling more complex scenarios, such as
analyzing attributed networks, heterogeneous networks and dynamic networks.
Then, we introduce the techniques for embedding subgraphs. After that, we
present the applications of network representation learning. Finally, we
discuss some promising research directions for future work.
Disentangling by Subspace Diffusion
We present a novel nonparametric algorithm for symmetry-based disentangling
of data manifolds, the Geometric Manifold Component Estimator (GEOMANCER).
GEOMANCER provides a partial answer to the question posed by Higgins et al.
(2018): is it possible to learn how to factorize a Lie group solely from
observations of the orbit of an object it acts on? We show that fully
unsupervised factorization of a data manifold is possible if the true metric of
the manifold is known and each factor manifold has nontrivial holonomy -- for
example, rotation in 3D. Our algorithm works by estimating the subspaces that
are invariant under random walk diffusion, giving an approximation to the de
Rham decomposition from differential geometry. We demonstrate the efficacy of
GEOMANCER on several complex synthetic manifolds. Our work reduces the question
of whether unsupervised disentangling is possible to the question of whether
unsupervised metric learning is possible, providing a unifying insight into the
geometric nature of representation learning.
Comment: Camera-ready version for NeurIPS 2020