A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications
The graph is an important data representation that appears in a wide variety of real-world scenarios. Effective graph analytics provides users with a deeper understanding of what lies behind the data and can therefore benefit many useful applications such as node classification, node recommendation, and link prediction. However, most graph analytics methods suffer from high computation and space costs. Graph embedding is an effective yet efficient way to solve the graph analytics problem: it converts graph data into a low-dimensional space in which the graph structural information and graph properties are maximally preserved. In this survey, we conduct a comprehensive review of the literature on graph embedding. We first introduce the formal definition of graph embedding as well as related concepts. After that, we propose two taxonomies of graph embedding that correspond to the challenges in different graph embedding problem settings and to how existing work addresses these challenges in its solutions. Finally, we summarize the applications that graph embedding enables and suggest four promising future research directions in terms of computational efficiency, problem settings, techniques, and application scenarios.
Comment: A 20-page comprehensive survey of graph/network embedding covering over 150 papers up to 2018. It provides a systematic categorization of problems, techniques, and applications. Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE). Comments and suggestions are welcome for continuously improving this survey.
Graph Embedding Techniques, Applications, and Performance: A Survey
Graphs, such as social networks, word co-occurrence networks, and
communication networks, occur naturally in various real-world applications.
Analyzing them yields insight into the structure of society, language, and
different patterns of communication. Many approaches have been proposed to
perform the analysis. Recently, methods that represent graph nodes in a vector space have gained traction in the research community. In this
survey, we provide a comprehensive and structured analysis of various graph
embedding techniques proposed in the literature. We first introduce the
embedding task and its challenges such as scalability, choice of
dimensionality, and features to be preserved, and their possible solutions. We
then present three categories of approaches based on factorization methods,
random walks, and deep learning, with examples of representative algorithms in
each category and analysis of their performance on various tasks. We evaluate
these state-of-the-art methods on a few common datasets and compare their
performance against one another. Our analysis concludes by suggesting some
potential applications and future directions. We finally present the
open-source Python library we developed, named GEM (Graph Embedding Methods,
available at https://github.com/palash1992/GEM), which provides all presented
algorithms within a unified interface to foster and facilitate research on the
topic.
Comment: Submitted to Knowledge-Based Systems for review.
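As a concrete illustration of the factorization-based category discussed in this abstract, the sketch below embeds nodes via a truncated SVD of the adjacency matrix. It is a minimal, generic example and does not use or reproduce the GEM library's actual interface; the function name and parameters are assumptions.

    # Minimal sketch of a factorization-based node embedding (one of the three
    # categories surveyed above). This is NOT the GEM library's API; all names
    # and parameters here are illustrative assumptions.
    import networkx as nx
    import numpy as np
    from scipy.sparse.linalg import svds

    def svd_embedding(G, dim=16):
        """Embed nodes by a truncated SVD of the adjacency matrix."""
        A = nx.to_scipy_sparse_array(G, format="csr", dtype=float)
        # Keep the top-`dim` singular pairs; U * sqrt(S) serves as node vectors.
        U, S, _ = svds(A, k=dim)
        return U * np.sqrt(S)

    G = nx.karate_club_graph()
    X = svd_embedding(G, dim=8)           # one 8-dimensional vector per node
    print(X.shape)                        # (34, 8)
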
Deep Representation Learning for Social Network Analysis
Social network analysis is an important problem in data mining. A fundamental
step for analyzing social networks is to encode network data into
low-dimensional representations, i.e., network embeddings, so that the network
topology structure and other attribute information can be effectively
preserved. Network representation learning facilitates further applications such as classification, link prediction, anomaly detection, and clustering. In addition, techniques based on deep neural networks have attracted great interest over the past few years. In this survey, we conduct a comprehensive review of the current literature on network representation learning using neural network models. First, we introduce the basic models for learning node representations in homogeneous networks. We also introduce extensions of these basic models for tackling more complex scenarios, such as analyzing attributed networks, heterogeneous networks, and dynamic networks. Then, we introduce techniques for embedding subgraphs. After that, we present the applications of network representation learning. Finally, we discuss some promising research directions for future work.
Multimodal Deep Network Embedding with Integrated Structure and Attribute Information
Network embedding is the process of learning low-dimensional representations
for nodes in a network, while preserving node features. Existing studies only
leverage network structure information and focus on preserving structural
features. However, nodes in real-world networks often have a rich set of
attributes providing extra semantic information. It has been demonstrated that
both structural and attribute features are important for network analysis
tasks. To preserve both features, we investigate the problem of integrating
structure and attribute information to perform network embedding and propose a
Multimodal Deep Network Embedding (MDNE) method. MDNE captures the non-linear
network structures and the complex interactions among structures and
attributes, using a deep model consisting of multiple layers of non-linear
functions. Since structures and attributes are two different types of
information, a multimodal learning method is adopted to pre-process them and
help the model to better capture the correlations between node structure and
attribute information. We employ both structural proximity and attribute
proximity in the loss function to preserve the respective features and the
representations are obtained by minimizing the loss function. Results of
extensive experiments on four real-world datasets show that the proposed method
performs significantly better than the baselines on a variety of tasks, which demonstrates the effectiveness and generality of our method.
Comment: 15 pages, 10 figures.
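The following sketch illustrates the general shape of such a multimodal deep model: separate structure and attribute encoders fused into one embedding, trained with reconstruction losses that stand in for the structural and attribute proximity terms. Layer sizes, loss weights, and all names are illustrative assumptions, not the paper's exact architecture.

    # Minimal sketch of a two-branch deep embedding that fuses structure and
    # attribute inputs into one representation, in the spirit of MDNE. Layer
    # sizes, loss weights, and all names are illustrative assumptions.
    import torch
    import torch.nn as nn

    class MultimodalEmbedder(nn.Module):
        def __init__(self, n_nodes, n_attrs, dim=64):
            super().__init__()
            # One encoder per modality, fused into a shared embedding layer.
            self.struct_enc = nn.Sequential(nn.Linear(n_nodes, 256), nn.ReLU())
            self.attr_enc = nn.Sequential(nn.Linear(n_attrs, 128), nn.ReLU())
            self.fuse = nn.Linear(256 + 128, dim)
            # Decoders reconstruct each modality from the joint embedding.
            self.struct_dec = nn.Linear(dim, n_nodes)
            self.attr_dec = nn.Linear(dim, n_attrs)

        def forward(self, adj_row, attr_row):
            z = self.fuse(torch.cat([self.struct_enc(adj_row),
                                     self.attr_enc(attr_row)], dim=-1))
            return z, self.struct_dec(z), self.attr_dec(z)

    # Reconstruction losses play the role of structural and attribute proximity
    # terms; a real implementation would weight observed edges more heavily.
    model = MultimodalEmbedder(n_nodes=1000, n_attrs=50)
    adj_row = torch.rand(4, 1000)    # toy batch of adjacency rows
    attr_row = torch.rand(4, 50)     # toy batch of attribute rows
    z, adj_hat, attr_hat = model(adj_row, attr_row)
    loss = nn.functional.mse_loss(adj_hat, adj_row) + \
           nn.functional.mse_loss(attr_hat, attr_row)
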
Scalable Graph Embeddings via Sparse Transpose Proximities
Graph embedding learns low-dimensional representations for nodes in a graph
and effectively preserves the graph structure. Recently, a significant amount
of progress has been made toward this emerging research area. However, there
are several fundamental problems that remain open. First, existing methods fail
to preserve the out-degree distributions on directed graphs. Second, many
existing methods employ random walk based proximities and thus suffer from
conflicting optimization goals on undirected graphs. Finally, existing
factorization methods are unable to achieve scalability and non-linearity
simultaneously.
This paper presents an in-depth study on graph embedding techniques on both
directed and undirected graphs. We analyze the fundamental reasons that lead to
the distortion of out-degree distributions and to the conflicting optimization
goals. We propose transpose proximity, a unified approach that solves both problems. Based on the concept of transpose proximity, we design STRAP, a factorization-based graph embedding algorithm that achieves scalability and non-linearity simultaneously. STRAP makes use of the backward push algorithm to efficiently compute sparse Personalized PageRank (PPR) values as its transpose proximities. By imposing a sparsity constraint, we are able to apply non-linear operations to the proximity matrix and perform efficient matrix factorization to derive the embedding vectors. Finally, we present an extensive experimental study that evaluates the effectiveness of various graph embedding algorithms, and we show that STRAP outperforms the state-of-the-art methods in terms of effectiveness and scalability.
Comment: ACM SIGKDD201
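A rough sketch of the pipeline this abstract describes is given below: approximate sparse PPR, apply a logarithmic non-linearity to the non-zero entries, and factorize the result. A truncated power iteration stands in for the backward push algorithm, and the way the two PPR directions are combined, the sparsity threshold, and the parameter names are assumptions rather than STRAP's actual implementation.

    # Simplified sketch of the pipeline above: approximate sparse PPR, apply a
    # non-linear (log) transform to the non-zeros, and factorize. A truncated
    # power iteration stands in for backward push; all parameters are assumptions.
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import svds

    def approx_ppr(A, alpha=0.15, iters=10):
        """Truncated power iteration approximating the PPR matrix."""
        n = A.shape[0]
        deg = np.asarray(A.sum(axis=1)).ravel()
        deg[deg == 0] = 1.0
        P = sp.diags(1.0 / deg) @ A           # row-normalized transition matrix
        ppr = alpha * sp.identity(n, format="csr")
        term = ppr.copy()
        for _ in range(iters):
            term = (1 - alpha) * (term @ P)
            ppr = ppr + term
        return ppr

    def strap_like_embedding(A, dim=16, eps=1e-4):
        # Assumed combination of PPR on the graph and on its transpose.
        S = (approx_ppr(A) + approx_ppr(A.T).T).tocsr()
        S.data[S.data < eps] = 0.0            # keep the proximity matrix sparse
        S.eliminate_zeros()
        S.data = np.log(S.data / eps)         # non-linear transform on non-zeros
        U, sigma, _ = svds(S, k=dim)
        return U * np.sqrt(sigma)

    A = sp.random(2000, 2000, density=5e-3, format="csr")
    Z = strap_like_embedding(A)
    print(Z.shape)                            # (2000, 16)
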
FI-GRL: Fast Inductive Graph Representation Learning via Projection-Cost Preservation
Graph representation learning aims at transforming graph data into meaningful
low-dimensional vectors to facilitate the employment of machine learning and
data mining algorithms designed for general data. Most current graph
representation learning approaches are transductive, which means that they
require all the nodes in the graph to be known when learning the representations, and they cannot naturally generalize to unseen nodes. In this paper, we present a Fast Inductive Graph Representation Learning
framework (FI-GRL) to learn nodes' low-dimensional representations. Our
approach can obtain accurate representations for seen nodes with provable
theoretical guarantees and can easily generalize to unseen nodes. Specifically,
in order to explicitly decouple nodes' relations expressed by the graph, we
transform nodes into a randomized subspace spanned by a random projection
matrix. This stage is guaranteed to preserve the projection-cost of the
normalized random walk matrix which is highly related to the normalized cut of
the graph. Then feature extraction is achieved by conducting singular value
decomposition on the obtained matrix sketch. By leveraging the property of
projection-cost preservation on the matrix sketch, the obtained representation
result is nearly optimal. To deal with unseen nodes, we utilize the folding-in technique to learn their meaningful representations. Empirically, when the number of seen nodes is larger than the number of unseen nodes, FI-GRL always achieves excellent results. Our algorithm is fast, simple to implement, and theoretically guaranteed. Extensive experiments on real datasets demonstrate the superiority of our algorithm in both efficacy and efficiency on applications at the macroscopic level (clustering) and the microscopic level (structural hole detection).
Comment: ICDM 2018, full version.
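The two-stage idea above can be sketched as follows: project the normalized random walk matrix into a randomized subspace, extract features by SVD of the sketch, and fold in an unseen node through the same projection. Dimensions, normalization details, and names are assumptions, not FI-GRL's exact formulation.

    # Rough sketch: (1) project the normalized random walk matrix into a
    # randomized subspace, (2) extract features by SVD of the sketch,
    # (3) fold in an unseen node. All details here are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    n, sketch_dim, dim = 500, 64, 16

    A = rng.random((n, n)) < 0.02             # toy symmetric adjacency
    A = np.triu(A, 1); A = (A | A.T).astype(float)
    deg = np.maximum(A.sum(axis=1), 1.0)
    W = A / deg[:, None]                      # normalized random walk matrix

    R = rng.standard_normal((n, sketch_dim)) / np.sqrt(sketch_dim)
    sketch = W @ R                            # randomized sketch of W
    U, S, Vt = np.linalg.svd(sketch, full_matrices=False)
    Z = U[:, :dim] * S[:dim]                  # embeddings of the seen nodes

    # Folding-in: an unseen node with known links is projected with the same R
    # and mapped through the already-computed right singular vectors.
    a_new = np.zeros(n); a_new[rng.choice(n, 5, replace=False)] = 1.0
    w_new = a_new / a_new.sum()
    z_new = (w_new @ R) @ Vt[:dim].T
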
Deep Learning on Graphs: A Survey
Deep learning has been shown to be successful in a number of domains, ranging from acoustics and images to natural language processing. However, applying deep
learning to the ubiquitous graph data is non-trivial because of the unique
characteristics of graphs. Recently, substantial research efforts have been
devoted to applying deep learning methods to graphs, resulting in beneficial
advances in graph analysis techniques. In this survey, we comprehensively
review the different types of deep learning methods on graphs. We divide the
existing methods into five categories based on their model architectures and
training strategies: graph recurrent neural networks, graph convolutional
networks, graph autoencoders, graph reinforcement learning, and graph
adversarial methods. We then provide a comprehensive overview of these methods
in a systematic manner mainly by following their development history. We also
analyze the differences and compositions of different methods. Finally, we
briefly outline the applications in which they have been used and discuss
potential future research directions.
Comment: Accepted by IEEE Transactions on Knowledge and Data Engineering. 24 pages, 11 figures.
Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity
We introduce a novel approach to graph-level representation learning, which
is to embed an entire graph into a vector space where the embeddings of two
graphs preserve their graph-graph proximity. Our approach, UGRAPHEMB, is a
general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered a function that receives any graph as input, either seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism, called Multi-Scale Node Attention (MSNA), is proposed. Experiments on five real graph datasets show that UGRAPHEMB achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization.
Comment: IJCAI 2019 camera-ready version with supplementary material.
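As a simplified, single-scale stand-in for the node-attention pooling named above (not the paper's MSNA mechanism), the sketch below aggregates node embeddings into one graph-level vector with learned attention weights; sizes and names are assumptions.

    # Minimal sketch of attention-based pooling of node embeddings into a single
    # graph-level vector; a simplified stand-in, not the MSNA mechanism itself.
    import torch
    import torch.nn as nn

    class AttentionPool(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)     # learns per-node importance

        def forward(self, node_embs):          # node_embs: (num_nodes, dim)
            weights = torch.softmax(self.score(node_embs), dim=0)
            return (weights * node_embs).sum(dim=0)   # (dim,) graph embedding

    pool = AttentionPool(dim=32)
    node_embs = torch.randn(17, 32)            # toy graph with 17 nodes
    graph_emb = pool(node_embs)                # inductive: works for any graph
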
Billion-scale Network Embedding with Iterative Random Projection
Network embedding, which learns low-dimensional vector representation for
nodes in the network, has attracted considerable research attention recently.
However, the existing methods are incapable of handling billion-scale networks,
because they are computationally expensive and, at the same time, difficult to accelerate with distributed computing schemes. To address these problems, we
propose RandNE (Iterative Random Projection Network Embedding), a novel and
simple billion-scale network embedding method. Specifically, we propose a
Gaussian random projection approach to map the network into a low-dimensional
embedding space while preserving the high-order proximities between nodes. To
reduce the time complexity, we design an iterative projection procedure to
avoid the explicit calculation of the high-order proximities. Theoretical
analysis shows that our method is extremely efficient, and friendly to
distributed computing schemes without any communication cost in the
calculation. We also design a dynamic updating procedure which can efficiently
incorporate the dynamic changes of the networks without error aggregation.
Extensive experimental results demonstrate the efficiency and efficacy of
RandNE over state-of-the-art methods in several tasks including network
reconstruction, link prediction and node classification on multiple datasets
with different scales, ranging from thousands to billions of nodes and edges.
Comment: Accepted by ICDM 2018. 10 pages, 8 figures, 2018 IEEE International Conference on Data Mining (ICDM).
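The iterative projection idea can be sketched as follows: start from a Gaussian random matrix and repeatedly multiply by the sparse adjacency matrix, so that high-order proximities are folded in without ever being computed explicitly. The order and the weights below are illustrative assumptions, not RandNE's tuned values.

    # Sketch of iterative random projection: U_0 is a Gaussian random matrix and
    # each multiplication by the sparse adjacency folds in one more order of
    # proximity. The order and weights are illustrative assumptions.
    import numpy as np
    import scipy.sparse as sp

    def iterative_random_projection(A, dim=128, order=3,
                                    weights=(1.0, 1.0, 0.1, 0.01)):
        rng = np.random.default_rng(0)
        n = A.shape[0]
        U = rng.standard_normal((n, dim)) / np.sqrt(dim)   # U_0: random projection
        emb = weights[0] * U
        for k in range(1, order + 1):
            U = A @ U                                      # U_k = A * U_{k-1}
            emb = emb + weights[k] * U                     # weighted high-order terms
        return emb

    A = sp.random(10_000, 10_000, density=1e-4, format="csr")
    A = A + A.T                                            # symmetrize the toy graph
    Z = iterative_random_projection(A)
    print(Z.shape)                                         # (10000, 128)
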
Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation
As opposed to manual feature engineering which is tedious and difficult to
scale, network representation learning has attracted a surge of research interest, as it automates the process of feature learning on graphs. The
learned low-dimensional node vector representation is generalizable and eases
the knowledge discovery process on graphs by enabling various off-the-shelf
machine learning tools to be directly applied. Recent research has shown that
the past decade of network embedding approaches either explicitly factorize a
carefully designed matrix to obtain the low-dimensional node vector
representation or are closely related to implicit matrix factorization, with
the fundamental assumption that the factorized node connectivity matrix is
low-rank. Nonetheless, the global low-rank assumption does not necessarily hold
especially when the factorized matrix encodes complex node interactions, and
the resultant single low-rank embedding matrix is insufficient to capture all
the observed connectivity patterns. In this regard, we propose a novel
multi-level network embedding framework BoostNE, which can learn multiple
network embedding representations of different granularity from coarse to fine
without imposing the prevalent global low-rank assumption. The proposed BoostNE
method is also in line with the successful gradient boosting method in ensemble learning, in that multiple weak embeddings lead to a stronger and more effective one.
We assess the effectiveness of the proposed BoostNE framework by comparing it
with existing state-of-the-art network embedding methods on various datasets,
and the experimental results corroborate the superiority of the proposed
BoostNE network embedding framework.
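To make the multi-level scheme concrete, the rough sketch below factorizes the connectivity matrix at a small rank, re-factorizes the residual that remains, and concatenates the weak embeddings from coarse to fine. Using scikit-learn's NMF, clipping negative residual entries to zero, and the choice of ranks and levels are all assumptions here, not the authors' implementation.

    # Simplified sketch of boosted multi-level low-rank factorization: factorize,
    # re-factorize the residual, and concatenate the levels from coarse to fine.
    # NMF, residual clipping, ranks, and levels are illustrative assumptions.
    import numpy as np
    from sklearn.decomposition import NMF

    def boosted_embedding(M, levels=4, rank=8):
        residual = M.astype(float)
        parts = []
        for _ in range(levels):
            model = NMF(n_components=rank, init="random",
                        random_state=0, max_iter=300)
            W = model.fit_transform(residual)          # weak embedding at this level
            parts.append(W)
            # Boost on what is left; clipping keeps the input non-negative for NMF.
            residual = np.maximum(residual - W @ model.components_, 0.0)
        return np.concatenate(parts, axis=1)           # multi-level embedding

    rng = np.random.default_rng(0)
    M = (rng.random((300, 300)) < 0.05).astype(float)  # toy connectivity matrix
    Z = boosted_embedding(M)
    print(Z.shape)                                     # (300, 32): 4 levels x rank 8
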