A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications
Graphs are an important data representation that appears in a wide diversity
of real-world scenarios. Effective graph analytics provides users with a deeper
understanding of what is behind the data, and can thus benefit many useful
applications such as node classification, node recommendation, and link
prediction. However, most graph analytics methods suffer from high computation
and space costs. Graph embedding is an effective yet efficient way to solve the
graph analytics problem. It converts the graph data into a low-dimensional
space in which the graph structural information and graph properties are
maximally preserved. In this survey, we conduct a comprehensive review of the
literature on graph embedding. We first introduce the formal definition of
graph embedding as well as the related concepts. After that, we propose two
taxonomies of graph embedding, corresponding to the challenges that exist in
different graph embedding problem settings and how the existing work addresses
these challenges in its solutions. Finally, we summarize the applications
that graph embedding enables and suggest four promising future research
directions in terms of computation efficiency, problem settings, techniques,
and application scenarios.
Comment: A 20-page comprehensive survey of graph/network embedding covering
over 150 papers up to 2018. It provides a systematic categorization of
problems, techniques, and applications. Accepted by IEEE Transactions on
Knowledge and Data Engineering (TKDE). Comments and suggestions are welcome
for continuously improving this survey.
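The core conversion this abstract describes, mapping nodes into a low-dimensional space that preserves structural similarity, can be sketched with a toy spectral factorization of the adjacency matrix (an illustrative baseline only, not a specific method from the survey):

```python
import numpy as np

# Toy undirected graph: two triangles joined by a bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n, dim = 6, 2

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# Spectral embedding: keep the eigenvectors of the adjacency matrix with the
# largest-magnitude eigenvalues as the low-dimensional node coordinates.
vals, vecs = np.linalg.eigh(A)
top = np.argsort(-np.abs(vals))[:dim]
emb = vecs[:, top] * vals[top]

# Structure is preserved: nodes in the same triangle land closer together
# than nodes on opposite sides of the bridge.
d_within = np.linalg.norm(emb[0] - emb[1])
d_across = np.linalg.norm(emb[0] - emb[5])
assert d_within < d_across
```

Real embedding methods replace this direct factorization with scalable objectives (random walks, neural encoders), but the goal, closeness in the embedded space mirroring closeness in the graph, is the same.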
Learning Collective Behavior in Multi-relational Networks
With the rapid expansion of the Internet and the WWW, the problem of analyzing social media data has received an increasing amount of attention in the past decade. The boom in social media platforms offers many possibilities to study human collective behavior and interactions on an unprecedented scale. In the past, much work has been done on the problem of learning from networked data with homogeneous topologies, where instances are explicitly or implicitly inter-connected by a single type of relationship. In contrast to traditional content-only classification methods, relational learning succeeds in improving classification performance by leveraging the correlation of the labels between linked instances. However, networked data extracted from social media, web pages, and bibliographic databases can contain entities of multiple classes, linked for various causal reasons; hence treating all links in a homogeneous way can limit the performance of relational classifiers. Learning collective behavior and interactions in heterogeneous networks is much more complex. The contributions of this dissertation include 1) two classification frameworks for identifying human collective behavior in multi-relational social networks; and 2) unsupervised and supervised learning models for relationship prediction in multi-relational collaborative networks. Our methods improve the performance of homogeneous predictive models by differentiating heterogeneous relations and capturing the prominent interaction patterns underlying the network structure. The work has been evaluated on various real-world social networks. We believe that this study will be useful for analyzing human collective behavior and interactions, particularly when the heterogeneous relationships in the network arise from various causal reasons.
Attention-based Graph Neural Network for Semi-supervised Learning
Recently popularized graph neural networks achieve the state-of-the-art
accuracy on a number of standard benchmark datasets for graph-based
semi-supervised learning, improving significantly over existing approaches.
These architectures alternate between a propagation layer that aggregates the
hidden states of the local neighborhood and a fully-connected layer. Perhaps
surprisingly, we show that a linear model that removes all the intermediate
fully-connected layers is still able to achieve performance comparable to
the state-of-the-art models. This significantly reduces the number of
parameters, which is critical for semi-supervised learning, where the number
of labeled examples is small. This in turn leaves room for designing more
innovative propagation layers. Based on this insight, we propose a novel graph
neural network that removes all the intermediate fully-connected layers, and
replaces the propagation layers with attention mechanisms that respect the
structure of the graph. The attention mechanism allows us to learn a dynamic
and adaptive local summary of the neighborhood to achieve more accurate
predictions. In a number of experiments on benchmark citation network
datasets, we demonstrate that our approach outperforms competing methods. By
examining the attention weights among neighbors, we show that our model
provides some interesting insights into how neighbors influence each other.
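The propagation step the abstract describes, an attention-weighted aggregation over each node's neighborhood, can be sketched in NumPy (a minimal, untrained illustration; the paper's actual attention mechanism and training procedure differ):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_propagate(H, adj, att):
    # One propagation step: each node's new state is an attention-weighted
    # average over its neighborhood (self-loop included), so the layer
    # respects the structure of the graph.
    n = H.shape[0]
    out = np.zeros_like(H)
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i, j] or j == i]
        # Score each neighbor with a (here fixed, hypothetical) attention
        # vector applied to the elementwise interaction of hidden states.
        scores = np.array([att @ (H[i] * H[j]) for j in nbrs])
        w = softmax(scores)
        out[i] = sum(wk * H[j] for wk, j in zip(w, nbrs))
    return out

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))            # 4 nodes, 3-dim hidden states
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]])
H1 = attention_propagate(H, adj, rng.normal(size=3))
```

Because each output state is a convex combination of neighbor states, the aggregation yields the kind of adaptive local summary the abstract refers to; in the actual model, the attention parameters are learned.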
COSINE: Compressive Network Embedding on Large-scale Information Networks
There has recently been a surge of approaches that learn low-dimensional
embeddings of nodes in networks. Since many real-world networks are
large-scale, it is inefficient for existing approaches to store large numbers
of parameters in memory and update them edge by edge. Based on the observation
that nodes with similar neighborhoods will be close to each other in the
embedding space, we propose the COSINE (COmpresSIve NE) algorithm, which
reduces the memory footprint and accelerates the training process through
parameter sharing among similar nodes. COSINE applies graph partitioning
algorithms to networks and builds parameter-sharing dependencies among nodes
based on the partitioning result. With parameter sharing among similar nodes,
COSINE injects prior knowledge about higher-order structural information into
the training process, which makes network embedding more efficient and
effective. COSINE can be applied to any embedding lookup method to learn
high-quality embeddings with limited memory and shorter training time. We
conduct experiments on multi-label classification and link prediction, where
the baselines and our model have the same memory usage. Experimental results
show that COSINE gives baselines up to a 23% increase on classification and up
to a 25% increase on link prediction. Moreover, the training time of all
representation learning methods using COSINE decreases by 30% to 70%.
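The memory saving comes from the group-level lookup: nodes in the same partition share one embedding vector, so the big table grows with the number of groups rather than the number of nodes. A simplified sketch (COSINE's actual partitioning and training are more involved, and the group assignment here is random rather than derived from a graph partitioner):

```python
import numpy as np

num_nodes, num_groups, dim = 1_000_000, 10_000, 128

# Hypothetical partition assignment; COSINE derives this from a graph
# partitioning algorithm rather than at random.
rng = np.random.default_rng(0)
group_of = rng.integers(0, num_groups, size=num_nodes)

# The only large parameter table is per-group, not per-node.
shared = np.zeros((num_groups, dim), dtype=np.float32)

def embedding(node):
    # Embedding lookup goes through the group table.
    return shared[group_of[node]]

full_bytes = num_nodes * dim * 4                      # per-node table
shared_bytes = shared.nbytes + group_of.nbytes        # group table + mapping
print(shared_bytes / full_bytes)                      # a small fraction
```

Updates during training touch the shared vectors, so every node in a group benefits from every observed edge incident to that group, which is the "prior knowledge injection" the abstract mentions.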
Link Prediction in Social Networks: the State-of-the-Art
In social networks, link prediction, which predicts missing links in current
networks and new or dissolving links in future networks, is important for
mining and analyzing the evolution of social networks. In the past decade,
much work has been done on link prediction in social networks. The goal of
this paper is to comprehensively review, analyze, and discuss the state of
the art of link prediction in social networks. A systematic categorization
of link prediction techniques and problems is presented. Then link
prediction techniques and problems are analyzed and discussed. Typical
applications of link prediction are also addressed. Achievements and roadmaps
of some active research groups are introduced. Finally, some future challenges
of link prediction in social networks are discussed.
Comment: 38 pages, 13 figures, Science China: Information Science, 201
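Among the techniques such surveys cover, similarity-based scores are the classic starting point; a common-neighbors baseline ranks candidate links by how many neighbors the two endpoints share:

```python
# Common-neighbors link prediction on a toy undirected graph,
# given as an adjacency list of neighbor sets.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0}, 4: {2}}

def common_neighbors(u, v):
    # Score a candidate link by the size of the shared neighborhood.
    return len(adj[u] & adj[v])

# Score the currently missing links and predict the highest-scoring one.
candidates = [(1, 3), (1, 4), (3, 4)]
scores = {(u, v): common_neighbors(u, v) for u, v in candidates}
best = max(scores, key=scores.get)
```

Variants such as Adamic-Adar or resource allocation reweight the shared neighbors by their degrees; the surveyed literature builds from these up to supervised and embedding-based predictors.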
A Survey on Embedding Dynamic Graphs
Embedding static graphs in low-dimensional vector spaces plays a key role in
network analytics and inference, supporting applications like node
classification, link prediction, and graph visualization. However, many
real-world networks present dynamic behavior, including topological evolution,
feature evolution, and diffusion. Therefore, several methods for embedding
dynamic graphs have been proposed to learn network representations over time,
facing novel challenges, such as time-domain modeling, temporal features to be
captured, and the temporal granularity to be embedded. In this survey, we
overview dynamic graph embedding, discussing its fundamentals and the recent
advances developed so far. We introduce the formal definition of dynamic graph
embedding, focusing on the problem setting and introducing a novel taxonomy for
dynamic graph embedding input and output. We further explore the different
dynamic behaviors that may be encompassed by embeddings, classifying them into
topological evolution, feature evolution, and processes on networks. Afterward, we describe
existing techniques and propose a taxonomy for dynamic graph embedding
techniques based on algorithmic approaches, from matrix and tensor
factorization to deep learning, random walks, and temporal point processes. We
also elucidate main applications, including dynamic link prediction, anomaly
detection, and diffusion prediction, and we further state some promising
research directions in the area.
Comment: 41 pages, 10 figures
Semi-Supervised Learning on Graphs Based on Local Label Distributions
Most approaches that tackle the problem of node classification consider nodes
to be similar if they have shared neighbors or are close to each other in the
graph. Recent methods for attributed graphs additionally take the attributes
of neighboring nodes into account. We argue that the class labels of the
neighbors bear important information, and considering them helps to improve
classification quality. Two nodes which are similar based on the class labels
in their neighborhoods need not be close by in the graph and may even belong
to different connected components. In this work, we propose a novel approach
for semi-supervised node classification. More precisely, we propose a new node
embedding which is based on the class labels in the local neighborhood of a
node. We show that this is a different setting from attribute-based embeddings,
and thus we propose a new method to learn label-based node embeddings which
can mirror a variety of relations between the class labels of neighboring
nodes. Our experimental evaluation demonstrates that our new methods can
significantly improve prediction quality on real-world datasets.
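A minimal sketch of the label-based representation described above (the paper's learned embedding is richer; this just illustrates the setting): represent each node by the distribution of class labels among its labeled neighbors, so two nodes with similar label neighborhoods get similar embeddings even when they are far apart in the graph.

```python
import numpy as np

num_classes = 3
# Toy graph as an adjacency list; only a few nodes carry labels,
# matching the semi-supervised setting.
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2, 4], 4: [3]}
labels = {1: 0, 2: 1, 4: 1}

def label_embedding(node):
    # Count the class labels of labeled neighbors and normalize
    # to a distribution (zeros if no neighbor is labeled).
    counts = np.zeros(num_classes)
    for nbr in adj[node]:
        if nbr in labels:
            counts[labels[nbr]] += 1
    total = counts.sum()
    return counts / total if total else counts

emb0 = label_embedding(0)   # one neighbor of class 0, one of class 1
```

Node 0 here gets the embedding [0.5, 0.5, 0], regardless of where else in the graph a node with the same label neighborhood sits, which is exactly the property the abstract contrasts with proximity-based embeddings.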
Network Representation Learning: From Traditional Feature Learning to Deep Learning
Network representation learning (NRL) is an effective graph analytics
technique that helps users deeply understand the hidden characteristics of
graph data. It has been successfully applied in many real-world tasks related
to network science, such as social network data processing, biological
information processing, and recommender systems. Deep learning is a powerful
tool for learning data features. However, it is non-trivial to generalize deep
learning to graph-structured data, since graphs differ from regular data such
as images, which have spatial structure, and sounds, which have temporal
structure. Recently, researchers have proposed many deep learning-based
methods in the area of NRL. In this survey, we investigate classical NRL, from
traditional feature learning methods to deep learning-based models, analyze
the relationships between them, and summarize the latest progress. Finally,
we discuss open issues concerning NRL and point out future directions in this
field.
Attentional Heterogeneous Graph Neural Network: Application to Program Reidentification
A program or process is an integral part of almost every IT/OT system. Can we
trust the identity/ID (e.g., the executable name) of a program? To avoid
detection, malware may disguise itself using the ID of a legitimate program,
and a system tool (e.g., PowerShell) used by attackers may carry the fake ID
of another, less sensitive piece of common software. However, existing
intrusion detection techniques often overlook this critical program
reidentification problem (i.e., checking a program's identity). In this
paper, we propose an attentional heterogeneous graph neural network model
(DeepHGNN) to verify a program's identity based on its system behaviors. The
key idea is to leverage representation learning on the heterogeneous
program behavior graph to guide the reidentification process. We formulate
program reidentification as a graph classification problem and develop an
effective attentional heterogeneous graph embedding algorithm to solve it.
Extensive experiments, using real-world enterprise monitoring data and real
attacks, demonstrate the effectiveness of DeepHGNN across multiple popular
metrics, as well as its robustness to normal dynamic changes such as program
version upgrades.
Anomaly Detection and Correction in Large Labeled Bipartite Graphs
Binary classification problems can be naturally modeled as bipartite graphs,
where we attempt to classify right nodes based on their left adjacencies. We
consider the case of labeled bipartite graphs in which some labels and edges
are not trustworthy. Our goal is to reduce noise by identifying and fixing
these labels and edges.
We first propose a geometric technique for generating random graph instances
with untrustworthy labels and analyze the resulting graph properties. We focus
on generating graphs which reflect real-world data, where degree and label
frequencies follow power law distributions.
We review several algorithms for the problem of detection and correction,
proposing novel extensions and making observations specific to the bipartite
case. These algorithms range from math programming algorithms to discrete
combinatorial algorithms to Bayesian approximation algorithms to machine
learning algorithms.
We compare the performance of all these algorithms using several metrics and,
based on our observations, identify the relative strengths and weaknesses of
each individual algorithm.
Comment: 36 pages, 4 figures
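The degree-distribution requirement from the generation step above can be sketched as follows (the paper's geometric construction is more elaborate; this only illustrates drawing power-law left-node degrees and wiring a random bipartite graph from them):

```python
import numpy as np

rng = np.random.default_rng(0)
n_left, n_right, alpha = 200, 50, 2.5

# Zipf-distributed (power-law) degrees for the left nodes,
# capped at the number of right nodes.
degrees = np.minimum(rng.zipf(alpha, size=n_left), n_right)

edges = []
for u, d in enumerate(degrees):
    # Each left node connects to `d` distinct right nodes chosen uniformly;
    # a construction matching real data would bias this choice so that
    # right-node degrees follow a power law too.
    for v in rng.choice(n_right, size=d, replace=False):
        edges.append((u, v))
```

Labels (trustworthy or not) would then be assigned on top of such an instance to benchmark the detection-and-correction algorithms the abstract compares.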