7 research outputs found
Contextual Graph Attention for Answering Logical Queries over Incomplete Knowledge Graphs
Recently, several studies have explored methods for using KG embedding to
answer logical queries. These approaches either treat embedding learning and
query answering as two separate learning tasks, or fail to deal with the
variability of contributions from different query paths. We propose to
leverage a graph attention mechanism to handle the unequal contribution of
different query paths. However, commonly used graph attention assumes that the
center node embedding is provided, which is unavailable in this task since the
center node is to be predicted. To solve this problem we propose a multi-head
attention-based end-to-end logical query answering model, called Contextual
Graph Attention model (CGA), which uses an initial neighborhood aggregation
layer to generate the center embedding, and the whole model is trained jointly
on the original KG structure as well as the sampled query-answer pairs. We also
introduce two new datasets, DB18 and WikiGeo19, which are rather large in size
compared to the existing datasets and contain many more relation types, and use
them to evaluate the performance of the proposed model. Our results show that
the proposed CGA, with fewer learnable parameters, consistently outperforms the
baseline models on both datasets as well as the Bio dataset.
Comment: 8 pages, 3 figures, camera-ready version of article accepted to K-CAP 2019, Marina del Rey, California, United States
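The core mechanism described above, namely an initial neighborhood aggregation that produces a provisional center embedding which then serves as the attention query over the per-path messages, can be sketched in a few lines. The PyTorch fragment below is a minimal illustration under assumed tensor shapes and layer sizes; it is not the authors' CGA implementation.

```python
# Minimal sketch: an initial aggregation turns the per-path messages into a
# provisional center embedding, which is then used as the query of a
# multi-head attention over those same messages, so that different query
# paths can contribute unequally. Shapes and sizes are illustrative.
import torch
import torch.nn as nn

class CenterAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.init_agg = nn.Linear(dim, dim)   # initial neighborhood aggregation layer
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, path_messages: torch.Tensor) -> torch.Tensor:
        # path_messages: (batch, num_paths, dim), one message per query path.
        center_init = torch.relu(self.init_agg(path_messages.mean(dim=1, keepdim=True)))
        center, _ = self.attn(center_init, path_messages, path_messages)
        return center.squeeze(1)              # predicted center embedding, (batch, dim)

# Usage: 32 queries, each reached by 3 query paths with 128-d messages.
model = CenterAttentionSketch(dim=128)
print(model(torch.randn(32, 3, 128)).shape)   # torch.Size([32, 128])
```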
SCE: Scalable Network Embedding from Sparsest Cut
Large-scale network embedding is to learn a latent representation for each
node in an unsupervised manner, which captures inherent properties and
structural information of the underlying graph. In this field, many popular
approaches are influenced by the skip-gram model from natural language
processing. Most of them use a contrastive objective to train an encoder which
forces the embeddings of similar pairs to be close and embeddings of negative
samples to be far apart. A key to the success of such contrastive learning
methods is how to draw positive and negative samples. While negative samples
generated by straightforward random sampling are often satisfactory, how to
draw positive examples remains a hot topic.
In this paper, we propose SCE for unsupervised network embedding, using only
negative samples for training. Our method is based on a new contrastive
objective inspired by the well-known sparsest cut problem. To solve the
underlying optimization problem, we introduce a Laplacian smoothing trick,
which uses graph convolutional operators as low-pass filters for smoothing node
representations. The resulting model consists of a GCN-type structure as the
encoder and a simple loss function. Notably, our model does not use positive
samples but only negative samples for training, which not only makes the
implementation and tuning much easier, but also reduces the training time
significantly.
Finally, extensive experimental studies on real world data sets are
conducted. The results clearly demonstrate the advantages of our new model in
both accuracy and scalability compared to strong baselines such as GraphSAGE,
G2G and DGI.
Comment: KDD 2020
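The two ingredients named above, graph convolutional operators used as low-pass smoothing filters and a loss that draws only negative samples, can be sketched roughly as follows. The loss shown simply pushes randomly drawn node pairs apart on the unit sphere, with the smoothing supplying the implicit attraction between neighbors; the paper's actual sparsest-cut-derived objective may differ, so treat this as an illustrative reading rather than the SCE implementation.

```python
# Sketch of negative-sample-only training on smoothed node features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def smooth(features: torch.Tensor, adj_norm: torch.Tensor, k: int = 3) -> torch.Tensor:
    # Laplacian smoothing trick: apply a normalized graph convolution operator
    # (a low-pass filter) k times to the raw node features.
    x = features
    for _ in range(k):
        x = adj_norm @ x
    return x

class SCEEncoderSketch(nn.Module):
    # GCN-type encoder: a linear map on smoothed features, kept on the unit sphere.
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, features: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.lin(smooth(features, adj_norm)), dim=1)

def negative_only_loss(z: torch.Tensor, num_pairs: int = 4096) -> torch.Tensor:
    # Contrastive objective using negatives only: minimize the similarity of
    # randomly drawn node pairs; no positive pairs are sampled at all.
    n = z.size(0)
    i = torch.randint(0, n, (num_pairs,))
    j = torch.randint(0, n, (num_pairs,))
    return (z[i] * z[j]).sum(dim=1).mean()

# Usage on a toy 4-node cycle with self-loops and symmetric normalization.
A = torch.tensor([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], dtype=torch.float) + torch.eye(4)
d = A.sum(1)
adj_norm = A / torch.sqrt(d[:, None] * d[None, :])
z = SCEEncoderSketch(8, 16)(torch.randn(4, 8), adj_norm)
print(negative_only_loss(z, num_pairs=16))
```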
Understanding Negative Sampling in Graph Representation Learning
Graph representation learning has been extensively studied in recent years.
Despite its potential in generating continuous embeddings for various networks,
both the effectiveness and the efficiency of inferring high-quality
representations for a large corpus of nodes remain challenging. Sampling is
critical to achieving these performance goals. Prior work usually focuses on sampling
positive node pairs, while the strategy for negative sampling is left
insufficiently explored. To bridge the gap, we systematically analyze the role
of negative sampling from the perspectives of both objective and risk,
theoretically demonstrating that negative sampling is as important as positive
sampling in determining the optimization objective and the resulting variance.
To the best of our knowledge, we are the first to derive the theory and
quantify that the negative sampling distribution should be positively but
sub-linearly correlated with the positive sampling distribution. With the
guidance of the theory, we propose MCNS, approximating the positive
distribution with self-contrast approximation and accelerating negative
sampling by Metropolis-Hastings. We evaluate our method on 5 datasets that
cover extensive downstream graph learning tasks, including link prediction,
node classification and personalized recommendation, on a total of 19
experimental settings. These comprehensive experimental results demonstrate
the robustness and superiority of our method.
Comment: KDD 2020
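The quantified finding, that negatives should be drawn from a distribution positively but sub-linearly correlated with the positive sampling distribution, and the use of Metropolis-Hastings to sample from it cheaply, can be illustrated with a small sketch. The exponent, the uniform proposal, and the fixed popularity vector standing in for the paper's self-contrast approximation are all illustrative assumptions.

```python
# Metropolis-Hastings chain whose stationary distribution is proportional to
# pos_scores ** alpha, i.e. sub-linearly correlated with the (estimated)
# positive sampling distribution.
import numpy as np

def mh_negative_sampler(pos_scores: np.ndarray, alpha: float = 0.75,
                        burn_in: int = 100, rng=None):
    rng = rng or np.random.default_rng()
    target = pos_scores ** alpha                  # unnormalized target density
    n = len(target)
    current = rng.integers(n)
    step = 0
    while True:
        proposal = rng.integers(n)                # symmetric (uniform) proposal
        accept = min(1.0, target[proposal] / max(target[current], 1e-12))
        if rng.random() < accept:
            current = proposal
        step += 1
        if step > burn_in:
            yield int(current)

# Usage: draw 5 negatives for a toy popularity profile over 10 nodes.
sampler = mh_negative_sampler(np.arange(1, 11, dtype=float))
print([next(sampler) for _ in range(5)])
```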
XGNN: Towards Model-Level Explanations of Graph Neural Networks
Graphs neural networks (GNNs) learn node features by aggregating and
combining neighbor information, which have achieved promising performance on
many graph tasks. However, GNNs are mostly treated as black boxes and lack
human-intelligible explanations. Thus, without explanations, GNN models cannot
be fully trusted and used in certain application domains. In this work, we
propose a novel approach, known as XGNN, to interpret GNNs at the model-level.
Our approach can provide high-level insights and generic understanding of how
GNNs work. In particular, we propose to explain GNNs by training a graph
generator so that the generated graph patterns maximize a certain prediction of
the model. We formulate graph generation as a reinforcement learning task,
where for each step, the graph generator predicts how to add an edge into the
current graph. The graph generator is trained via a policy gradient method
based on information from the trained GNNs. In addition, we incorporate several
graph rules to encourage the generated graphs to be valid. Experimental results
on both synthetic and real-world datasets show that our proposed methods help
understand and verify the trained GNNs. Furthermore, our experimental results
indicate that the generated graphs can provide guidance on how to improve the
trained GNNs.
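The loop described above, in which the generator adds one edge per step, the frozen GNN's prediction of a target class serves as the reward, and the generator is updated with a policy gradient, is sketched below at toy scale. The "trained GNN" is stubbed with a simple scoring function and the graph lives on a fixed small node set; both are illustrative assumptions rather than the XGNN setup.

```python
# Toy policy-gradient loop: a generator policy adds edges one at a time and is
# rewarded by a stand-in for the frozen, pre-trained GNN's target prediction.
import itertools
import torch
import torch.nn as nn

NUM_NODES = 6
CANDIDATE_EDGES = list(itertools.combinations(range(NUM_NODES), 2))

def gnn_stub(adj: torch.Tensor) -> torch.Tensor:
    # Placeholder for the trained GNN: rewards graphs whose edge count is near 5.
    return torch.exp(-(adj.sum() / 2 - 5.0) ** 2 / 4.0)

policy = nn.Sequential(nn.Linear(len(CANDIDATE_EDGES), 64), nn.ReLU(),
                       nn.Linear(64, len(CANDIDATE_EDGES)))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(200):
    adj = torch.zeros(NUM_NODES, NUM_NODES)
    state = torch.zeros(len(CANDIDATE_EDGES))            # which candidate edges exist
    log_probs, rewards = [], []
    for step in range(8):                                # grow the graph edge by edge
        logits = policy(state)
        logits = logits.masked_fill(state.bool(), -1e9)  # graph rule: no duplicate edges
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        u, v = CANDIDATE_EDGES[int(action)]
        adj[u, v] = adj[v, u] = 1.0
        state = state.clone()
        state[action] = 1.0
        log_probs.append(dist.log_prob(action))
        rewards.append(gnn_stub(adj))                    # reward = model's prediction
    # REINFORCE update on the collected step rewards.
    loss = -(torch.stack(log_probs) * torch.stack(rewards).detach()).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```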
Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models
In this paper, we propose a flexible notion of characteristic functions
defined on graph vertices to describe the distribution of vertex features at
multiple scales. We introduce FEATHER, a computationally efficient algorithm to
calculate a specific variant of these characteristic functions where the
probability weights of the characteristic function are defined as the
transition probabilities of random walks. We argue that features extracted by
this procedure are useful for node level machine learning tasks. We discuss the
pooling of these node representations, resulting in compact descriptors of
graphs that can serve as features for graph classification algorithms. We
analytically prove that FEATHER describes isomorphic graphs with the same
representation and exhibits robustness to data corruption. Using the node
feature characteristic functions we define parametric models where evaluation
points of the functions are learned parameters of supervised classifiers.
Experiments on large real-world datasets show that our proposed algorithm
creates high quality representations, performs transfer learning efficiently,
exhibits robustness to hyperparameter changes, and scales linearly with the
input size.
Comment: Source code is available at: https://github.com/benedekrozemberczki/FEATHER
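The descriptor itself, the real and imaginary parts of a characteristic function of node features whose weights are r-step random walk transition probabilities, is compact enough to write out directly. The evaluation points and scales below are illustrative choices, not those of the reference implementation.

```python
# NumPy sketch of the random-walk-weighted characteristic function:
# phi_v(theta, r) = E_{u ~ P^r(v, .)}[exp(i * theta * x_u)], evaluated on a
# small grid of theta values, with real and imaginary parts concatenated.
import numpy as np

def feather_descriptor(adj: np.ndarray, x: np.ndarray,
                       scales=(1, 2), theta=np.linspace(0.25, 1.0, 4)) -> np.ndarray:
    """adj: (n, n) adjacency matrix; x: (n,) a single node feature."""
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.clip(deg, 1, None)               # random walk transition matrix
    blocks, P_r = [], np.eye(len(adj))
    for r in range(1, max(scales) + 1):
        P_r = P_r @ P                             # r-step transition probabilities
        if r in scales:
            blocks.append(P_r @ np.cos(np.outer(x, theta)))   # real part
            blocks.append(P_r @ np.sin(np.outer(x, theta)))   # imaginary part
    return np.concatenate(blocks, axis=1)

# Usage: node descriptors for a 4-node path graph with degree as the feature;
# mean-pooling the rows yields a graph-level descriptor for classification.
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
emb = feather_descriptor(A, A.sum(axis=1))
print(emb.shape, emb.mean(axis=0).shape)          # (4, 16) (16,)
```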
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training
Graph representation learning has emerged as a powerful technique for
addressing real-world problems. Various downstream graph learning tasks have
benefited from its recent developments, such as node classification, similarity
search, and graph classification. However, prior work on graph representation
learning focuses on domain-specific problems and trains a dedicated model for each
graph dataset, which is usually non-transferable to out-of-domain data.
Inspired by the recent advances in pre-training from natural language
processing and computer vision, we design Graph Contrastive Coding (GCC) -- a
self-supervised graph neural network pre-training framework -- to capture the
universal network topological properties across multiple networks. We design
GCC's pre-training task as subgraph instance discrimination in and across
networks and leverage contrastive learning to empower graph neural networks to
learn intrinsic and transferable structural representations. We conduct
extensive experiments on three graph learning tasks and ten graph datasets. The
results show that GCC pre-trained on a collection of diverse datasets can
achieve performance competitive with or better than its task-specific and
trained-from-scratch counterparts. This suggests that the pre-training and
fine-tuning paradigm presents great potential for graph representation
learning.
Comment: 11 pages, 5 figures, to appear in KDD 2020 proceedings
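Subgraph instance discrimination reduces to an InfoNCE-style contrastive loss in which two views of the same node's subgraph form the positive pair and the other subgraphs in the batch act as negatives. The sketch below shows only that loss; the subgraph sampling and the GNN encoder that would produce the two views are placeholders, not GCC itself.

```python
# InfoNCE-style instance discrimination over a batch of subgraph embeddings.
import torch
import torch.nn.functional as F

def instance_discrimination_loss(q: torch.Tensor, k: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    # q, k: (batch, dim) embeddings of two augmented views of the same subgraphs.
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    logits = q @ k.t() / temperature      # similarity of every query to every key
    labels = torch.arange(q.size(0))      # the diagonal entries are the positives
    return F.cross_entropy(logits, labels)

# Usage with random stand-in embeddings for a batch of 8 subgraph pairs.
print(instance_discrimination_loss(torch.randn(8, 64), torch.randn(8, 64)))
```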
Towards Deeper Graph Neural Networks
Graph neural networks have shown significant success in the field of graph
representation learning. Graph convolutions perform neighborhood aggregation
and represent one of the most important graph operations. Nevertheless, one
layer of these neighborhood aggregation methods considers only immediate
neighbors, and performance decreases when going deeper to enable larger
receptive fields. Several recent studies attribute this performance
deterioration to the over-smoothing issue, which states that repeated
propagation makes node representations of different classes indistinguishable.
In this work, we study this observation systematically and develop new insights
towards deeper graph neural networks. First, we provide a systematic analysis
on this issue and argue that the key factor compromising the performance
significantly is the entanglement of representation transformation and
propagation in current graph convolution operations. After decoupling these two
operations, deeper graph neural networks can be used to learn graph node
representations from larger receptive fields. We further provide a theoretical
analysis of the above observation when building very deep models, which can
serve as a rigorous and gentle description of the over-smoothing issue. Based
on our theoretical and empirical analysis, we propose Deep Adaptive Graph
Neural Network (DAGNN) to adaptively incorporate information from large
receptive fields. A set of experiments on citation, co-authorship, and
co-purchase datasets confirms our analysis and insights and demonstrates
the superiority of our proposed methods.
Comment: 11 pages, KDD 2020
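The decoupling argument translates directly into an architectural sketch: transform node features once with an MLP, propagate the result K times with a parameter-free normalized adjacency, and combine the K+1 hop representations with learned, node-wise scores. Sizes below are illustrative, and this is a reading of the idea rather than the released DAGNN code.

```python
# Decoupled transformation and propagation with adaptive hop combination.
import torch
import torch.nn as nn

class DecoupledGNNSketch(nn.Module):
    def __init__(self, in_dim: int, hidden: int, num_classes: int, k: int = 10):
        super().__init__()
        self.k = k
        self.transform = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, num_classes))
        self.score = nn.Linear(num_classes, 1)    # node-wise retention score per hop

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        h = self.transform(x)                     # transformation, applied once
        hops = [h]
        for _ in range(self.k):                   # propagation, no parameters inside
            h = adj_norm @ h
            hops.append(h)
        H = torch.stack(hops, dim=1)              # (n, k + 1, num_classes)
        s = torch.sigmoid(self.score(H))          # adaptive weights over hops
        return (s * H).sum(dim=1)                 # combined prediction per node

# Usage on a toy graph: 5 nodes, 16-d features, 3 classes; identity stands in
# for the normalized adjacency.
out = DecoupledGNNSketch(16, 32, 3)(torch.randn(5, 16), torch.eye(5))
print(out.shape)  # torch.Size([5, 3])
```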