Graph Residual Flow for Molecular Graph Generation
Statistical generative models for molecular graphs have attracted attention from
researchers in the fields of bio- and chemo-informatics. Among these models,
invertible flow-based approaches have not yet been fully explored. In this
paper, we propose a powerful invertible flow for molecular graphs, called the
graph residual flow (GRF). The GRF is based on residual flows, which are known
to admit more flexible and complex non-linear mappings than traditional
coupling flows. We theoretically derive non-trivial conditions under which the
GRF is invertible, and present a way of keeping the entire flow invertible
throughout training and sampling. Experimental results show that a generative
model based on the proposed GRF achieves comparable generation performance with
a much smaller number of trainable parameters than existing flow-based models.
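For intuition, the sketch below shows a generic residual-flow block of the kind
the abstract builds on: the map y = x + g(x) is invertible whenever g is a
contraction, and the inverse can be recovered by fixed-point iteration. This is
a minimal, generic PyTorch illustration, not the graph-specific GRF
architecture; all names and constants are illustrative.

    import torch
    import torch.nn as nn

    class ResidualFlowBlock(nn.Module):
        """y = x + g(x), invertible whenever Lip(g) < 1."""

        def __init__(self, dim, hidden=64, coeff=0.97):
            super().__init__()
            # Spectral normalization caps each linear map's Lipschitz constant
            # near 1; ELU is 1-Lipschitz; `coeff` keeps Lip(g) strictly below 1.
            self.net = nn.Sequential(
                nn.utils.spectral_norm(nn.Linear(dim, hidden)),
                nn.ELU(),
                nn.utils.spectral_norm(nn.Linear(hidden, dim)),
            )
            self.coeff = coeff

        def forward(self, x):
            return x + self.coeff * self.net(x)

        def inverse(self, y, n_iters=50):
            # Banach fixed-point iteration x <- y - g(x): converges to the
            # unique preimage because g is a contraction.
            x = y.clone()
            for _ in range(n_iters):
                x = y - self.coeff * self.net(x)
            return x

The same contraction argument is what makes conditions on the residual branch,
rather than a structural coupling split, sufficient for invertibility.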
Graph Neural Networks Exponentially Lose Expressive Power for Node Classification
Graph Neural Networks (graph NNs) are a promising deep learning approach for
analyzing graph-structured data. However, it is known that their predictive
performance does not improve (and sometimes worsens) as we stack more layers
and add non-linearity. To tackle this problem, we investigate the expressive
power of graph NNs via their asymptotic behavior as the number of layers tends
to infinity. Our strategy is to generalize the forward propagation of a Graph
Convolutional Network (GCN), which is a popular graph NN variant, as a specific
dynamical system. In the case of a GCN, we show that when its weights satisfy
conditions determined by the spectrum of the (augmented) normalized Laplacian,
its output exponentially approaches the set of signals that carry only the
information of connected components and node degrees for distinguishing nodes.
Our theory enables us to relate the expressive power of
GCNs with the topological information of the underlying graphs inherent in the
graph spectra. To demonstrate this, we characterize the asymptotic behavior of
GCNs on the Erdős-Rényi graph. We show that when the Erdős-Rényi graph is
sufficiently dense and large, a broad range of GCNs on it suffers from this
"information loss" in the limit of infinite layers with high
probability. Based on the theory, we provide a principled guideline for weight
normalization of graph NNs. We experimentally confirm that the proposed weight
scaling enhances the predictive performance of GCNs on real data. Code is
available at https://github.com/delta2323/gnn-asymptotics.
Comment: 9 pages, supplemental material 28 pages. Accepted at the International
Conference on Learning Representations (ICLR) 2020.
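As a rough sketch of what a spectral weight-scaling guideline can look like in
practice: the weight matrix is rescaled so its largest singular value hits a
target. The actual target would come from the graph's Laplacian spectrum as the
paper's theory prescribes; `s_target` here is just an assumed hyperparameter.

    import torch

    @torch.no_grad()
    def rescale_weight_(weight: torch.Tensor, s_target: float) -> None:
        # Largest singular value (spectral norm) of the weight matrix.
        s_max = torch.linalg.matrix_norm(weight, ord=2)
        # Rescale in place so the spectral norm equals s_target.
        weight.mul_(s_target / s_max.clamp_min(1e-12))

Applied to every GCN weight matrix after each optimizer step, this keeps the
singular values away from the regime in which, per the theory above, the output
collapses onto the component-and-degree subspace.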
Understanding and Resolving Performance Degradation in Graph Convolutional Networks
A Graph Convolutional Network (GCN) stacks several layers and in each layer
performs a PROPagation operation (PROP) and a TRANsformation operation (TRAN)
for learning node representations over graph-structured data. Though powerful,
GCNs tend to suffer a performance drop as the model gets deeper. Previous works
focus on PROPs to study and mitigate this issue, but the role of TRANs is
barely investigated. In this work, we study the performance degradation of GCNs
by experimentally examining how stacking only TRANs or only PROPs works. We
find that TRANs contribute significantly, or even more than PROPs, to the
declining performance, and moreover that they tend to amplify node-wise feature
variance in GCNs, causing a variance inflation that we identify as a key factor
behind the performance drop. Motivated by these observations, we propose a
variance-controlling technique termed Node Normalization (NodeNorm), which
scales each node's features using its own standard deviation. Experimental
results validate the effectiveness of NodeNorm in addressing the performance
degradation of GCNs. Specifically, it enables deep GCNs to achieve results
comparable to shallow ones on 6 benchmark datasets, and to outperform shallow
ones in cases where deep models are needed. NodeNorm is a generic plug-in and
generalizes well to other GNN architectures.
Comment: Code is available at https://github.com/miafei/NodeNorm
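Since the abstract specifies the operation concretely, a minimal sketch is easy
to give: each node's feature vector is divided by its own standard deviation.
The eps constant is an assumed numerical-stability choice, not taken from the
paper's exact configuration.

    import torch
    import torch.nn as nn

    class NodeNorm(nn.Module):
        """Scale each node's features by that node's own standard deviation."""

        def __init__(self, eps: float = 1e-5):
            super().__init__()
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: [num_nodes, num_features]; the std is computed per node,
            # across that node's feature dimensions.
            std = x.std(dim=-1, keepdim=True)
            return x / (std + self.eps)

As a plug-in, it would be inserted after each TRAN step to keep the node-wise
variance from inflating with depth.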
Policy-GNN: Aggregation Optimization for Graph Neural Networks
Graph data are pervasive in many real-world applications. Recently, increasing
attention has been paid to graph neural networks (GNNs), which aim
to model the local graph structures and capture the hierarchical patterns by
aggregating the information from neighbors with stackable network modules.
Motivated by the observation that different nodes often require different
iterations of aggregation to fully capture the structural information, in this
paper, we propose to explicitly sample diverse iterations of aggregation for
different nodes to boost the performance of GNNs. It is a challenging task to
develop an effective aggregation strategy for each node, given complex graphs
and sparse features. Moreover, it is not straightforward to derive an efficient
algorithm, since we need to feed the sampled nodes into different numbers of
network layers. To address the above challenges, we propose Policy-GNN, a
meta-policy framework that combines the sampling procedure and the message
passing of GNNs into a joint learning process. Specifically, Policy-GNN uses a
meta-policy to adaptively determine the number of aggregations for each node.
The meta-policy is trained with deep reinforcement learning (RL) by exploiting
the feedback from the model. We further introduce parameter sharing and a
buffer mechanism to boost the training efficiency. Experimental results on
three real-world benchmark datasets suggest that Policy-GNN significantly
outperforms the state-of-the-art alternatives, demonstrating the promise of
aggregation optimization for GNNs.
Comment: Accepted to the ACM SIGKDD'20 research track.
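A schematic sketch of the meta-policy idea follows, under the assumption that
the state is a node's feature vector and the action is a depth in 1..max_depth;
the class and method names are illustrative, not the paper's API.

    import torch
    import torch.nn as nn

    class MetaPolicy(nn.Module):
        """Q-network scoring candidate aggregation depths for a node."""

        def __init__(self, in_dim: int, max_depth: int):
            super().__init__()
            self.max_depth = max_depth
            # Q(state) -> one value per candidate depth (1..max_depth).
            self.q_net = nn.Sequential(
                nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, max_depth)
            )

        def select_depth(self, node_feat: torch.Tensor, eps: float = 0.1) -> int:
            # Epsilon-greedy action selection, as in standard deep Q-learning.
            if torch.rand(()).item() < eps:
                return int(torch.randint(1, self.max_depth + 1, (1,)).item())
            return int(self.q_net(node_feat).argmax().item()) + 1

A node assigned depth k would then be propagated through the first k of a stack
of shared GCN layers (matching the parameter sharing mentioned above), with the
resulting task performance fed back to the policy as the RL reward.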
Self-Supervised Graph Transformer on Large-Scale Molecular Data
How to obtain informative representations of molecules is a crucial
prerequisite for AI-driven drug design and discovery. Recent research abstracts
molecules as graphs and employs Graph Neural Networks (GNNs) for molecular
representation learning. Nevertheless, two issues impede the use of GNNs in
real scenarios: (1) insufficient labeled molecules for supervised training; (2)
poor generalization to newly synthesized molecules. To address them
both, we propose a novel framework, GROVER, which stands for Graph
Representation frOm self-superVised mEssage passing tRansformer. With carefully
designed self-supervised tasks at the node, edge, and graph levels, GROVER can
learn rich structural and semantic information about molecules from enormous
amounts of unlabelled molecular data. Moreover, to encode such complex
information, GROVER
integrates Message Passing Networks into the Transformer-style architecture to
deliver a class of more expressive encoders of molecules. The flexibility of
GROVER allows it to be trained efficiently on large-scale molecular datasets
without requiring any supervision, thus remaining immune to the two issues
mentioned above. We pre-train GROVER with 100 million parameters on 10 million
unlabelled molecules -- the biggest GNN and the largest training dataset in
molecular representation learning. We then leverage the pre-trained GROVER for
molecular property prediction followed by task-specific fine-tuning, where we
observe a large improvement (more than 6% on average) over current
state-of-the-art methods on 11 challenging benchmarks. The insight we gained is
that well-designed self-supervision losses and highly expressive pre-trained
models have significant potential for boosting performance.
Comment: 17 pages, 7 figures
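One plausible reading of "integrating Message Passing Networks into a
Transformer-style architecture" is sketched below: node states are first
updated by neighborhood aggregation, then refined by self-attention over all
nodes. This is a structural illustration only, not GROVER's actual block.

    import torch
    import torch.nn as nn

    class MPTransformerBlock(nn.Module):
        """Message passing followed by multi-head self-attention over nodes."""

        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.msg = nn.Linear(dim, dim)  # per-node message function
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)

        def forward(self, x, adj):
            # x: [batch, nodes, dim]; adj: [batch, nodes, nodes], row-normalized.
            # One round of message passing aggregates neighbor features.
            x = self.norm1(x + adj @ self.msg(x))
            # Self-attention then lets distant atoms exchange information.
            h, _ = self.attn(x, x, x)
            return self.norm2(x + h)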
DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition
Despite the wide application of Graph Convolutional Network (GCN), one major
limitation is that it does not benefit from the increasing depth and suffers
from the oversmoothing problem. In this work, we first characterize this
phenomenon from an information-theoretic perspective and show that, under
certain conditions, the mutual information between the input of a GCN and its
output after k layers converges to 0 exponentially in k. We also show that, on
the other hand, graph decomposition can potentially weaken the conditions for
such a convergence rate, which enables our analysis for GraphCNN.
Since different graph structures benefit only from their corresponding
decompositions, in practice we propose an automatic connectivity-aware graph
decomposition algorithm, DeGNN, to improve the performance of general graph
neural networks. Extensive experiments on widely adopted benchmark datasets
demonstrate that DeGNN can not only significantly boost the performance of the
corresponding GNNs, but also achieve state-of-the-art performance.
Comment: 20 pages, 5 figures, 5 tables
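As a rough illustration of propagating over a decomposed graph (the
decomposition itself, which DeGNN computes automatically from connectivity, is
assumed here to be given as a list of edge-disjoint adjacency parts):

    import torch
    import torch.nn as nn

    class DecomposedGCNLayer(nn.Module):
        """Propagate each subgraph of a decomposition separately, then combine."""

        def __init__(self, in_dim: int, out_dim: int, num_parts: int = 2):
            super().__init__()
            # One weight matrix per subgraph in the decomposition.
            self.linears = nn.ModuleList(
                nn.Linear(in_dim, out_dim) for _ in range(num_parts)
            )

        def forward(self, x, adj_parts):
            # adj_parts: normalized adjacency matrices summing to the full graph.
            return torch.relu(
                sum(a @ lin(x) for a, lin in zip(adj_parts, self.linears))
            )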