21,082 research outputs found

Benchmarking Graph Neural Networks

Author: Bengio Yoshua
Bresson Xavier
Dwivedi Vijay Prakash
Joshi Chaitanya K.
Laurent Thomas
Publication date: 03/07/2020

Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. As the field grows, it becomes critical to identify key architectures and validate new ideas that generalize to larger, more complex datasets. Unfortunately, it has been increasingly difficult to gauge the effectiveness of new models in the absence of a standardized benchmark with consistent experimental settings. In this paper, we introduce a reproducible GNN benchmarking framework, with the facility for researchers to add new models conveniently for arbitrary datasets. We demonstrate the usefulness of our framework by presenting a principled investigation into the recent Weisfeiler-Lehman GNNs (WL-GNNs) compared to message passing-based graph convolutional networks (GCNs) for a variety of graph tasks, i.e. graph regression/classification and node/link prediction, with medium-scale datasets.

Comment: Benchmarking framework on GitHub at https://github.com/graphdeeplearning/benchmarking-gnn

arXiv.org e-Print Archive

Graph Generative Model for Benchmarking Graph Neural Networks

Author: Palowitch John
Perozzi Bryan
Salakhutdinov Ruslan
Wu Yue
Yoon Minji
Publication date: 09/06/2023

As the field of Graph Neural Networks (GNNs) continues to grow, it experiences a corresponding increase in the need for large, real-world datasets to train and test new GNN models on challenging, realistic problems. Unfortunately, such graph datasets are often generated from online, highly privacy-restricted ecosystems, which makes research and development on these datasets hard, if not impossible. This greatly reduces the number of benchmark graphs available to researchers, causing the field to rely on only a handful of publicly available datasets. To address this problem, we introduce a novel graph generative model, Computation Graph Transformer (CGT), that learns and reproduces the distribution of real-world graphs in a privacy-controlled way. More specifically, CGT (1) generates effective benchmark graphs on which GNNs show similar task performance as on the source graphs, (2) scales to process large-scale graphs, and (3) incorporates off-the-shelf privacy modules to guarantee end-user privacy of the generated graph. Extensive experiments across a vast body of graph generative models show that only our model can successfully generate privacy-controlled, synthetic substitutes of large-scale real-world graphs that can be effectively used to benchmark GNN models.

arXiv.org e-Print Archive

IPC: A Benchmark Data Set for Learning with Graph-Structured Data

Author: Chen Jie
Ferber Patrick
Huo Siyu
Katz Michael
Ma Tengfei
Publication date: 01/01/2019

Benchmark data sets are an indispensable ingredient of the evaluation of graph-based machine learning methods. We release a new data set, compiled from International Planning Competitions (IPC), for benchmarking graph classification, regression, and related tasks. Apart from the graph construction (based on AI planning problems) that is interesting in its own right, the data set possesses distinctly different characteristics from popularly used benchmarks. The data set, named IPC, consists of two self-contained versions, grounded and lifted, both including graphs of large and skewedly distributed sizes, posing substantial challenges for the computation of graph models such as graph kernels and graph neural networks. The graphs in this data set are directed and the lifted version is acyclic, offering the opportunity of benchmarking specialized models for directed (acyclic) structures. Moreover, the graph generator and the labeling are computer programmed; thus, the data set may be extended easily if a larger scale is desired. The data set is accessible from https://github.com/IBM/IPC-graph-data.

Comment: ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data.

arXiv.org e-Print Archive

edoc

IPC: A Benchmark Data Set for Learning with Graph-Structured Data

Author: Chen Jie
Ferber Patrick
Huo Siyu
Katz Michael
Ma Tengfei
Publication venue: AAAI Press
Publication date: 01/01/2019

Benchmark data sets are an indispensable ingredient of the evaluation of graph-based machine learning methods. We release a new data set, compiled from International Planning Competitions (IPC), for benchmarking graph classification, regression, and related tasks. Apart from the graph construction (based on AI planning problems) that is interesting in its own right, the data set possesses distinctly different characteristics from popularly used benchmarks. The data set, named IPC, consists of two self-contained versions, grounded and lifted, both including graphs of large and skewedly distributed sizes, posing substantial challenges for the computation of graph models such as graph kernels and graph neural networks. The graphs in this data set are directed and the lifted version is acyclic, offering the opportunity of benchmarking specialized models for directed (acyclic) structures. Moreover, the graph generator and the labeling are computer programmed; thus, the data set may be extended easily if a larger scale is desired.

edoc

Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures

Author: AS Reber
AW Smith
B Yoshua
G Jager
I Simon
J Rogers
JL Elman
M Casey
MP Marcus
N Chomsky
N Chomsky
S Hochreiter
WT Fitch
Y LeCun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. In order to test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one of the drawbacks of existing datasets is the lack of experimental control with regard to the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data having the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of appropriate forbidden strings. In this paper, we explore the capacity of different RNN extensions to model LDDs by evaluating these models on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a greater degree of LDD. Even though SPk are simple languages, the presence of LDDs does have a significant impact on the performance of recurrent neural architectures, thus making them prime candidates for benchmarking tasks.

Comment: International Conference of Artificial Neural Networks (ICANN) 201
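The abstract above mentions controlling LDDs through k, string length, and forbidden strings. A minimal sketch of that idea follows, assuming the usual definition of a strictly piecewise (SPk) language as the set of strings that contain none of a given set of forbidden subsequences of length k. The function names, the rejection-sampling strategy, and the `max_tries` parameter are hypothetical and are not taken from the paper.

```python
import random


def contains_subsequence(s, pattern):
    """True if `pattern` occurs in `s` as a (possibly non-contiguous) subsequence."""
    it = iter(s)
    # `ch in it` advances the iterator, so matches must appear in order.
    return all(ch in it for ch in pattern)


def in_sp_language(s, forbidden):
    """True if `s` avoids every forbidden subsequence (assumed SPk membership test)."""
    return not any(contains_subsequence(s, f) for f in forbidden)


def sample_sp_strings(alphabet, forbidden, length, n, seed=0, max_tries=100_000):
    """Rejection-sample up to `n` strings of a fixed length that belong to the language.

    This is an illustrative generator, not the authors' procedure. Longer strings
    let the two symbols of a forbidden pair sit further apart, which is one way
    the dependency distance grows with length.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(max_tries):
        if len(samples) == n:
            break
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if in_sp_language(candidate, forbidden):
            samples.append(candidate)
    return samples


if __name__ == "__main__":
    # SP2 example: forbid any 'a' that is later followed by a 'b'.
    for s in sample_sp_strings("abcd", ["ab"], length=8, n=5):
        print(s)
```

With `forbidden=["ab"]`, a string such as `acccb` is rejected regardless of how many symbols separate the `a` from the `b`, which is the long-distance property the benchmark relies on.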

arXiv.org e-Print Archive

Crossref

Arrow@TUDublin