We present the Temporal Graph Benchmark (TGB), a collection of challenging
and diverse benchmark datasets for realistic, reproducible, and robust
evaluation of machine learning models on temporal graphs. TGB datasets are of
large scale, spanning years in duration, incorporate both node and edge-level
prediction tasks and cover a diverse set of domains including social, trade,
transaction, and transportation networks. For both tasks, we design evaluation
protocols based on realistic use-cases. We extensively benchmark each dataset
and find that the performance of common models can vary drastically across
datasets. In addition, on dynamic node property prediction tasks, we show that
simple methods often achieve superior performance compared to existing temporal
graph models. We believe that these findings open up opportunities for future
research on temporal graphs. Finally, TGB provides an automated machine
learning pipeline for reproducible and accessible temporal graph research,
including data loading, experiment setup and performance evaluation. TGB will
be maintained and updated on a regular basis and welcomes community feedback.
TGB datasets, data loaders, example codes, evaluation setup, and leaderboards
are publicly available at https://tgb.complexdatalab.com/.Comment: 20 pages, 7 figures, 7 tables, accepted at NeurIPS 2023 Datasets and
Benchmarks Trac