Synthetic Time Series Generation (TSG) is crucial in a range of applications,
including data augmentation, anomaly detection, and privacy preservation.
Although significant strides have been made in this field, existing methods
exhibit three key limitations: (1) They often benchmark only against similar
model types, limiting a holistic view of their performance. (2) The use of
specialized synthetic and private datasets introduces biases and hampers
generalizability. (3) Ambiguous evaluation measures, often tied to custom
networks or downstream tasks, hinder consistent and fair comparison.
To overcome these limitations, we introduce \textsf{TSGBench}, the inaugural
Time Series Generation Benchmark, designed for a unified and comprehensive
assessment of TSG methods. It comprises three modules: (1) a curated collection
of publicly available, real-world datasets tailored for TSG, together with a
standardized preprocessing pipeline; (2) a comprehensive suite of evaluation
measures, spanning vanilla measures, new distance-based assessments, and
visualization tools; (3) a pioneering generalization test rooted in Domain
Adaptation (DA), compatible with all methods. We have conducted extensive
experiments using \textsf{TSGBench} across ten real-world datasets from
diverse domains, applying ten advanced TSG methods and twelve
evaluation measures. The results highlight the reliability and efficacy of
\textsf{TSGBench} in evaluating TSG methods. Crucially, \textsf{TSGBench}
delivers a statistical analysis of the performance rankings of these methods,
illuminating their varying performance across different datasets and measures
and offering nuanced insights into the effectiveness of each method.