We present a state-of-the-art report on visualization corpora in automated
chart analysis research. We survey 56 papers that created or used a
visualization corpus as input to their research techniques or systems.
Based on a multi-level task taxonomy that identifies the goal, method, and
outputs of automated chart analysis, we examine the property space of existing
chart corpora along five dimensions: format, scope, collection method,
annotations, and diversity. Through the survey, we summarize common patterns
and practices of creating chart corpora, identify research gaps and
opportunities, and discuss the desired properties of future benchmark corpora
and the required tools to create them.

Comment: To appear at EuroVis 202