News articles such as sports game reports
are often thought to closely follow the underlying game statistics, but in practice
they contain a notable amount of background knowledge, interpretation, insight
into the game, and quotes that are not
present in the official statistics. This
poses a challenge for automated data-to-text news generation with real-world news
corpora as training data. We report on
the development of a corpus of Finnish
ice hockey news, edited to be suitable
for training of end-to-end news generation
methods, and demonstrate the generation of text that journalists judged to be relatively close to a viable product. The new dataset and system source
code are available for research purposes.