Recent neural headline generation models have shown strong results, but are
generally trained on very large datasets. We focus on improving
headline quality on smaller datasets by means of pre-training. We propose
new methods that enable pre-training all the parameters of the model and
utilize all available text, resulting in relative improvements of up to 32.4%
in perplexity and 2.84 points in ROUGE.

Comment: Accepted to EMNLP 2017 Workshop on New Frontiers in Summarization
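
To make the idea concrete, here is a minimal sketch (not the authors' exact method) of pre-training all parameters of a sequence-to-sequence headline model: both the encoder and the decoder are first trained as plain language models on unlabeled text, so the embedding and output layers are covered as well, before fine-tuning on scarce (article, headline) pairs. The GRU architecture, layer sizes, and random stand-in tensors below are all illustrative assumptions.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 10000, 128, 256  # illustrative sizes, not the paper's

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def lm_logits(self, tokens, rnn):
        # Run either RNN as a next-word language model, so unlabeled text
        # trains its weights plus the shared embedding and softmax layers.
        h, _ = rnn(self.embed(tokens))
        return self.out(h)

    def forward(self, src, tgt):
        _, state = self.encoder(self.embed(src))
        h, _ = self.decoder(self.embed(tgt), state)
        return self.out(h)

model = Seq2Seq()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters())

def step(logits, gold):
    loss = loss_fn(logits.reshape(-1, VOCAB), gold.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Phase 1: language-model pre-training on unlabeled text (random token ids
# stand in for real data). Both encoder and decoder see this objective, so
# every parameter of the model receives a pre-training signal.
text = torch.randint(0, VOCAB, (8, 21))
step(model.lm_logits(text[:, :-1], model.decoder), text[:, 1:])
step(model.lm_logits(text[:, :-1], model.encoder), text[:, 1:])

# Phase 2: fine-tune end-to-end on the small (article, headline) dataset.
src = torch.randint(0, VOCAB, (8, 40))
tgt = torch.randint(0, VOCAB, (8, 11))
step(model(src, tgt[:, :-1]), tgt[:, 1:])
```

The point of the sketch is that the pre-training objective touches the same modules later used for generation, rather than initializing only the embeddings; how the pre-trained encoder and decoder are combined is a design choice the paper itself addresses.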