When existing text generation datasets are used directly for controllable
generation, we face the problem of lacking domain knowledge, and thus the
aspects that can be controlled are limited. A typical example is using the
CNN/Daily Mail dataset for controllable text summarization, where there is
no guidance on which content the summary sentences should emphasize. A more useful text
generator should leverage both the input text and the control signal to guide
the generation, which can only be built with a deep understanding of the domain
knowledge. Motivated by this vision, our paper introduces a new text generation
dataset named MReD. It consists of 7,089 meta-reviews, and all of its
45k meta-review sentences are manually annotated with one of 9 carefully
defined categories, including abstract, strength, decision, etc. We present
experimental results with state-of-the-art summarization models, and propose
methods for structure-controlled generation with both extractive and
abstractive models using our annotated data. By exploring various settings and
analyzing the model behavior with respect to the control signal, we demonstrate
the challenges of our proposed task and the value of our dataset MReD.
Meanwhile, MReD also allows for a better understanding of the meta-review
domain.

Comment: 15 pages, 5 figures, accepted at ACL 202