Despite the recent progress in language generation models, their outputs may
not always meet user expectations. In this work, we study whether informational
feedback in natural language can be leveraged to improve generation quality and
user preference alignment. To this end, we consider factual consistency in
summarization, the quality that the summary should only contain information
supported by the input documents, as the user-expected preference. We collect a
high-quality dataset, DeFacto, containing human demonstrations and
informational natural language feedback consisting of corrective instructions,
edited summaries, and explanations with respect to the factual consistency of
the summary. Using our dataset, we study three natural language generation
tasks: (1) editing a summary by following the human feedback, (2) generating
human feedback for editing the original summary, and (3) revising the initial
summary to correct factual errors by generating both the human feedback and
edited summary. We show that DeFacto can provide factually consistent
human-edited summaries and further insights into summarization factual
consistency thanks to its informational natural language feedback. We further
demonstrate that fine-tuned language models can leverage our dataset to improve
the summary factual consistency, while large language models lack the zero-shot
learning ability in our proposed tasks that require controllable text
generation.Comment: ACL 2023 Camera Ready, GitHub Repo:
https://github.com/microsoft/DeFact