Selecting the ``right'' amount of information to include in a summary is a
difficult task. A good summary should be detailed and entity-centric without
being overly dense and hard to follow. To better understand this tradeoff, we
solicit increasingly dense GPT-4 summaries with what we refer to as a ``Chain
of Density'' (CoD) prompt. Specifically, GPT-4 generates an initial
entity-sparse summary before iteratively incorporating missing salient entities
without increasing the length. Summaries generated by CoD are more abstractive,
exhibit more fusion, and have less of a lead bias than GPT-4 summaries
generated by a vanilla prompt. We conduct a human preference study on 100 CNN
DailyMail articles and find that humans prefer GPT-4 summaries that are
more dense than those generated by a vanilla prompt and almost as dense as
human-written summaries. Qualitative analysis supports the notion that there
exists a tradeoff between informativeness and readability. 500 annotated CoD
summaries, as well as an extra 5,000 unannotated summaries, are freely
available on HuggingFace
(https://huggingface.co/datasets/griffin/chain_of_density).

Comment: preprint
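
The CoD procedure described above (an initial entity-sparse summary, then repeated rewrites that add missing salient entities at a fixed length) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `chain_of_density`, the `generate` callable (a stand-in for any text-generation client), the `steps` parameter, and the prompt wording are hypothetical and are not the prompt used in the study. The sketch also issues one model call per step, which is a simplification and may differ from how the actual CoD prompt is structured.

```python
from typing import Callable, List


def chain_of_density(
    article: str,
    generate: Callable[[str], str],  # hypothetical: maps a prompt to model text
    steps: int = 3,                  # illustrative default, not from the source
) -> List[str]:
    """Return one summary per step, sparsest first."""
    if steps < 1:
        raise ValueError("steps must be at least 1")

    # Step 1: ask for an initial, deliberately entity-sparse summary.
    summary = generate(
        "Write a short, entity-sparse summary of the article.\n\n"
        f"Article: {article}"
    )
    summaries = [summary]

    # Use the first summary's word count as the length budget for later steps.
    target_words = len(summary.split())

    # Remaining steps: add missing salient entities without growing the length.
    for _ in range(steps - 1):
        summary = generate(
            "Rewrite the previous summary so that it also covers salient "
            "entities from the article that are currently missing, "
            f"while keeping it to about {target_words} words.\n\n"
            f"Article: {article}\n\n"
            f"Previous summary: {summary}"
        )
        summaries.append(summary)

    return summaries
```

In use, `generate` would wrap a call to a language model; passing a stub function instead makes the control flow easy to inspect without any network access.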