SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation
Automatic literature review generation is one of the most challenging tasks
in natural language processing. Although large language models have been applied
to literature review generation, the lack of large-scale datasets has been a
stumbling block to progress. We release SciReviewGen, consisting of over
10,000 literature reviews and 690,000 papers cited in the reviews. Based on the
dataset, we evaluate recent transformer-based summarization models on the
literature review generation task, including Fusion-in-Decoder extended for
literature review generation. Human evaluation results show that some
machine-generated summaries are comparable to human-written reviews, while also
revealing challenges of automatic literature review generation, such as
hallucinations and a lack of detailed information. Our dataset and code are
available at https://github.com/tetsu9923/SciReviewGen.

Comment: ACL Findings 2023 (to appear). arXiv admin note: text overlap
with arXiv:1810.04020 by other authors