Extractive summarization is a crucial task in natural language processing
that aims to condense long documents into shorter versions by directly
extracting sentences. The recent introduction of ChatGPT has attracted
significant interest in the NLP community due to its remarkable performance on
a wide range of downstream tasks. However, concerns regarding factuality and
faithfulness have hindered its practical application in summarization
systems. This paper first presents a thorough evaluation of ChatGPT's
performance on extractive summarization and compares it with traditional
fine-tuning methods on various benchmark datasets. Our experimental analysis
reveals that ChatGPT's extractive summarization performance is still inferior
to existing supervised systems in terms of ROUGE scores. In addition, we
explore the effectiveness of in-context learning and chain-of-thought reasoning
for enhancing its performance. Furthermore, we find that applying an
extract-then-generate pipeline with ChatGPT yields significant performance
improvements over abstractive baselines in terms of summary faithfulness. These
observations highlight potential directions for enhancing ChatGPT's
capabilities for faithful text summarization tasks using two-stage approaches.Comment: Work in progres