Multi-document summarization is challenging because the summaries should not
only describe the most important information from all documents but also
provide a coherent interpretation of the documents. This paper proposes a
method for multi-document summarization based on cluster similarity. For the
extractive stage, we use a hybrid model that combines a modified version of the
PageRank algorithm with a mechanism that accounts for text correlation. After
generating summaries by selecting the most important sentences from each
cluster, we apply BARTpho and ViT5 to construct the abstractive models. This
study thus considers both extractive and abstractive approaches. The
proposed method achieves competitive results in the VLSP 2022 competition.

Comment: In Proceedings of the 9th International Workshop on Vietnamese
Language and Speech Processing (VLSP 2022)
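
To make the extractive step concrete, the following is a minimal sketch of ranking the sentences of a single cluster with plain weighted PageRank over a sentence-similarity graph (a TextRank-style baseline). It is not the modified PageRank or the text correlation mechanism described above, whose details the abstract does not give. The bag-of-words cosine similarity, the damping factor of 0.85, and the function names `pagerank_scores` and `top_sentences` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased word tokens; a real Vietnamese pipeline would need a
    # proper word segmenter (assumption: whitespace-level tokens suffice here).
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pagerank_scores(sentences, damping=0.85, iterations=50):
    # Plain weighted PageRank by power iteration (not the paper's variant).
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sums[j] > 0:
                    rank += weights[j][i] / out_sums[j] * scores[j]
                else:
                    # Sentence j has no similar neighbours: spread its
                    # score uniformly so the scores keep summing to 1.
                    rank += scores[j] / n
            updated.append((1.0 - damping) / n + damping * rank)
        scores = updated
    return scores


def top_sentences(sentences, k=1):
    # Return the k highest-scoring sentences in their original order.
    scores = pagerank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

Under these assumptions, calling `top_sentences(cluster_sentences, k=1)` once per cluster would yield an extractive summary, which could then be passed to an abstractive model such as BARTpho or ViT5.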