Search CORE

42,144 research outputs found

TGSum: Build Tweet Guided Multi-Document Summarization Dataset

Author: Cao Ziqiang
Chen Chengyao
Li Sujian
Li Wenjie
Wei Furu
Zhou Ming
Publication venue
Publication date: 26/11/2015
Field of study

The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large scales of news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyper-link often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this aim, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries could be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as extra training resource, the performance of the summarizer improves a lot on all the test sets. We release this dataset for further research.Comment: 7 pages, 1 figure in AAAI 201

arXiv.org e-Print Archive

CiteSeerX

The Hong Kong Polytechnic University Pao Yue-kong Library

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Supervised Approach to Extractive Summarisation of Scientific Papers

Author: Augenstein Isabelle
Collins Ed
Riedel Sebastian
Publication venue
Publication date: 01/01/2017
Field of study

Automatic summarisation is a popular approach to reduce a document to its main arguments. Recent research in the area has focused on neural approaches to summarisation, which can be very data-hungry. However, few large datasets exist and none for the traditionally popular domain of scientific publications, which opens up challenging research avenues centered on encoding large, complex documents. In this paper, we introduce a new dataset for summarisation of computer science publications by exploiting a large resource of author provided summaries and show straightforward ways of extending it further. We develop models on the dataset making use of both neural sentence encoding and traditionally used summarisation features and show that models which encode sentences as well as their local and global context perform best, significantly outperforming well-established baseline methods.Comment: 11 pages, 6 figure

arXiv.org e-Print Archive

Crossref

UCL Discovery