2,542 research outputs found
Multiple aspect summarization using integer linear programming
Multi-document summarization involves many aspects of content selection and surface realization. The summaries must be informative, succinct, grammatical, and obey stylistic writing conventions. We present a method where such individual aspects are learned separately from data (without any hand-engineering) but optimized jointly using an integer linear programme. The ILP framework allows us to combine the decisions of the expert learners and to select and rewrite source content through a mixture of objective setting, soft and hard constraints. Experimental results on the TAC-08 data set show that our model achieves state-of-the-art performance using ROUGE and significantly improves the informativeness of the summaries.
Generating Aspect-oriented Multi-document Summarization with Event-Aspect Model
In this paper, we propose a novel approach to the automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects. We then use an extended LexRank algorithm to rank the sentences in each cluster, and Integer Linear Programming for sentence selection. Key features of our method include automatic grouping of semantically related sentences and sentence ranking based on an extension of the random walk model. We also implement a new sentence compression algorithm which uses dependency trees instead of parse trees. We compare our method with four baseline methods. Quantitative evaluation based on the ROUGE metric demonstrates the effectiveness and advantages of our method.
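The ranking step mentioned in this abstract builds on LexRank, a random walk over a sentence similarity graph. The sketch below shows only the basic, unextended idea under simplifying assumptions: bag-of-words cosine similarity, no similarity threshold, and a fixed number of power-iteration steps. The function names `cosine` and `lexrank` and the `damping` value are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, damping=0.85, iters=50):
    """Score sentences by a damped random walk on their similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Pairwise similarities, with self-similarity set to zero.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Row-normalise into transition probabilities; an isolated sentence
    # (all-zero row) jumps uniformly to any sentence.
    rows = []
    for row in sim:
        total = sum(row)
        rows.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * rows[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


sentences = [
    "the storm hit the coast",
    "the storm damaged the coast road",
    "stock markets rose sharply",
]
scores = lexrank(sentences)
```

In this toy input the third sentence shares no words with the other two, so it receives the lowest score, while the two storm sentences reinforce each other.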
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
The development of summarization research has been significantly hampered by
the costly acquisition of reference summaries. This paper proposes an effective
way to automatically collect large scales of news-related multi-document
summaries with reference to social media's reactions. We utilize two types of
social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to
cluster documents into different topic sets. Also, a tweet with a hyper-link
often highlights certain key points of the corresponding document. We
synthesize a linked document cluster to form a reference summary which can
cover most key points. To this aim, we adopt the ROUGE metrics to measure the
coverage ratio, and develop an Integer Linear Programming solution to discover
the sentence set reaching the upper bound of ROUGE. Since we allow summary
sentences to be selected from both documents and high-quality tweets, the
generated reference summaries could be abstractive. Both informativeness and
readability of the collected summaries are verified by manual judgment. In
addition, we train a Support Vector Regression summarizer on DUC generic
multi-document summarization benchmarks. With the collected data as extra
training resource, the performance of the summarizer improves a lot on all the
test sets. We release this dataset for further research. Comment: 7 pages, 1 figure in AAAI 201
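The sentence-set search described in this abstract can be illustrated on a toy instance. The sketch below is not the paper's ILP formulation: it enumerates every 0/1 assignment of the selection variables instead of calling an ILP solver, and it scores a candidate set with a simplified set-based unigram recall rather than full ROUGE. The names `unigram_recall` and `best_subset`, the word budget, and the example sentences are all hypothetical.

```python
from itertools import product


def unigram_recall(selected, reference_tokens):
    """Fraction of distinct reference words covered by the selected sentences."""
    covered = set()
    for sentence in selected:
        covered |= set(sentence.lower().split())
    ref = set(reference_tokens)
    return len(covered & ref) / len(ref) if ref else 0.0


def best_subset(sentences, reference, budget):
    """Exhaustively search binary selection vectors under a word budget."""
    ref_tokens = reference.lower().split()
    best, best_score = (), -1.0
    for x in product((0, 1), repeat=len(sentences)):
        chosen = [s for s, xi in zip(sentences, x) if xi]
        # Length constraint: total words in the chosen set must fit the budget.
        if sum(len(s.split()) for s in chosen) > budget:
            continue
        score = unigram_recall(chosen, ref_tokens)
        if score > best_score:
            best, best_score = x, score
    return best, best_score


sentences = [
    "flood waters closed the bridge",
    "officials opened two shelters",
    "the mayor thanked volunteers",
]
reference = "flood closed bridge shelters opened"
selection, score = best_subset(sentences, reference, budget=9)
```

Exhaustive enumeration grows as 2^n in the number of sentences, so it only serves to show the objective and the length constraint; a real system would hand the same variables and constraint to an ILP solver.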
Generating Abstractive Summaries from Meeting Transcripts
Summaries of meetings are very important as they convey the essential content
of discussions in a concise form. Generally, it is time consuming to read and
understand the whole documents. Therefore, summaries play an important role as
the readers are interested in only the important context of discussions. In
this work, we address the task of meeting document summarization. Automatic
summarization systems on meeting conversations developed so far have been
primarily extractive, resulting in unacceptable summaries that are hard to
read. The extracted utterances contain disfluencies that affect the quality of
the extractive summaries. To make summaries much more readable, we propose an
approach to generating abstractive summaries by fusing important content from
several utterances. We first separate meeting transcripts into various topic
segments, and then identify the important utterances in each segment using a
supervised learning approach. The important utterances are then combined
together to generate a one-sentence summary. In the text generation step, the
dependency parses of the utterances in each segment are combined together to
create a directed graph. The most informative and well-formed sub-graph
obtained by integer linear programming (ILP) is selected to generate a
one-sentence summary for each topic segment. The ILP formulation reduces
disfluencies by leveraging grammatical relations that are more prominent in
non-conversational style of text, and therefore generates summaries that are
comparable to human-written abstractive summaries. Experimental results show
that our method can generate more informative summaries than the baselines. In
addition, readability assessments by human judges as well as log-likelihood
estimates obtained from the dependency parser show that our generated summaries
are significantly readable and well-formed. Comment: 10 pages, Proceedings of the 2015 ACM Symposium on Document
Engineering, DocEng' 201
Abstractive Multi-Document Summarization via Phrase Selection and Merging
We propose an abstraction-based multi-document summarization framework that
can construct new sentences by exploring more fine-grained syntactic units than
sentences, namely, noun/verb phrases. Different from existing abstraction-based
approaches, our method first constructs a pool of concepts and facts
represented by phrases from the input documents. Then new sentences are
generated by selecting and merging informative phrases to maximize the salience
of phrases and meanwhile satisfy the sentence construction constraints. We
employ integer linear optimization for conducting phrase selection and merging
simultaneously in order to achieve the global optimal solution for a summary.
Experimental results on the benchmark data set TAC 2011 show that our framework
outperforms the state-of-the-art models under automated pyramid evaluation
metric, and achieves reasonably good results on manual linguistic quality
evaluation. Comment: 11 pages, 1 figure, accepted as a full paper at ACL 201
Abstract Meaning Representation for Multi-Document Summarization
Generating an abstract from a collection of documents is a desirable
capability for many real-world applications. However, abstractive approaches to
multi-document summarization have not been thoroughly investigated. This paper
studies the feasibility of using Abstract Meaning Representation (AMR), a
semantic representation of natural language grounded in linguistic theory, as a
form of content representation. Our approach condenses source documents to a
set of summary graphs following the AMR formalism. The summary graphs are then
transformed to a set of summary sentences in a surface realization step. The
framework is fully data-driven and flexible. Each component can be optimized
independently using small-scale, in-domain training data. We perform
experiments on benchmark summarization datasets and report promising results.
We also describe opportunities and challenges for advancing this line of
research. Comment: 13 page