Semantic argument frequency-based Multi-Document Summarization
Semantic Role Labeling (SRL) aims to identify the constituents of a sentence, together with their roles with respect to the sentence predicates. In this paper, we introduce and assess the idea of using SRL for generic Multi-Document Summarization (MDS). We score sentences according to their inclusion of frequent semantic phrases and form the summary using the top-scored sentences. We compare this method with a term-based sentence scoring approach to investigate the effects of using semantic units instead of single words for sentence scoring. We also integrate our scoring metric as an auxiliary feature into a cutting-edge summarizer to examine its effect on performance. Experiments using datasets from the Document Understanding Conference (DUC) 2004 show that SRL-based summarization outperforms the term-based approach as well as most of the DUC participants. © 2009 IEEE
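The term-based baseline mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the mean-frequency score, and the function names `tokenize`, `score_sentences`, and `summarize` are all assumptions made for illustration. An SRL-based variant would, under the same assumptions, count semantic argument phrases in place of single words.

```python
from collections import Counter
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs, no stopword removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def score_sentences(sentences):
    # Count how often each term occurs across all input sentences.
    freq = Counter(tok for s in sentences for tok in tokenize(s))
    scores = []
    for s in sentences:
        terms = set(tokenize(s))
        # Score = mean collection frequency of the sentence's distinct terms.
        scores.append(sum(freq[t] for t in terms) / len(terms) if terms else 0.0)
    return scores


def summarize(sentences, k=2):
    # Pick the k top-scored sentences, then restore their original order.
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2)` on a list of sentences pooled from several documents returns the two sentences whose terms are most frequent overall.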
Summarizing Dialogic Arguments from Social Media
Online argumentative dialog is a rich source of information on popular
beliefs and opinions that could be useful to companies as well as governmental
or public policy agencies. Compact, easy-to-read summaries of these dialogues
would thus be highly valuable. A priori, it is not even clear what form such a
summary should take. Previous work on summarization has primarily focused on
summarizing written texts, where the notion of an abstract of the text is well
defined. We collect gold standard training data consisting of five human
summaries for each of 161 dialogues on the topics of Gay Marriage, Gun Control
and Abortion. We present several different computational models aimed at
identifying segments of the dialogues whose content should be used for the
summary, using linguistic features and Word2vec features with both SVMs and
Bidirectional LSTMs. We show that we can identify the most important arguments
by using the dialog context with a best F-measure of 0.74 for gun control, 0.71
for gay marriage, and 0.67 for abortion. Comment: Proceedings of the 21st Workshop on the Semantics and Pragmatics of
Dialogue (SemDial 2017)
Foreground and background text in retrieval
Our hypothesis is that certain clauses have foreground functions in text,
while other clauses have background functions and that these functions are
expressed or reflected in the syntactic structure of the clause.
Presumably these clauses will have differing utility for automatic
approaches to text understanding; a summarization system might want to
utilize background clauses to capture commonalities between numbers of
documents while an indexing system might use foreground clauses in order to
capture specific characteristics of a certain document.
Generating indicative-informative summaries with SumUM
We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step toward exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.
Some Reflections on the Task of Content Determination in the Context of Multi-Document Summarization of Evolving Events
Despite its importance, the task of summarizing evolving events has received
little attention from researchers in the field of multi-document summarization. In
a previous paper (Afantenos et al. 2007) we presented a methodology for
the automatic summarization of documents, emitted by multiple sources, which
describe the evolution of an event. At the heart of this methodology lies the
identification of similarities and differences between the various documents,
in two axes: the synchronic and the diachronic. This is achieved by the
introduction of the notion of Synchronic and Diachronic Relations. Those
relations connect the messages that are found in the documents, thus resulting
in a graph which we call a grid. Although the creation of the grid completes the
Document Planning phase of a typical NLG architecture, it can be the case that
the number of messages contained in a grid is very large, thus exceeding the
required compression rate. In this paper we provide some initial thoughts on a
probabilistic model which can be applied at the Content Determination stage,
and which tries to alleviate this problem. Comment: 5 pages, 2 figures