Summarization of Films and Documentaries Based on Subtitles and Scripts
We assess the performance of generic text summarization algorithms applied to
films and documentaries, using the well-known behavior of summarization of news
articles as reference. We use three datasets: (i) news articles, (ii) film
scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics
are used for comparing generated summaries against news abstracts, plot
summaries, and synopses. We show that the best-performing algorithms are LSA
for news articles and documentaries, and LexRank and Support Sets for films.
Despite the different nature of films and documentaries, their relative
behavior is in accordance with that obtained for news articles.
Comment: 7 pages, 9 tables, 4 figures, submitted to Pattern Recognition
Letters (Elsevier)
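The abstract above compares generated summaries against news abstracts, plot
summaries, and synopses using standard ROUGE metrics. To illustrate roughly what
ROUGE-N measures, here is a minimal Python sketch of clipped n-gram overlap
(recall, precision, F1). It assumes lowercased whitespace tokenization with no
stemming or stopword removal. The function names `ngrams` and `rouge_n` are
hypothetical. This is not the official ROUGE toolkit, whose preprocessing
options can produce different scores.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Sketch of ROUGE-N recall, precision and F1 with clipped n-gram counts.

    Assumption: naive lowercase + whitespace tokenization; the official
    ROUGE toolkit also offers stemming and stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps min(count_cand, count_ref) per n-gram (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}
```

For the candidate "the cat sat on the mat" against the reference "the cat lay on
the mat", unigram recall is 5/6 (five of the six reference words are matched,
with "the" counted twice). Bigram recall is 3/5 because only "the cat", "on the"
and "the mat" are shared.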
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Summarizing content contributed by individuals can be challenging, because
people make different lexical choices even when describing the same events.
However, there remains a significant need to summarize such content. Examples
include the student responses to post-class reflective questions, product
reviews, and news articles published by different news agencies related to the
same events. High lexical diversity of these documents hinders the system's
ability to effectively identify salient content and reduce summary redundancy.
In this paper, we overcome this issue by introducing an integer linear
programming-based summarization framework. It incorporates a low-rank
approximation to the sentence-word co-occurrence matrix to intrinsically group
semantically-similar lexical items. We conduct extensive experiments on
datasets of student responses, product reviews, and news documents. Our
approach compares favorably to a number of extractive baselines as well as a
neural abstractive summarization system. The paper finally sheds light on when
and why the proposed framework is effective at summarizing content with high
lexical variety.
Comment: Accepted for publication in the journal Natural Language
Engineering, 201
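The abstract above relies on a low-rank approximation of the sentence-word
co-occurrence matrix to group semantically similar words. The sketch below
illustrates only that general idea. It does not reproduce the paper's ILP
formulation or its exact approximation procedure. It builds a binary
sentence-by-word matrix and computes a rank-k approximation using power
iteration with deflation. This is a generic truncated-SVD stand-in, chosen so
the example needs only the Python standard library. All function names are
hypothetical.

```python
import math
import random


def build_sentence_word_matrix(sentences):
    """Binary sentence-by-word matrix: entry (i, j) is 1.0 if word j occurs in sentence i."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in sentences]
    for i, s in enumerate(sentences):
        for w in s.lower().split():
            matrix[i][index[w]] = 1.0
    return vocab, matrix


def _matvec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def _transpose(a):
    return [list(col) for col in zip(*a)]


def top_singular_triplet(a, iters=200, seed=0):
    """Largest singular value and singular vectors of `a` via power iteration on A^T A."""
    rng = random.Random(seed)
    at = _transpose(a)
    v = [rng.random() for _ in range(len(a[0]))]
    for _ in range(iters):
        v = _matvec(at, _matvec(a, v))  # v <- A^T A v
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # the matrix (or residual) is all zeros
            return 0.0, [0.0] * len(a), v
        v = [x / norm for x in v]
    u = _matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u] if sigma > 0 else u
    return sigma, u, v


def low_rank_approximation(a, k):
    """Rank-k approximation of `a`: extract k singular triplets, subtracting each from a residual."""
    rows, cols = len(a), len(a[0])
    residual = [row[:] for row in a]
    approx = [[0.0] * cols for _ in range(rows)]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        if sigma == 0.0:
            break
        for i in range(rows):
            for j in range(cols):
                term = sigma * u[i] * v[j]
                approx[i][j] += term
                residual[i][j] -= term
    return approx
```

Take the two sentences "a b" and "b c". The binary matrix is
[[1, 1, 0], [0, 1, 1]]. Its rank-1 approximation is [0.5, 1.0, 0.5] for both
rows. The first sentence now gets partial weight for "c", a word it never
contains, because it shares "b" with a sentence that does. That is the kind of
implicit grouping of related lexical items the abstract describes.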
ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
Scientific article summarization is challenging: large, annotated corpora are
not available, and the summary should ideally include the article's impacts on
the research community. This paper provides novel solutions to these two
challenges. We 1) develop and release the first large-scale manually-annotated
corpus for scientific papers (on computational linguistics) by enabling faster
annotation, and 2) propose summarization methods that integrate the authors'
original highlights (abstract) and the article's actual impacts on the
community (citations), to create comprehensive, hybrid summaries. We conduct
experiments to demonstrate the efficacy of our corpus in training data-driven
models for scientific paper summarization and the advantage of our hybrid
summaries over abstracts and traditional citation-based summaries. Our large
annotated corpus and hybrid methods provide a new framework for scientific
paper summarization research.
Comment: AAAI 201
Responsible AI Considerations in Text Summarization Research: A Review of Current Practices
AI and NLP publication venues have increasingly encouraged researchers to
reflect on possible ethical considerations, adverse impacts, and other
responsible AI issues their work might engender. However, for specific NLP
tasks, our understanding of how prevalent such issues are, or when and why these
issues are likely to arise, remains limited. Focusing on text summarization --
a common NLP task largely overlooked by the responsible AI community -- we
examine research and reporting practices in the current literature. We conduct
a multi-round qualitative analysis of 333 summarization papers from the ACL
Anthology published between 2020 and 2022. We focus on how, which, and when
responsible AI issues are covered, which relevant stakeholders are considered,
and mismatches between stated and realized research goals. We also discuss
current evaluation practices and consider how authors discuss the limitations
of both prior work and their own work. Overall, we find that relatively few
papers engage with possible stakeholders or contexts of use, which limits their
consideration of potential downstream adverse impacts or other responsible AI
issues. Based on our findings, we make recommendations on concrete practices
and research directions.