1,773 research outputs found

Summarization of Films and Documentaries Based on Subtitles and Scripts

Author: Aparício Marta
de Matos David Martins
Figueiredo Paulo
Marujo Luís
Raposo Francisco
Ribeiro Ricardo
Publication venue: Elsevier BV
Publication date: 01/01/2016

We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot summaries, and synopses. We show that the best performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles.

Comment: 7 pages, 9 tables, 4 figures, submitted to Pattern Recognition Letters (Elsevier)
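The abstract above relies on ROUGE to compare generated summaries with reference texts. As a rough illustration only, the following sketch computes ROUGE-N precision, recall, and F1 from clipped n-gram overlap. The function names `ngrams` and `rouge_n` and the whitespace tokenization are assumptions made for this example; they are not taken from the paper or from the official ROUGE toolkit, which also applies stemming and other options not shown here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N between two strings (hypothetical helper).

    Tokenization is lowercase whitespace splitting, which is an
    assumption of this sketch, not the behavior of the official toolkit.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_n("the cat sat", "the cat sat on the mat")
    print(scores)
```

Recall is the figure most often reported for ROUGE-N, since it measures how much of the reference the generated summary covers.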

arXiv.org e-Print Archive

Repositório Institucional do ISCTE-IUL

A Novel ILP Framework for Summarizing Content with High Lexical Variety

Author: Litman Diane
Liu Fei
Liu Zitao
Luo Wencan
Publication date: 25/07/2018

Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include student responses to post-class reflective questions, product reviews, and news articles published by different news agencies about the same events. The high lexical diversity of these documents hinders a system's ability to identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety.

Comment: Accepted for publication in the journal of Natural Language Engineering, 201
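To make the low-rank idea in the abstract above concrete, here is a minimal sketch under stated assumptions. It computes a rank-1 approximation of a tiny sentence-word matrix by power iteration, which is a stand-in chosen for brevity and is not the approximation procedure of the paper. The integer linear programming step is not shown. The function name `rank1_approximation` and the toy vocabulary are hypothetical.

```python
import math


def rank1_approximation(matrix, iterations=100):
    """Best rank-1 approximation of a matrix via power iteration.

    Illustrative stand-in only: the paper's own low-rank method is not
    described in the abstract and may differ from this.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols  # initial guess for the top right singular vector
    for _ in range(iterations):
        # u = A v, then w = A^T u, i.e. one multiplication by A^T A
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in w]
    # A v equals sigma * u, so (A v) v^T is the rank-1 approximation.
    av = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return [[av[i] * v[j] for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    words = ["bike", "bicycle", "stolen"]  # toy vocabulary (assumption)
    # Rows are sentences: "bike stolen" and "bicycle stolen".
    occurrence = [
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0],
    ]
    for row in rank1_approximation(occurrence):
        print([round(x, 3) for x in row])
```

In this toy case the first sentence never contains "bicycle", yet its approximated row assigns that word a positive weight because both sentences share "stolen". This is the kind of implicit grouping of lexical variants that the abstract describes.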

arXiv.org e-Print Archive

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Text Simplification for Scientific Information Access: CLEF 2021 SimpleText Workshop

Author: Bellot P.
Braslavski P.
Ermakova L.
Kamps J.
Mothe J.
Nurbakova D.
Ovchinnikova I.
San-Juan E.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

International Migration, Integration and Social Cohesion online publications

ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks

Author: Fabbri Alexander R.
Friedman Dan
Kasai Jungo
Li Irene
Radev Dragomir R.
Yasunaga Michihiro
Zhang Rui
Publication date: 17/07/2019

Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on the research community. This paper provides novel solutions to these two challenges. We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate the authors' original highlights (abstract) and the article's actual impacts on the community (citations), to create comprehensive, hybrid summaries. We conduct experiments to demonstrate the efficacy of our corpus in training data-driven models for scientific paper summarization and the advantage of our hybrid summaries over abstracts and traditional citation-based summaries. Our large annotated corpus and hybrid methods provide a new framework for scientific paper summarization research.

Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Overview of SimpleText CLEF 2021 workshop and pilot tasks

Author: Bellot P.
Braslavski P.
Ermakova L.
Kamps J.
Mothe J.
Nurbakova D.
Ovchinnikova I.
San-Juan E.
Publication venue: CEUR-WS
Publication date: 01/01/2021

International Migration, Integration and Social Cohesion online publications

Responsible AI Considerations in Text Summarization Research: A Review of Current Practices

Author: Blodgett Su Lin
Cao Meng
Cheung Jackie Chi Kit
Liu Yu Lu
Olteanu Alexandra
Trischler Adam
Publication date: 18/11/2023

AI and NLP publication venues have increasingly encouraged researchers to reflect on possible ethical considerations, adverse impacts, and other responsible AI issues their work might engender. However, for specific NLP tasks our understanding of how prevalent such issues are, or when and why these issues are likely to arise, remains limited. Focusing on text summarization -- a common NLP task largely overlooked by the responsible AI community -- we examine research and reporting practices in the current literature. We conduct a multi-round qualitative analysis of 333 summarization papers from the ACL Anthology published between 2020 and 2022. We focus on how, which, and when responsible AI issues are covered, which relevant stakeholders are considered, and mismatches between stated and realized research goals. We also discuss current evaluation practices and consider how authors discuss the limitations of both prior work and their own work. Overall, we find that relatively few papers engage with possible stakeholders or contexts of use, which limits their consideration of potential downstream adverse impacts or other responsible AI issues. Based on our findings, we make recommendations on concrete practices and research directions.

arXiv.org e-Print Archive