2,240 research outputs found

Unsupervised Opinion Summarization with Noising and Denoising

Author: Amplayo Reinald Kim
Lapata Mirella
Publication date: 01/01/2020

The supervised training of high-capacity models on large datasets containing hundreds of thousands of document-summary pairs is critical to the recent success of deep learning techniques for abstractive summarization. Unfortunately, in most domains (other than news) such training data is not available and cannot be easily sourced. In this paper we enable the use of supervised learning for the setting where there are only documents available (e.g., product or business reviews) without ground truth summaries. We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof which we treat as pseudo-review input. We introduce several linguistically motivated noise generation functions and a summarization model which learns to denoise the input and generate the original review. At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise. Extensive automatic and human evaluation shows that our model brings substantial improvements over both abstractive and extractive baselines. Comment: ACL 202

arXiv.org e-Print Archive

Crossref
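The synthetic-data idea in the abstract of "Unsupervised Opinion Summarization with Noising and Denoising" (sample a review, treat it as the summary, and corrupt it into pseudo-review inputs) can be illustrated with a minimal sketch. The two noise functions below (token dropout and adjacent-token swaps) are generic stand-ins chosen for illustration and are not the linguistically motivated functions the paper introduces; all function names, parameters, and the example reviews are hypothetical.

```python
import random


def drop_tokens(tokens, rate, rng):
    """Randomly drop tokens; keep at least one so the output is never empty."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= rate]
    return kept or [rng.choice(tokens)]


def swap_adjacent(tokens, rate, rng):
    """Swap neighbouring tokens with probability `rate` (a local shuffle)."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if rng.random() < rate:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


def make_example(reviews, n_noisy=3, rate=0.2, seed=0):
    """Sample one review as the pseudo-summary and build noisy pseudo-reviews.

    Assumption: whitespace tokenization and generic noise, not the paper's setup.
    """
    rng = random.Random(seed)
    target = rng.choice(reviews)
    tokens = target.split()
    noisy = [
        " ".join(swap_adjacent(drop_tokens(tokens, rate, rng), rate, rng))
        for _ in range(n_noisy)
    ]
    return {"inputs": noisy, "target": target}


if __name__ == "__main__":
    reviews = [
        "the food was great and the staff were friendly",
        "slow service but tasty pizza",
    ]
    example = make_example(reviews, n_noisy=3, rate=0.3, seed=1)
    print(example["target"])
    for noisy_review in example["inputs"]:
        print("  noisy:", noisy_review)
```

A denoising model would then be trained to map `inputs` back to `target`; at test time it would receive genuine reviews instead of the noisy copies.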

Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey

Author: Mishra Pruthwik
Mishra Rahul
Roy Tathagato
Urlana Ashok
Publication date: 15/11/2023

Generic text summarization approaches often fail to address the specific intent and needs of individual users. Recently, scholarly attention has turned to the development of summarization methods that are more closely tailored and controlled to align with specific objectives and user needs. While a growing corpus of research is devoted to more controllable summarization, there is no comprehensive survey available that thoroughly explores the diverse controllable aspects or attributes employed in this context, delves into the associated challenges, and investigates the existing solutions. In this survey, we formalize the Controllable Text Summarization (CTS) task, categorize controllable aspects according to their shared characteristics and objectives, and present a thorough examination of existing methods and datasets within each category. Moreover, based on our findings, we uncover limitations and research gaps, while also delving into potential solutions and future directions for CTS. Comment: 19 pages, 1 figure

arXiv.org e-Print Archive

Sentence Embedding Approach using LSTM Auto-encoder for Discussion Threads Summarization

Author: Al-Obeidat Feras
Amin Adnan
Khalid Afsheen
Khan Abdul Wali
Moreira Fernando
Publication venue: ZU Scholars
Publication date: 01/09/2023

Online discussion forums are repositories of valuable information where users interact, articulate their ideas and opinions, and share experiences about numerous topics. These forums are internet-based communities where users can ask for help and find solutions to problems. A new user can become exhausted from reading the large number of irrelevant replies in a discussion. An automated discussion thread summarization (DTS) system is necessary to create a candid view of the entire discussion of a query. Most previous approaches to automated DTS use the continuous bag of words (CBOW) model as a sentence embedding tool, which is poor at capturing the overall meaning of a sentence and is unable to grasp word dependency. To overcome these limitations, we introduce the LSTM Auto-encoder as a sentence embedding technique to improve the performance of DTS. Empirical results on two standard experimental datasets, measured by average precision, recall, and F-measure with respect to ROUGE-1 and ROUGE-2, demonstrate the effectiveness and efficiency of the proposed approach, which outperforms the state-of-the-art CBOW model in sentence embedding tasks and boosts the performance of the automated DTS model.

ZU Scholars (Zayed University)
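The precision, recall, and F-measure with respect to ROUGE-1 and ROUGE-2 reported in the discussion-thread abstract above are n-gram overlap scores. A minimal sketch of that computation follows; it assumes lowercased whitespace tokenization, a single reference summary, and no stemming or stopword handling, so it is a simplification rather than the evaluation setup the authors used. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) of clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print("ROUGE-1:", rouge_n(candidate, reference, n=1))
    print("ROUGE-2:", rouge_n(candidate, reference, n=2))
```

Setting `n=1` scores unigram overlap (ROUGE-1) and `n=2` scores bigram overlap (ROUGE-2); averaging these scores over a dataset gives figures of the kind the abstract refers to.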

Extractive Automatic Text Summarization Based on Lexical-Semantic Keywords

Author: GARCIA HERNANDEZ RENE ARNULFO
HERNANDEZ CASTAÑEDA ANGEL
Ledeneva Yulia
MILLAN HERNANDEZ CHRISTIAN EDUARDO
Publication venue: IEEE Access
Publication date: 11/03/2020

The automatic text summarization (ATS) task consists of automatically synthesizing a document to provide a condensed version of it. Creating a summary requires not only selecting the main topics of the sentences but also identifying the key relationships between these topics. Related works rank text units (mainly sentences) to select those that could form the summary. However, the resulting summaries may not include all the topics covered in the source text because important information may have been discarded. In addition, the semantic structure of documents has barely been explored in this field. Thus, this study proposes a new method for the ATS task that takes advantage of semantic information to improve keyword detection. The proposed method increases coverage by clustering the sentences to identify the main topics in the source document, and increases precision by detecting the keywords in the clusters. The experimental results indicate that the proposed method outperformed previous methods on a standard collection.

Repositorio Institucional de la Universidad Autónoma del Estado de México

Adapting Automatic Summarization to New Sources of Information

Author: Ouyang Jessica Jin
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

English-language news articles are no longer necessarily the best source of information. The Web allows information to spread more quickly and travel farther: first-person accounts of breaking news events pop up on social media, and foreign-language news articles are accessible to, if not immediately understandable by, English-speaking users. This thesis focuses on developing automatic summarization techniques for these new sources of information. We focus on summarizing two specific new sources of information: personal narratives, first-person accounts of exciting or unusual events that are readily found in blog entries and other social media posts, and non-English documents, which must first be translated into English, often introducing translation errors that complicate the summarization process. Personal narratives are a very new area of interest in natural language processing research, and they present two key challenges for summarization. First, unlike many news articles, whose lead sentences serve as summaries of the most important ideas in the articles, personal narratives provide no such shortcuts for determining where important information occurs within them; second, personal narratives are written informally and colloquially, and unlike news articles, they are rarely edited, so they require heavier editing and rewriting during the summarization process. Non-English documents, whether news or narrative, present yet another source of difficulty on top of any challenges inherent to their genre: they must be translated into English, potentially introducing translation errors and disfluencies that must be identified and corrected during summarization. The bulk of this thesis is dedicated to addressing the challenges of summarizing personal narratives found on the Web. We develop a two-stage summarization system for personal narrative that first extracts sentences containing important content and then rewrites those sentences into summary-appropriate forms.
Our content extraction system is inspired by contextualist narrative theory, using changes in writing style throughout a narrative to detect sentences containing important information; it outperforms both graph-based and neural network approaches to sentence extraction for this genre. Our paraphrasing system rewrites the extracted sentences into shorter, standalone summary sentences, learning to mimic the paraphrasing choices of human summarizers more closely than can traditional lexicon- or translation-based paraphrasing approaches. We conclude with a chapter dedicated to summarizing non-English documents written in low-resource languages – documents that would otherwise be unreadable for English-speaking users. We develop a cross-lingual summarization system that performs even heavier editing and rewriting than does our personal narrative paraphrasing system; we create and train on large amounts of synthetic errorful translations of foreign-language documents. Our approach produces fluent English summaries from disfluent translations of non-English documents, and it generalizes across languages.

Columbia University Academic Commons