2,716 research outputs found
Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey
Generic text summarization approaches often fail to address the specific
intent and needs of individual users. Recently, scholarly attention has turned
to the development of summarization methods that are more closely tailored and
controlled to align with specific objectives and user needs. While a growing
corpus of research is devoted towards a more controllable summarization, there
is no comprehensive survey available that thoroughly explores the diverse
controllable aspects or attributes employed in this context, delves into the
associated challenges, and investigates the existing solutions. In this survey,
we formalize the Controllable Text Summarization (CTS) task, categorize
controllable aspects according to their shared characteristics and objectives,
and present a thorough examination of existing methods and datasets within each
category. Moreover, based on our findings, we uncover limitations and research
gaps, while also delving into potential solutions and future directions for
CTS.Comment: 19 pages, 1 figur
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
The development of summarization research has been significantly hampered by
the costly acquisition of reference summaries. This paper proposes an effective
way to automatically collect large scales of news-related multi-document
summaries with reference to social media's reactions. We utilize two types of
social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to
cluster documents into different topic sets. Also, a tweet with a hyper-link
often highlights certain key points of the corresponding document. We
synthesize a linked document cluster to form a reference summary which can
cover most key points. To this aim, we adopt the ROUGE metrics to measure the
coverage ratio, and develop an Integer Linear Programming solution to discover
the sentence set reaching the upper bound of ROUGE. Since we allow summary
sentences to be selected from both documents and high-quality tweets, the
generated reference summaries could be abstractive. Both informativeness and
readability of the collected summaries are verified by manual judgment. In
addition, we train a Support Vector Regression summarizer on DUC generic
multi-document summarization benchmarks. With the collected data as extra
training resource, the performance of the summarizer improves a lot on all the
test sets. We release this dataset for further research.Comment: 7 pages, 1 figure in AAAI 201
Abstractive Multi-Document Summarization via Phrase Selection and Merging
We propose an abstraction-based multi-document summarization framework that
can construct new sentences by exploring more fine-grained syntactic units than
sentences, namely, noun/verb phrases. Different from existing abstraction-based
approaches, our method first constructs a pool of concepts and facts
represented by phrases from the input documents. Then new sentences are
generated by selecting and merging informative phrases to maximize the salience
of phrases and meanwhile satisfy the sentence construction constraints. We
employ integer linear optimization for conducting phrase selection and merging
simultaneously in order to achieve the global optimal solution for a summary.
Experimental results on the benchmark data set TAC 2011 show that our framework
outperforms the state-of-the-art models under automated pyramid evaluation
metric, and achieves reasonably well results on manual linguistic quality
evaluation.Comment: 11 pages, 1 figure, accepted as a full paper at ACL 201
Coherent Multi-Sentence Video Description with Variable Level of Detail
Humans can easily describe what they see in a coherent way and at varying
level of detail. However, existing approaches for automatic video description
are mainly focused on single sentence generation and produce descriptions at a
fixed level of detail. In this paper, we address both of these limitations: for
a variable level of detail we produce coherent multi-sentence descriptions of
complex videos. We follow a two-step approach where we first learn to predict a
semantic representation (SR) from video and then generate natural language
descriptions from the SR. To produce consistent multi-sentence descriptions, we
model across-sentence consistency at the level of the SR by enforcing a
consistent topic. We also contribute both to the visual recognition of objects
proposing a hand-centric approach as well as to the robust generation of
sentences using a word lattice. Human judges rate our multi-sentence
descriptions as more readable, correct, and relevant than related work. To
understand the difference between more detailed and shorter descriptions, we
collect and analyze a video description corpus of three levels of detail.Comment: 10 page
Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of
collections of user generated reviews with self-supervision and control. We
propose a self-supervised setup that considers an individual document as a
target summary for a set of similar documents. This setting makes training
simpler than previous approaches by relying only on standard log-likelihood
loss. We address the problem of hallucinations through the use of control
codes, to steer the generation towards more coherent and relevant
summaries.Finally, we extend the Transformer architecture to allow for multiple
reviews as input. Our benchmarks on two datasets against graph-based and recent
neural abstractive unsupervised models show that our proposed method generates
summaries with a superior quality and relevance.This is confirmed in our human
evaluation which focuses explicitly on the faithfulness of generated summaries
We also provide an ablation study, which shows the importance of the control
setup in controlling hallucinations and achieve high sentiment and topic
alignment of the summaries with the input reviews.Comment: 18 pages including 5 pages appendi
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
Natural Language Generation (NLG) typically involves evaluating the generated
text in various aspects (e.g., consistency and naturalness) to obtain a
comprehensive assessment. However, multi-aspect evaluation remains challenging
as it may require the evaluator to generalize to any given evaluation aspect
even if it's absent during training. In this paper, we introduce X-Eval, a
two-stage instruction tuning framework to evaluate the text in both seen and
unseen aspects customized by end users. X-Eval consists of two learning stages:
the vanilla instruction tuning stage that improves the model's ability to
follow evaluation instructions, and an enhanced instruction tuning stage that
exploits the connections between fine-grained evaluation aspects to better
assess text quality. To support the training of X-Eval, we collect
AspectInstruct, the first instruction tuning dataset tailored for multi-aspect
NLG evaluation spanning 27 diverse evaluation aspects with 65 tasks. To enhance
task diversity, we devise an augmentation strategy that converts human rating
annotations into diverse forms of NLG evaluation tasks, including scoring,
comparison, ranking, and Boolean question answering. Extensive experiments
across three essential categories of NLG tasks: dialogue generation,
summarization, and data-to-text coupled with 21 aspects in meta-evaluation,
demonstrate that our X-Eval enables even a lightweight language model to
achieve a comparable if not higher correlation with human judgments compared to
the state-of-the-art NLG evaluators, such as GPT-4.Comment: 17 pages, 5 figures, 14 table
A survey on opinion summarization technique s for social media
The volume of data on the social media is huge and even keeps increasing. The need for efficient processing of this extensive information resulted in increasing research interest in knowledge engineering tasks such as Opinion Summarization. This survey shows the current opinion summarization challenges for social media, then the necessary pre-summarization steps like preprocessing, features extraction, noise elimination, and handling of synonym features. Next, it covers the various approaches used in opinion summarization like Visualization, Abstractive, Aspect based, Query-focused, Real Time, Update Summarization, and highlight other Opinion Summarization approaches such as Contrastive, Concept-based, Community Detection, Domain Specific, Bilingual, Social Bookmarking, and Social Media Sampling. It covers the different datasets used in opinion summarization and future work suggested in each technique. Finally, it provides different ways for evaluating opinion summarization
- …