Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews
Manually extracting relevant aspects and opinions from large volumes of
user-generated text is a time-consuming process. Summaries, on the other hand,
help readers with limited time budgets to quickly consume the key ideas from
the data. State-of-the-art approaches for multi-document summarization,
however, do not consider user preferences while generating summaries. In this
work, we argue the need for, and propose a solution to, generating personalized
aspect-based opinion summaries from large collections of online tourist
reviews. We let readers decide and control several attributes of the summary,
such as its length and the specific aspects of interest.
Specifically, we take an unsupervised approach to extract coherent aspects from
tourist reviews posted on TripAdvisor. We then propose an Integer Linear
Programming (ILP) based extractive technique to select an informative subset of
opinions around the identified aspects while respecting the user-specified
values for various control parameters. Finally, we evaluate and compare our
summaries using crowdsourcing and ROUGE-based metrics and obtain competitive
results.
Comment: 4 pages, accepted in the Proceedings of the 43rd International ACM
SIGIR Conference on Research and Development in Information Retrieval
(SIGIR), 202
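The selection step described above can be sketched as a toy 0/1 optimization. The paper formulates it as an ILP and uses a solver; the sketch below brute-forces the same kind of objective over a handful of invented opinions (all texts, aspects, scores, and the word budget are made up for illustration):

```python
from itertools import combinations

# Toy opinions: (text, aspect, informativeness score, length in words).
# All values are illustrative, not taken from the paper or TripAdvisor.
opinions = [
    ("Rooms were spotless and quiet", "room", 0.9, 5),
    ("Breakfast buffet had little variety", "food", 0.7, 5),
    ("Staff upgraded us for free", "service", 0.8, 5),
    ("Lobby wifi kept dropping", "service", 0.4, 4),
    ("Sea-view rooms are worth the price", "room", 0.6, 6),
]

def select(opinions, budget, required_aspects):
    """Exhaustively solve the 0/1 selection problem: maximize total
    informativeness subject to a word budget and required aspect
    coverage. (The paper uses an ILP solver; brute force suffices
    for a toy instance.)"""
    best, best_score = (), -1.0
    for r in range(len(opinions) + 1):
        for subset in combinations(opinions, r):
            if sum(o[3] for o in subset) > budget:
                continue  # violates the user's length budget
            if not required_aspects <= {o[1] for o in subset}:
                continue  # misses a user-requested aspect
            score = sum(o[2] for o in subset)
            if score > best_score:
                best, best_score = subset, score
    return [o[0] for o in best], best_score

summary, score = select(opinions, budget=15, required_aspects={"room", "service"})
print(summary, round(score, 2))
```

The two `continue` checks play the role of the ILP's hard constraints (the user-controlled length and aspect parameters); the objective is the summed informativeness of the chosen opinions.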
Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding
Abstractive community detection is an important spoken language understanding
task, whose goal is to group utterances in a conversation according to whether
they can be jointly summarized by a common abstractive sentence. This paper
provides a novel approach to this task. We first introduce a neural contextual
utterance encoder featuring three types of self-attention mechanisms. We then
train it using the siamese and triplet energy-based meta-architectures.
Experiments on the AMI corpus show that our system outperforms multiple
energy-based and non-energy-based state-of-the-art baselines. Code and
data are publicly available.
Comment: Update baseline
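The triplet energy-based meta-architecture mentioned above can be illustrated at the loss level. The sketch below assumes squared Euclidean distance as the energy and uses made-up 2-d vectors in place of the paper's self-attentive utterance encoder:

```python
def energy(u, v):
    """Energy between two utterance embeddings: squared Euclidean
    distance. Low energy = likely the same abstractive community
    (the distance choice here is illustrative)."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet energy loss: push same-community pairs to lower energy
    than different-community pairs by at least `margin`."""
    return max(0.0, margin + energy(anchor, positive) - energy(anchor, negative))

# Toy 2-d "encodings" of three utterances (values are made up).
a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 1.0]
print(triplet_loss(a, p, n))  # positive is already much closer, so loss is 0
```

The siamese variant is the pairwise analogue: it penalizes high energy for same-community pairs and low energy for different-community pairs, rather than comparing a triplet.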
CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models
In this paper, we consider the challenge of summarizing patients' medical
progress notes in a limited data setting. For the Problem List Summarization
(shared task 1A) at the BioNLP Workshop 2023, we demonstrate that Clinical-T5
fine-tuned on 765 medical clinic notes outperforms other extractive,
abstractive and zero-shot baselines, yielding reasonable baseline systems for
medical note summarization. Further, we introduce Hierarchical Ensemble of
Summarization Models (HESM), consisting of token-level ensembles of diverse
fine-tuned Clinical-T5 models, followed by Minimum Bayes Risk (MBR) decoding.
Our HESM approach leads to a considerable boost in summarization performance
and, when evaluated on held-out challenge data, achieves a ROUGE-L of 32.77,
placing our system at the top of the shared task leaderboard.
Comment: BioNLP Workshop @ ACL 202
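The MBR decoding step can be sketched as follows: given candidate summaries from the diverse ensemble members, pick the one with the highest expected utility against the others. A crude unigram-overlap score stands in for the ROUGE-style utility, and the candidate texts are invented:

```python
def overlap(a, b):
    """Crude unigram-F1 stand-in for a ROUGE-style utility
    (illustrative only)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    common = len(wa & wb)
    return 0.0 if common == 0 else 2 * common / (len(wa) + len(wb))

def mbr_decode(candidates):
    """Minimum Bayes Risk: return the candidate with the highest
    average similarity to all other candidates, i.e. the lowest
    expected risk under the ensemble's output distribution."""
    def expected_gain(c):
        others = [o for o in candidates if o is not c]
        return sum(overlap(c, o) for o in others) / len(others)
    return max(candidates, key=expected_gain)

# Candidates as if produced by diverse fine-tuned models (made-up text).
cands = [
    "patient has acute renal failure and anemia",
    "acute renal failure and anemia noted",
    "patient doing well today",
]
print(mbr_decode(cands))
```

The outlier third candidate scores poorly against the other two, so MBR selects a consensus summary; that robustness to individual-model errors is the usual motivation for combining ensembling with MBR.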
The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models
Generative language models produce highly abstractive outputs by design, in
contrast to extractive responses in search engines. Given this characteristic
of LLMs and the resulting implications for content Licensing & Attribution, we
propose the so-called Extractive-Abstractive axis for benchmarking
generative models and highlight the need for developing corresponding metrics,
datasets and annotation guidelines. We limit our discussion to the text
modality.
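One way to operationalize such an axis (an illustrative choice, not a metric the paper proposes) is the fraction of output n-grams copied verbatim from the source:

```python
def ngrams(tokens, n):
    """Set of n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def extractiveness(source, output, n=3):
    """Fraction of the output's n-grams that also appear in the source:
    1.0 = fully extractive (copied), 0.0 = fully abstractive.
    A crude proxy for a position on an extractive-abstractive axis."""
    src, out = source.lower().split(), output.lower().split()
    out_ngrams = ngrams(out, n)
    if not out_ngrams:
        return 0.0
    return len(out_ngrams & ngrams(src, n)) / len(out_ngrams)

source = "the quick brown fox jumps over the lazy dog"
print(extractiveness(source, "the quick brown fox jumps", n=3))            # 1.0
print(extractiveness(source, "a speedy canine leaps over a sleepy pet", n=3))  # 0.0
```

A copied span scores 1.0 and a full paraphrase scores 0.0, placing real model outputs somewhere in between; the choice of n trades off sensitivity to short common phrases against missed near-copies.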
A Survey of Deep Learning Approaches for Natural Language Processing Tasks
In recent years, deep learning has been a go-to method for solving difficult NLP problems. Deep learning models have attained state-of-the-art performance across a wide range of natural language processing applications, including text summarization, sentiment analysis, named entity identification, and language translation, by utilizing enormous neural network designs and massive volumes of training data. In this paper, we take a look at the most important deep learning methods and how they've been used for different natural language processing tasks. We go over the basics of neural network designs including CNNs, RNNs, and transformers, and we also go over some of the more recent developments, such as BERT and GPT-3. Our discussion of each method centers on its guiding principles, benefits, drawbacks, and significant NLP applications. To further illustrate the relative merits of various models, we also provide their comparative performance findings on industry-standard benchmark datasets. We also highlight some of the present difficulties and potential future avenues of study in deep learning applied to natural language processing. The purpose of this survey is to offer academics and practitioners in natural language processing a high-level perspective on how to make good use of deep learning in their respective fields.
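The core operation shared by the transformer-based models the survey covers (including BERT and GPT-3) is scaled dot-product attention. A minimal plain-Python rendering, with made-up 2-d queries, keys, and values:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention, softmax(QK^T / sqrt(d)) V,
    written out for plain Python lists of row vectors."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)  # attention weights over the keys
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs (toy numbers).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

The query aligns with the first key, so the output is weighted toward the first value row; real models apply this over learned projections, multiple heads, and long token sequences.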
Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks
Research on automated text summarization relies heavily on human and
automatic evaluation. While recent work on human evaluation mainly adopted
intrinsic evaluation methods, judging the generic quality of text summaries,
e.g. informativeness and coherence, our work focuses on evaluating the
usefulness of text summaries with extrinsic methods. We carefully design three
different downstream tasks for extrinsic human evaluation of summaries, i.e.,
question answering, text classification and text similarity assessment. We
carry out experiments using system rankings and user behavior data to evaluate
the performance of different summarization models. We find summaries are
particularly useful in tasks that rely on an overall judgment of the text,
while being less effective for question answering tasks. The results show that
summaries generated by fine-tuned models lead to higher consistency in
usefulness across all three tasks, as rankings of fine-tuned summarization
systems are close across downstream tasks according to the proposed extrinsic
metrics. Summaries generated by models in the zero-shot setting, however, are
found to be biased towards the text classification and similarity assessment
tasks, due to their general and less detailed summary style. We further evaluate
the correlation of 14 intrinsic automatic metrics with human criteria and show
that intrinsic automatic metrics perform well in evaluating the usefulness of
summaries in the question-answering task, but are less effective in the other
two tasks. This highlights the limitations of relying solely on intrinsic
automatic metrics in evaluating the performance and usefulness of summaries.
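Checking how well an automatic metric tracks human judgments typically comes down to rank correlation over system scores. A minimal Kendall-tau sketch, with made-up scores for five systems (the specific correlation statistic is an assumption, not stated in the abstract):

```python
def kendall_tau(a, b):
    """Kendall rank correlation between two equal-length score lists:
    +1 when they rank items identically, -1 when reversed.
    (Simple version; assumes no tied pairs.)"""
    assert len(a) == len(b)
    concordant = discordant = 0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Made-up scores for five systems: an automatic metric vs. human usefulness.
metric = [0.42, 0.38, 0.51, 0.30, 0.47]
human  = [3.1, 2.9, 4.0, 2.5, 3.6]
print(kendall_tau(metric, human))  # 1.0: the metric ranks systems like humans do
```

Computing this per downstream task is how one would observe the paper's finding that a metric can track human usefulness judgments well on question answering yet poorly on the other two tasks.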
Questioning Biases in Case Judgment Summaries: Legal Datasets or Large Language Models?
The evolution of legal datasets and the advent of large language models
(LLMs) have significantly transformed the legal field, particularly in the
generation of case judgment summaries. However, a critical concern arises
regarding the potential biases embedded within these summaries. This study
scrutinizes the biases present in case judgment summaries produced by legal
datasets and large language models. The research aims to analyze the impact of
biases on legal decision making. By interrogating the accuracy, fairness, and
implications of biases in these summaries, this study contributes to a better
understanding of the role of technology in legal contexts and the implications
for justice systems worldwide. In this study, we investigate biases with
respect to gender-related keywords, race-related keywords, keywords related to
crime against women, country names, and religious keywords. The study shows
interesting evidence of biases in the outputs generated by the large language
models and pre-trained abstractive summarization models. The reasoning behind
these biases requires further study.
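A simple keyword-rate probe in the spirit of the bias analysis described above: count lexicon hits per 1000 tokens in the source and in the summary, and compare. The lexicons and texts below are invented stand-ins; the paper's actual keyword lists will differ:

```python
from collections import Counter

# Illustrative keyword lexicons (NOT the paper's actual lists).
LEXICONS = {
    "gender": {"he", "she", "him", "her", "his", "hers"},
    "religion": {"church", "mosque", "temple"},
}

def keyword_rates(text, lexicons):
    """Per-category rate of lexicon hits per 1000 tokens. Comparing
    rates between source documents and their summaries flags
    categories a summarizer amplifies or suppresses."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {cat: 1000 * sum(counts[w] for w in lex) / total
            for cat, lex in lexicons.items()}

source = "the court heard that he and she disputed the property near the temple"
summary = "he was held liable for the property"
print(keyword_rates(source, LEXICONS))
print(keyword_rates(summary, LEXICONS))
```

Here the toy summary drops the religious keyword and one of the two gendered pronouns entirely, which is exactly the kind of skew such a probe is meant to surface at corpus scale.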
Fair Abstractive Summarization of Diverse Perspectives
People from different social and demographic groups express diverse
perspectives and conflicting opinions on a broad set of topics such as product
reviews, healthcare, law, and politics. A fair summary should provide a
comprehensive coverage of diverse perspectives without underrepresenting
certain groups. However, current work in summarization metrics and Large
Language Models (LLMs) evaluation has not explored fair abstractive
summarization. In this paper, we systematically investigate fair abstractive
summarization for user-generated data. We first formally define fairness in
abstractive summarization as not underrepresenting perspectives of any groups
of people and propose four reference-free automatic metrics measuring the
differences between target and source perspectives. We evaluate five LLMs,
including three GPT models, Alpaca, and Claude, on six datasets collected from
social media, online reviews, and recorded transcripts. Experiments show that
both the model-generated and the human-written reference summaries suffer from
low fairness. We conduct a comprehensive analysis of the common factors
influencing fairness and propose three simple but effective methods to
alleviate unfair summarization. Our dataset and code are available at
https://github.com/psunlpgroup/FairSumm.
Comment: 19 pages, 10 figures
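One illustrative reference-free fairness measure in the spirit described above (not one of the paper's four metrics): compare the group distribution in the source opinions with the group distribution in the summary, using total variation distance. Group labels and opinion counts below are invented:

```python
def group_distribution(texts_by_group):
    """Share of opinions contributed by each group."""
    total = sum(len(v) for v in texts_by_group.values())
    return {g: len(v) / total for g, v in texts_by_group.items()}

def representation_gap(source_dist, summary_dist):
    """Total variation distance between group distributions in the
    source and in the summary: 0 means perfectly proportional
    coverage, larger values mean some group is underrepresented."""
    groups = set(source_dist) | set(summary_dist)
    return 0.5 * sum(abs(source_dist.get(g, 0) - summary_dist.get(g, 0))
                     for g in groups)

# Toy data: group_a wrote 75% of the source opinions but gets only
# 50% of the summary content.
src = {"group_a": ["o1", "o2", "o3"], "group_b": ["o4"]}
summ = {"group_a": ["s1"], "group_b": ["s2"]}
gap = representation_gap(group_distribution(src), group_distribution(summ))
print(gap)  # 0.25
```

Being reference-free, such a measure needs only the source and the candidate summary, matching the setting in the abstract where no gold fair summary exists.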
Towards Argument-Aware Abstractive Summarization of Long Legal Opinions with Summary Reranking
We propose a simple approach for the abstractive summarization of long legal
opinions that considers the argument structure of the document. Legal opinions
often contain complex and nuanced argumentation, making it challenging to
generate a concise summary that accurately captures the main points of the
legal opinion. Our approach involves using argument role information to
generate multiple candidate summaries, then reranking these candidates based on
alignment with the document's argument structure. We demonstrate the
effectiveness of our approach on a dataset of long legal opinions and show that
it outperforms several strong baselines.
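The reranking step can be sketched as follows: score each candidate summary by how much of the document's argument-role content it covers, and keep the best. The word-overlap coverage score and the example spans are illustrative stand-ins for the paper's argument-structure alignment:

```python
def rerank(candidates, argument_spans):
    """Pick the candidate summary covering the most content words from
    the document's argument-role spans (a simple stand-in for the
    paper's argument-structure alignment)."""
    arg_words = {w for span in argument_spans for w in span.lower().split()}
    def coverage(summary):
        return len(set(summary.lower().split()) & arg_words) / len(arg_words)
    return max(candidates, key=coverage)

# Invented argument-role spans from a hypothetical legal opinion.
argument_spans = [
    "the statute requires actual notice",   # e.g. a legal-rule span
    "no notice was given to the tenant",    # e.g. an application span
]
# Invented candidate summaries, as produced in the generation step.
candidates = [
    "the court discussed several procedural issues",
    "the statute requires actual notice and none was given to the tenant",
]
print(rerank(candidates, argument_spans))
```

The candidate that restates the rule and its application wins over the one that ignores the argument structure, which is the behavior the reranking stage is designed to reward.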