Low- and high-resource opinion summarization
Customer reviews play a vital role in the online purchasing decisions we make. The reviews
express user opinions that are useful for setting realistic expectations and uncovering important
details about products. However, some products receive hundreds or even thousands of
reviews, making them time-consuming to read. Moreover, many reviews contain uninformative
content, such as irrelevant personal experiences. Automatic summarization offers an
alternative – short text summaries capturing the essential information expressed in reviews.
Automatically produced summaries can reflect overall or particular opinions and be tailored to
user preferences. Besides being displayed on major e-commerce platforms, summaries
can also be vocalized by home assistants. This approach can improve user satisfaction
by helping users make faster and better purchasing decisions.
Modern summarization approaches are based on neural networks, often requiring thousands of
annotated samples for training. However, human-written summaries for products are expensive
to produce because annotators need to read many reviews. This has led to a scarcity
of annotated data, with only a few datasets available. Data scarcity is the central
theme of this thesis, and we propose a number of approaches to alleviate the problem.
The thesis consists of two parts, covering the low- and high-resource data settings.
In the first part, we propose self-supervised learning methods applied to customer reviews
and few-shot methods for learning from small annotated datasets. Customer reviews without
summaries are available in large quantities, contain a breadth of in-domain specifics, and
provide a powerful training signal. We show that reviews can be used for learning summarizers
via a self-supervised objective. Further, we address two main challenges associated with
learning from small annotated datasets. First, large models rapidly overfit on small
datasets, leading to poor generalization. Second, it is not possible to learn a wide range of in-domain
specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to
subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We
address the first challenge by explicitly modeling summary properties (e.g., content coverage
and sentiment alignment). Furthermore, we leverage small modules – adapters – that are
more robust to overfitting. As we show, despite their small size, these modules can
store in-domain knowledge that reduces semantic mistakes. Lastly, we propose a simple method
for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and
‘resolution.’ This task is harder to learn, and we present a few-shot method for training a
query-based summarizer on small annotated datasets.
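To make the self-supervised objective concrete, here is a minimal sketch (in Python, with hypothetical data structures) of the leave-one-out idea: each held-out review serves as a pseudo-summary of the remaining reviews of the same product, yielding training pairs without any annotation.

```python
import random

def leave_one_out_pairs(product_reviews, k=8, seed=0):
    """Build (source reviews, pseudo-summary) pairs from unannotated reviews.

    For each product, one review is held out and treated as the target that
    the summarizer must generate from k of the remaining reviews.
    """
    rng = random.Random(seed)
    pairs = []
    for reviews in product_reviews:           # reviews: list[str] per product
        if len(reviews) < k + 1:
            continue
        target = rng.choice(reviews)          # held-out review = pseudo-summary
        others = [r for r in reviews if r is not target]
        sources = rng.sample(others, k)
        pairs.append((sources, target))
    return pairs

# Usage: the pairs feed a standard seq2seq objective, e.g. training a
# Transformer to generate `target` conditioned on the concatenated `sources`.
```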
In the second part, we focus on the high-resource setting and present a large dataset with
summaries collected from various online resources. The dataset has more than 33,000
human-written summaries, each linked to up to thousands of reviews. This, however, makes it
challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To
address this problem, we propose selecting small subsets of informative reviews. Only these
subsets are encoded by the deep encoder and subsequently summarized. We show that the
selector and summarizer can be trained end-to-end via amortized inference and policy gradient
methods.
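A minimal sketch of the policy-gradient component, assuming a selector that assigns a score to every review and a reward such as ROUGE computed from the summary produced for the selected subset; all names are illustrative, and the actual training also involves amortized inference.

```python
import torch
import torch.nn.functional as F

def reinforce_step(scores, reward_fn, k=8, baseline=0.0):
    """One REINFORCE update for a review selector.

    scores:    (num_reviews,) unnormalized selector logits, one per review.
    reward_fn: maps a list of selected indices to a scalar reward, e.g.
               ROUGE of the summary generated from that subset.
    """
    probs = F.softmax(scores, dim=-1)
    dist = torch.distributions.Categorical(probs)
    idx = dist.sample((k,))                # sample k reviews (with replacement here)
    reward = reward_fn(idx.tolist())       # run the deep summarizer on the subset
    log_prob = dist.log_prob(idx).sum()
    loss = -(reward - baseline) * log_prob # REINFORCE: maximize expected reward
    return loss
```

Subtracting a baseline (e.g. a running average of rewards) reduces gradient variance; only the selected subset is ever passed through the expensive encoder.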
Few-Shot Learning for Opinion Summarization
Opinion summarization is the automatic creation of text reflecting subjective
information expressed in multiple documents, such as user reviews of a product.
The task is practically important and has attracted a lot of attention.
However, due to the high cost of summary production, datasets large enough for
training supervised models are lacking. Instead, the task has been
traditionally approached with extractive methods that learn to select text
fragments in an unsupervised or weakly-supervised way. Recently, it has been
shown that abstractive summaries, potentially more fluent and better at
reflecting conflicting information, can also be produced in an unsupervised
fashion. However, these models, not being exposed to actual summaries, fail to
capture their essential properties. In this work, we show that even a handful
of summaries is sufficient to bootstrap the generation of summary text with all
expected properties, such as writing style, informativeness, fluency, and
sentiment preservation. We start by training a conditional Transformer language
model to generate a new product review given other available reviews of the
product. The model is also conditioned on review properties that are directly
related to summaries; the properties are derived from reviews with no manual
effort. In the second stage, we fine-tune a plug-in module that learns to
predict property values on a handful of summaries. This lets us switch the
generator to the summarization mode. We show on Amazon and Yelp datasets that
our approach substantially outperforms previous extractive and abstractive
methods in automatic and human evaluation.
Comment: EMNLP 2020
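A minimal sketch of the plug-in idea, assuming the generator is kept frozen and conditioned on a small vector of property values; the module and dimensions below are illustrative, not the paper's exact architecture.

```python
import torch.nn as nn

class PropertyPlugin(nn.Module):
    """Tiny module fine-tuned on a handful of gold summaries.

    Maps an encoding of the input reviews to the property vector (e.g.
    content coverage, length, sentiment agreement) that the frozen
    generator conditions on; at test time its predictions switch the
    generator from review-generation mode to summarization mode.
    """
    def __init__(self, enc_dim=512, num_props=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enc_dim, 128), nn.ReLU(), nn.Linear(128, num_props)
        )

    def forward(self, review_encoding):    # (batch, enc_dim)
        return self.net(review_encoding)   # (batch, num_props)

# Fine-tuning: with the generator frozen, minimize the distance between the
# predicted properties and those derived from the few available summaries.
```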
Small Language Models Improve Giants by Rewriting Their Outputs
Large language models (LLMs) have demonstrated impressive few-shot learning
capabilities, but they often underperform compared to fine-tuned models on
challenging tasks. Furthermore, their large size and API-only access make
task-specific fine-tuning impractical. Moreover, LLMs are
sensitive to different aspects of prompts (e.g., the selection and order of
demonstrations) and can thus require time-consuming prompt engineering. In this
light, we propose a method to correct LLM outputs without relying on their
weights. First, we generate a pool of candidates by few-shot prompting an LLM.
Second, we refine the LLM-generated outputs using a smaller model, the
LM-corrector (LMCor), which is trained to rank, combine and rewrite the
candidates to produce the final target output. Our experiments demonstrate that
even a small LMCor model (250M) substantially improves the few-shot performance
of LLMs (62B) across diverse tasks. Moreover, we illustrate that the LMCor
exhibits robustness against different prompts, thereby minimizing the need for
extensive prompt engineering. Finally, we showcase that the LMCor can be
seamlessly integrated with different LLMs at inference time, serving as a
plug-and-play module to improve their performance.
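A rough sketch of the generate-then-correct pipeline, using a T5-style corrector via Hugging Face transformers; the prompt format, the `[CAND]` separator, the checkpoint names, and the `llm_few_shot` helper are assumptions, not the paper's exact setup.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def correct(source, candidates, corrector, tokenizer, max_len=128):
    """Rewrite few-shot LLM candidates into a final output.

    The corrector sees the source and all candidates in one input sequence
    and is trained to rank, combine, and rewrite them into the target.
    """
    text = source + "".join(f" [CAND] {c}" for c in candidates)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    out = corrector.generate(**inputs, max_new_tokens=max_len)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Example wiring (hypothetical checkpoint path):
# tokenizer = AutoTokenizer.from_pretrained("t5-base")
# corrector = AutoModelForSeq2SeqLM.from_pretrained("path/to/trained-lmcor")
# candidates = llm_few_shot(source, n=5)  # pool from prompting the large LLM
# final = correct(source, candidates, corrector, tokenizer)
```

Because the corrector only consumes LLM outputs as text, it can be paired with any LLM at inference time without touching the LLM's weights.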
Efficient Few-Shot Fine-Tuning for Opinion Summarization
Abstractive summarization models are typically pre-trained on large amounts
of generic texts, then fine-tuned on tens or hundreds of thousands of annotated
samples. However, in opinion summarization, large annotated datasets of reviews
paired with reference summaries are not available and would be expensive to
create. This calls for fine-tuning methods robust to overfitting on small
datasets. In addition, generically pre-trained models are often not accustomed
to the specifics of customer reviews and, after fine-tuning, yield summaries
with disfluencies and semantic mistakes. To address these problems, we utilize
an efficient few-shot method based on adapters which, as we show, can easily
store in-domain knowledge. Instead of fine-tuning the entire model, we add
adapters and pre-train them in a task-specific way on a large corpus of
unannotated customer reviews, using held-out reviews as pseudo summaries. Then, we
fine-tune the adapters on the small available human-annotated dataset. We show
that this self-supervised adapter pre-training improves summary quality over
standard fine-tuning by 2.0 and 1.3 ROUGE-L points on the Amazon and Yelp
datasets, respectively. Finally, for summary personalization, we condition on
aspect keyword queries, automatically created from generic datasets. In the
same vein, we pre-train the adapters in a query-based manner on customer
reviews and then fine-tune them on annotated datasets. This results in
better-organized summary content reflected in improved coherence and fewer
redundancies.
Comment: NAACL Findings 2022
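To make the adapter recipe concrete, below is a minimal bottleneck-adapter sketch in PyTorch; the sizes and the `model.encoder.layers` attribute are illustrative assumptions rather than the paper's exact configuration.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual keeps base behavior

def add_adapters(model, hidden=768):
    """Freeze the pre-trained model; only the inserted adapters are trained."""
    for p in model.parameters():
        p.requires_grad = False
    # One adapter per Transformer layer (assumed attribute), applied to each
    # layer's output in the forward pass.
    return nn.ModuleList(Adapter(hidden) for _ in model.encoder.layers)
```

The same adapters are first pre-trained on the pseudo-summary pairs and then fine-tuned on the gold data, so the few annotated summaries never update the full model.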
Beyond Opinion Mining: Summarizing Opinions of Customer Reviews
Customer reviews are vital for making purchasing decisions in the Information
Age. Such reviews can be automatically summarized to provide the user with an
overview of opinions. In this tutorial, we present various aspects of opinion
summarization that are useful for researchers and practitioners. First, we will
introduce the task and major challenges. Then, we will present existing opinion
summarization solutions, both pre-neural and neural. We will discuss how
summarizers can be trained in the unsupervised, few-shot, and supervised
regimes. Each regime has roots in different machine learning methods, such as
auto-encoding, controllable text generation, and variational inference.
Finally, we will discuss resources and evaluation methods and conclude with
future directions. This three-hour tutorial will provide a comprehensive
overview of major advances in opinion summarization. Listeners will be
well-equipped with knowledge that is useful for both research and practical
applications.
Comment: SIGIR Tutorial 2022