Fine-tune BERT for Extractive Summarization
BERT, a pre-trained Transformer model, has achieved ground-breaking
performance on multiple NLP tasks. In this paper, we describe BERTSUM, a simple
variant of BERT, for extractive summarization. Our system is the state of the
art on the CNN/Dailymail dataset, outperforming the previous best-performing
system by 1.65 on ROUGE-L. The code to reproduce our results is available at
https://github.com/nlpyang/BertSum
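As a rough illustration of the fine-tuning setup, here is a minimal sketch that scores sentences for extraction with a linear head on top of per-sentence [CLS] representations from a pre-trained BERT. It assumes the Hugging Face transformers library and a simplified one-sentence-per-forward-pass encoding; it is not the exact BERTSUM architecture.

```python
# Minimal sketch: score sentences for extractive summarization with a
# pre-trained BERT encoder and a small classification head.
# Assumes the Hugging Face `transformers` library; NOT the exact BERTSUM model.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class SentenceScorer(nn.Module):
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        # [CLS] representation of each sentence -> scalar "include" score
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls_vectors = outputs.last_hidden_state[:, 0]   # (num_sents, hidden)
        return torch.sigmoid(self.classifier(cls_vectors)).squeeze(-1)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["The cat sat on the mat.", "It was a sunny day."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
scores = SentenceScorer()(batch["input_ids"], batch["attention_mask"])
print(scores)   # one relevance score per sentence
```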
A novel repetition normalized adversarial reward for headline generation
While reinforcement learning can effectively improve language generation
models, it often suffers from generating incoherent and repetitive phrases
\cite{paulus2017deep}. In this paper, we propose a novel repetition normalized
adversarial reward to mitigate these problems. Our repetition-penalized reward
greatly reduces the repetition rate, while adversarial training mitigates the
generation of incoherent phrases. Our model significantly outperforms the
baseline model on ROUGE-1\,(+3.24) and ROUGE-L\,(+2.25), with a decreased
repetition rate (-4.98\%).
Comment: Accepted by ICASSP 2019
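To make the repetition penalty concrete, the snippet below computes a simple bigram repetition rate and uses it to scale a base reward. The exact normalization in the paper may differ; this is an illustrative assumption only.

```python
# Illustrative sketch of a repetition-normalized reward: penalize generated
# headlines in proportion to their fraction of repeated n-grams.
from collections import Counter

def repetition_rate(tokens, n=2):
    """Fraction of n-grams that duplicate an earlier n-gram."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)

def normalized_reward(base_reward, tokens, n=2):
    """Scale a base reward (e.g., a ROUGE or adversarial score) by repetition."""
    return base_reward * (1.0 - repetition_rate(tokens, n))

tokens = "police arrest man police arrest man in city".split()
print(repetition_rate(tokens))         # > 0 because 'police arrest' repeats
print(normalized_reward(0.8, tokens))  # penalized reward
```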
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
Neural extractive summarization models usually employ a hierarchical encoder
for document encoding and they are trained using sentence-level labels, which
are created heuristically using rule-based methods. Training the hierarchical
encoder with these \emph{inaccurate} labels is challenging. Inspired by the
recent work on pre-training transformer sentence encoders
\cite{devlin:2018:arxiv}, we propose {\sc Hibert} (as shorthand for {\bf
HI}erarchical {\bf B}idirectional {\bf E}ncoder {\bf R}epresentations from {\bf
T}ransformers) for document encoding and a method to pre-train it using
unlabeled data. We apply the pre-trained {\sc Hibert} to our summarization
model and it outperforms its randomly initialized counterpart by 1.25 ROUGE on
the CNN/Dailymail dataset and by 2.0 ROUGE on a version of the New York Times
dataset. We also achieve state-of-the-art performance on these two datasets.
Comment: To appear in ACL 2019
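The hierarchical encoding idea can be sketched as a token-level Transformer whose per-sentence vectors feed a document-level Transformer. The sketch below uses placeholder hyperparameters and omits the masked-sentence pre-training objective; it is an illustration, not the HIBERT implementation.

```python
# Minimal sketch of a hierarchical document encoder in the spirit of HIBERT:
# a sentence-level Transformer encodes tokens, a document-level Transformer
# encodes the resulting sentence vectors.
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers)

    def forward(self, doc_token_ids):
        # doc_token_ids: (num_sents, max_tokens) for one document
        token_states = self.sent_encoder(self.embed(doc_token_ids))
        sent_vectors = token_states[:, 0]        # first token as sentence vector
        doc_states = self.doc_encoder(sent_vectors.unsqueeze(0))
        return doc_states.squeeze(0)             # (num_sents, d_model)

doc = torch.randint(0, 30000, (5, 20))           # 5 sentences, 20 tokens each
print(HierarchicalEncoder()(doc).shape)          # torch.Size([5, 256])
```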
Abstractive Summarization of Reddit Posts with Multi-level Memory Networks
We address the problem of abstractive summarization in two directions:
proposing a novel dataset and a new model. First, we collect the Reddit TIFU
dataset, consisting of 120K posts from the online discussion forum Reddit. We
use these informal, crowd-generated posts as the text source, in contrast with
existing datasets that mostly use formal documents, such as news articles, as
the source. Thus, our dataset is less prone to the biases of existing datasets,
in which key sentences are usually located at the beginning of the text and
favorable summary candidates already appear in the text in similar forms.
Second, we propose a
novel abstractive summarization model named multi-level memory networks (MMN),
equipped with a multi-level memory that stores information about the text at
different levels of abstraction. With quantitative evaluation and user studies
via Amazon Mechanical Turk, we show the Reddit TIFU dataset is highly
abstractive and the MMN outperforms state-of-the-art summarization models.
Comment: Published in NAACL-HLT 2019 (Oral)
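As a conceptual sketch only, the snippet below attends over memories built at two levels of abstraction (word and sentence). The real MMN architecture differs; this merely illustrates the multi-level-memory idea.

```python
# Attend over word-level and sentence-level memories with a single query.
# Conceptual illustration only, not the MMN architecture from the paper.
import torch
import torch.nn.functional as F

def multi_level_attend(query, word_memory, sent_memory):
    """query: (d,), word_memory: (num_words, d), sent_memory: (num_sents, d)."""
    memory = torch.cat([word_memory, sent_memory], dim=0)
    weights = F.softmax(memory @ query, dim=0)    # attention over both levels
    return weights @ memory                       # blended context vector

d = 64
words = torch.randn(30, d)                        # word-level states
sents = words.view(5, 6, d).mean(dim=1)           # crude sentence-level states
query = torch.randn(d)                            # e.g., a decoder state
print(multi_level_attend(query, words, sents).shape)  # torch.Size([64])
```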
Sample Efficient Text Summarization Using a Single Pre-Trained Transformer
Language model (LM) pre-training has resulted in impressive performance and
sample efficiency on a variety of language understanding tasks. However, it
remains unclear how to best use pre-trained LMs for generation tasks such as
abstractive summarization, particularly to enhance sample efficiency. In these
sequence-to-sequence settings, prior work has experimented with loading
pre-trained weights into the encoder and/or decoder networks, but used
non-pre-trained encoder-decoder attention weights. We instead use a pre-trained
decoder-only network, where the same Transformer LM both encodes the source and
generates the summary. This ensures that all parameters in the network,
including those governing attention over source states, have been pre-trained
before the fine-tuning step. Experiments on the CNN/Daily Mail dataset show
that our pre-trained Transformer LM substantially improves over pre-trained
Transformer encoder-decoder networks in limited-data settings. For instance, it
achieves 13.1 ROUGE-2 using only 1% of the training data (~3000 examples),
while pre-trained encoder-decoder models score 2.3 ROUGE-2.
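The decoder-only setup can be sketched as a single pre-trained LM reading "article, delimiter, summary" as one sequence, so every parameter (including attention over source tokens) is pre-trained. Below, GPT-2 and a "TL;DR:" delimiter are stand-ins for the paper's own pre-trained LM and formatting, assuming the transformers library.

```python
# Sketch of a decoder-only summarizer: one Transformer LM both encodes the
# source and generates the summary. GPT-2 and "TL;DR:" are placeholders.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "The city council approved the new transit plan on Tuesday..."
prompt = article + "\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")

# At fine-tuning time the LM loss would be applied to the summary tokens;
# here we simply decode a continuation from the (un-fine-tuned) LM.
output_ids = model.generate(
    **inputs, max_new_tokens=40, do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:]))
```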
From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information
Text summarization is the research area that aims to create a short, condensed
version of an original document, conveying its main ideas in a few words. The
topic has attracted the attention of a large community of researchers and is
now counted among the most promising research areas. In general, text
summarization algorithms take a plain text document as input and output a
summary. However, in real-world applications, most data is not in plain text
format. Instead, there is a great deal of manifold information to be
summarized, such as a web page summarized with respect to a search-engine
query, extremely long documents (e.g., academic papers), dialogue histories,
and so on. In this paper, we survey these new summarization tasks and
approaches in real-world applications.
Comment: Accepted by IJCAI 2020 Survey Track
Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward
Sequence-to-sequence models for abstractive summarization have been studied
extensively, yet the generated summaries commonly suffer from fabricated
content, and are often found to be near-extractive. We argue that, to address
these issues, the summarizer should acquire semantic interpretation over input,
e.g., via structured representation, to allow the generation of more
informative summaries. In this paper, we present ASGARD, a novel framework for
Abstractive Summarization with Graph-Augmentation and semantic-driven RewarD.
We propose the use of dual encoders---a sequential document encoder and a
graph-structured encoder---to maintain the global context and local
characteristics of entities, complementing each other. We further design a
reward based on a multiple choice cloze test to drive the model to better
capture entity interactions. Results show that our models produce significantly
higher ROUGE scores than a variant without the knowledge graph as input on both New
York Times and CNN/Daily Mail datasets. We also obtain better or comparable
performance compared to systems that are fine-tuned from large pretrained
language models. Human judges further rate our model outputs as more
informative and containing fewer unfaithful errors.
Comment: Accepted as a long paper to ACL 2020
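A toy illustration of a cloze-style reward: entities blanked from reference sentences must be recoverable from the generated summary. This is a heavy simplification of ASGARD's multiple-choice cloze scorer, included only to make the idea concrete; the entity list and string matching are assumptions.

```python
# Toy stand-in for a semantic-driven cloze reward: blank entities in reference
# sentences and reward the generated summary for containing the blanked entity.
def make_cloze_questions(reference_sentences, entities):
    questions = []
    for sent in reference_sentences:
        for ent in entities:
            if ent in sent:
                questions.append((sent.replace(ent, "_____"), ent))
    return questions

def cloze_reward(generated_summary, questions):
    if not questions:
        return 0.0
    answered = sum(1 for _, answer in questions if answer in generated_summary)
    return answered / len(questions)

reference = ["Acme Corp acquired Beta Labs in 2019.",
             "The deal was led by Jane Doe."]
entities = ["Acme Corp", "Beta Labs", "Jane Doe"]
questions = make_cloze_questions(reference, entities)
summary = "Acme Corp bought Beta Labs, a deal arranged by Jane Doe."
print(cloze_reward(summary, questions))   # 1.0: all blanked entities recovered
```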
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
Inspired by how humans summarize long documents, we propose an accurate and
fast summarization model that first selects salient sentences and then rewrites
them abstractively (i.e., compresses and paraphrases) to generate a concise
overall summary. We use a novel sentence-level policy gradient method to bridge
the non-differentiable computation between these two neural networks in a
hierarchical way, while maintaining language fluency. Empirically, we achieve
the new state-of-the-art on all metrics (including human evaluation) on the
CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores.
Moreover, by first operating at the sentence-level and then the word-level, we
enable parallel decoding of our neural generative model that results in
substantially faster (10-20x) inference speed as well as 4x faster training
convergence than previous long-paragraph encoder-decoder models. We also
demonstrate the generalization of our model on the test-only DUC-2002 dataset,
where we achieve higher scores than a state-of-the-art model.
Comment: ACL 2018 (17 pages)
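A minimal sketch of the sentence-level policy gradient bridge, assuming a REINFORCE update in which the extractor samples one sentence and the reward is the score of its rewritten version. The rouge-like reward and linear extractor below are placeholders, not the paper's components.

```python
# REINFORCE over sentence selection: sample a sentence index from the
# extractor, obtain a reward (e.g., ROUGE of the abstractor's rewrite),
# and update the extractor with the policy gradient.
import torch
import torch.nn.functional as F

def reinforce_step(sentence_logits, rewrite_and_score, optimizer, baseline=0.0):
    """sentence_logits: (num_sents,) extractor scores for one document."""
    probs = F.softmax(sentence_logits, dim=-1)
    dist = torch.distributions.Categorical(probs)
    idx = dist.sample()                     # pick one sentence to rewrite
    reward = rewrite_and_score(idx.item())  # e.g., ROUGE of abstractor output
    loss = -(reward - baseline) * dist.log_prob(idx)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return idx.item(), reward

# Toy usage with a linear extractor over precomputed sentence features.
features = torch.randn(6, 32)                      # 6 sentences in a document
extractor = torch.nn.Linear(32, 1)
optimizer = torch.optim.Adam(extractor.parameters(), lr=1e-3)
fake_rouge = lambda idx: 1.0 if idx == 2 else 0.1  # pretend sentence 2 is best
logits = extractor(features).squeeze(-1)
print(reinforce_step(logits, fake_rouge, optimizer))
```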
Pragmatically Informative Text Generation
We improve the informativeness of models for conditional text generation
using techniques from computational pragmatics. These techniques formulate
language production as a game between speakers and listeners, in which a
speaker should generate output text that a listener can use to correctly
identify the original input that the text describes. While such approaches are
widely used in cognitive science and grounded language learning, they have
received less attention for more standard language generation tasks. We
consider two pragmatic modeling methods for text generation: one where
pragmatics is imposed by information preservation, and another where pragmatics
is imposed by explicit modeling of distractors. We find that these methods
improve the performance of strong existing systems for abstractive
summarization and generation from structured meaning representations.
Comment: 8 pages. Accepted as a conference paper at NAACL 2019 (short paper)
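A small sketch of distractor-based pragmatic reranking: a base speaker proposes candidates, and each is rescored by how well a listener can pick the true input over distractor inputs given that candidate. The speaker and listener log-probability functions below are placeholders standing in for trained models.

```python
# Rerank candidate outputs by base speaker score plus a pragmatic listener
# term: how distinctly the candidate identifies the true input vs distractors.
import math

def pragmatic_rerank(candidates, true_input, distractors,
                     speaker_logprob, listener_logprob, rationality=1.0):
    inputs = [true_input] + distractors
    best, best_score = None, -math.inf
    for cand in candidates:
        # Listener: log P(true input | candidate), normalized over all inputs.
        scores = [listener_logprob(inp, cand) for inp in inputs]
        norm = math.log(sum(math.exp(s) for s in scores))
        listener_term = scores[0] - norm
        total = speaker_logprob(cand, true_input) + rationality * listener_term
        if total > best_score:
            best, best_score = cand, total
    return best

# Toy scores: the listener finds candidate "B" much more input-specific.
speaker = lambda cand, inp: {"A": -1.0, "B": -1.2}[cand]
listener = lambda inp, cand: 0.0 if (cand == "B" and inp == "src") else -2.0
print(pragmatic_rerank(["A", "B"], "src", ["d1", "d2"], speaker, listener))
```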
Structured Summarization of Academic Publications
We propose SUSIE, a novel summarization method that can work with
state-of-the-art summarization models in order to produce structured scientific
summaries for academic articles. We also create PMC-SA, a new dataset of
academic publications, suitable for the task of structured summarization with
neural networks. We apply SUSIE combined with three different summarization
models on the new PMC-SA dataset and we show that the proposed method improves
the performance of all models by as much as 4 ROUGE points.
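The sketch below shows one plausible structured-summarization pipeline: split an article into sections and run a summarizer on each, yielding a per-section summary. The per-section setup and the placeholder summarizer are assumptions for illustration, not necessarily SUSIE's exact method.

```python
# Rough sketch of structured summarization: summarize each section of an
# article separately. `summarize` is a placeholder; swap in a neural model.
def summarize(text, max_sentences=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def structured_summary(article_sections):
    """article_sections: dict mapping section title -> section text."""
    return {title: summarize(body) for title, body in article_sections.items()}

article = {
    "Introduction": "We study protein folding. Prior methods are slow.",
    "Methods": "We train a transformer on simulated trajectories. Data is large.",
    "Results": "Our model is faster. Accuracy matches baselines.",
}
for title, summary in structured_summary(article).items():
    print(f"{title}: {summary}")
```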