Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

Authors: Mohit Iyyer, Tu Vu
Publication date: 09/06/2019

While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method proposed by Zhang et al. (2017) and discover that it cannot reliably tell whether a given sentence occurs in the input paragraph or not. We formulate a sentence content task to probe for this basic linguistic property and find that even a much simpler bag-of-words method has no trouble solving it. This result motivates us to replace the reconstruction-based objective of Zhang et al. (2017) with our sentence content probe objective in a semi-supervised setting. Despite its simplicity, our objective improves over paragraph reconstruction in terms of (1) downstream classification accuracies on benchmark datasets, (2) faster training, and (3) better generalization ability.

Comment: Accepted as a conference paper at ACL 201

arXiv.org e-Print Archive

A Generative Approach to Titling and Clustering Wikipedia Sections

Authors: Simon Baumgartner, Anjalie Field, Abe Ittycheriah, Sascha Rothe, Cong Yu
Publication date: 22/05/2020

We evaluate the performance of transformer encoders with various decoders for information organization through a new task: generation of section headings for Wikipedia articles. Our analysis shows that decoders containing attention mechanisms over the encoder output achieve high-scoring results by generating extractive text. In contrast, a decoder without attention better facilitates semantic encoding and can be used to generate section embeddings. We additionally introduce a new loss function, which further encourages the decoder to generate high-quality embeddings.

Comment: Accepted to WNGT Workshop at ACL 202

arXiv.org e-Print Archive