Split and Rephrase
We propose a new sentence simplification task (Split-and-Rephrase) where the
aim is to split a complex sentence into a meaning-preserving sequence of
shorter sentences. Like sentence simplification, splitting-and-rephrasing has
the potential to benefit both natural language processing and societal
applications. Because shorter sentences are generally easier for NLP systems
to process, it could be used as a preprocessing step that facilitates and
improves the performance of parsers, semantic role labellers and machine
translation systems. It should also be useful for people with reading
disabilities, since it converts longer sentences into shorter
ones. This paper makes two contributions towards this new task. First, we
create and make available a benchmark consisting of 1,066,115 tuples mapping a
single complex sentence to a sequence of sentences expressing the same meaning.
Second, we propose five models (ranging from vanilla sequence-to-sequence to
semantically motivated models) to understand the difficulty of the proposed task.
Comment: 11 pages, EMNLP 2017
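To make the data format concrete, each benchmark entry can be thought of as a complex sentence paired with a list of shorter sentences that jointly express the same meaning. The snippet below is a minimal illustrative sketch in Python; the field names and sentences are hypothetical and not taken from the released benchmark.

```python
# Minimal sketch of a Split-and-Rephrase tuple: one complex sentence mapped
# to a meaning-preserving sequence of shorter sentences. The field names and
# example sentences are hypothetical, not the benchmark's actual format.
example = {
    "complex": "Alan Bean, who was born in Wheeler, Texas, served as a test pilot.",
    "simple": [
        "Alan Bean was born in Wheeler, Texas.",
        "Alan Bean served as a test pilot.",
    ],
}

# A model for this task reads example["complex"] and should generate
# something equivalent to the sentences in example["simple"].
for sentence in example["simple"]:
    print(sentence)
```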
Small but Mighty: New Benchmarks for Split and Rephrase
Split and Rephrase is a text simplification task of rewriting a complex
sentence into simpler ones. As the task is relatively new, it is paramount to ensure
the soundness of its evaluation benchmark and metric. We find that the widely
used benchmark dataset universally contains easily exploitable syntactic cues
caused by its automatic generation process. Taking advantage of such cues, we
show that even a simple rule-based model can perform on par with the
state-of-the-art model. To remedy such limitations, we collect and release two
crowdsourced benchmark datasets. We not only make sure that they contain
significantly more diverse syntax, but also carefully control for their quality
according to a well-defined set of criteria. While no satisfactory automatic
metric exists, we apply fine-grained manual evaluation based on these criteria
using crowdsourcing, showing that our datasets better represent the task and
are significantly more challenging for the models.
Comment: In EMNLP 2020
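As a rough illustration of how shallow syntactic cues can be exploited, the sketch below splits a sentence at a few hard-coded clause boundaries. The specific regular expression is a hypothetical toy heuristic, not the rule-based model analysed in the paper.

```python
import re

# A toy illustration of how shallow syntactic cues could drive a rule-based
# splitter. The regex cues below are hypothetical examples, not the rules
# analysed in the paper.
def naive_split(sentence: str) -> list[str]:
    clauses = re.split(r",\s*(?:and|which|who)\s+", sentence.rstrip("."))
    result = []
    for clause in clauses:
        clause = clause.strip()
        if clause:
            result.append(clause[0].upper() + clause[1:] + ".")
    return result

print(naive_split("The film was directed by John Smith, and it was released in 1999."))
# -> ['The film was directed by John Smith.', 'It was released in 1999.']
```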
Fact-aware Sentence Split and Rephrase with Permutation Invariant Training
Sentence Split and Rephrase aims to break down a complex sentence into
several simple sentences with its meaning preserved. Previous studies tend to
address the issue by seq2seq learning from parallel sentence pairs, which takes
a complex sentence as input and sequentially generates a series of simple
sentences. However, the conventional seq2seq learning has two limitations for
this task: (1) it does not take into account the facts stated in the long
sentence; as a result, the generated simple sentences may miss or inaccurately
state the facts in the original sentence. (2) The order variance of the simple
sentences to be generated may confuse the seq2seq model during training because
the simple sentences derived from the long source sentence could be in any
order.
To overcome the challenges, we first propose the Fact-aware Sentence
Encoding, which enables the model to learn facts from the long sentence and
thus improves the precision of sentence splitting; then we introduce Permutation
Invariant Training to alleviate the effects of order variance in seq2seq
learning for this task. Experiments on the WebSplit-v1.0 benchmark dataset show
that our approaches substantially improve performance over previous seq2seq
learning approaches. Moreover, an extrinsic evaluation on oie-benchmark
verifies the effectiveness of our approaches: splitting long sentences with
our state-of-the-art model as a preprocessing step helps improve OpenIE performance.
Comment: AAAI 2020
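A minimal sketch of the idea behind permutation-invariant training, assuming it amounts to scoring the model output against every ordering of the reference simple sentences and optimising the smallest loss; the function names and the toy loss below are placeholders, not the paper's implementation.

```python
from itertools import permutations
from typing import Callable, Sequence

# Minimal sketch of a permutation-invariant loss: score the predicted simple
# sentences against every ordering of the references and keep the smallest
# loss, so a valid but differently ordered split is not penalised.
# `sequence_loss` is a placeholder for whatever seq2seq loss is used.
def permutation_invariant_loss(
    predicted: Sequence[str],
    references: Sequence[str],
    sequence_loss: Callable[[Sequence[str], Sequence[str]], float],
) -> float:
    return min(
        sequence_loss(predicted, list(order)) for order in permutations(references)
    )

# Toy usage with a per-sentence mismatch count standing in for the real loss.
toy_loss = lambda pred, ref: sum(p != r for p, r in zip(pred, ref))
print(permutation_invariant_loss(["B.", "A."], ["A.", "B."], toy_loss))  # -> 0
```

Because the number of orderings grows factorially, this formulation is only practical when each complex sentence splits into a small number of simple sentences.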
BLEU is Not Suitable for the Evaluation of Text Simplification
BLEU is widely considered to be an informative metric for text-to-text
generation, including Text Simplification (TS). TS includes both lexical and
structural aspects. In this paper we show that BLEU is not suitable for the
evaluation of sentence splitting, the major structural simplification
operation. We manually compiled a sentence splitting gold standard corpus
containing multiple structural paraphrases, and performed a correlation
analysis with human judgments. We find low or no correlation between BLEU and
the grammaticality and meaning preservation parameters where sentence splitting
is involved. Moreover, BLEU often negatively correlates with simplicity,
essentially penalizing simpler sentences.
Comment: Accepted to EMNLP 2018 (Short papers)
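To see the structural issue concretely, one can compare a split output against a single unsplit reference using n-gram overlap: the split output loses the n-grams that straddle the original clause boundary, while trivially copying the input scores perfectly. The example below uses NLTK's sentence-level BLEU on made-up sentences; it illustrates the general behaviour of n-gram metrics rather than reproducing the paper's analysis.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Made-up sentences (not from the paper's corpus) showing the structural
# problem: the split output loses the n-grams that straddle the original
# clause boundary, while copying the input back gets a perfect score.
reference = "the film was directed by john smith and it was released in 1999 .".split()
split_output = "the film was directed by john smith . it was released in 1999 .".split()

smooth = SmoothingFunction().method1
print("BLEU, sentence split:", sentence_bleu([reference], split_output, smoothing_function=smooth))
print("BLEU, input copied  :", sentence_bleu([reference], reference, smoothing_function=smooth))
```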
Finiteness and orbifold Vertex Operator Algebras
In this paper, I investigate the ascending chain condition of right ideals in
the case of vertex operator algebras satisfying a finiteness and/or a
simplicity condition. Possible applications to the study of the finiteness of
orbifold VOAs are discussed.
Comment: 12 pages, comments are welcome
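For reference, the chain condition in question is the standard one: every ascending chain of (here, right) ideals must stabilise. The LaTeX below states it generically; the precise notion of a right ideal of a vertex operator algebra is the one defined in the paper.

```latex
% Ascending chain condition (ACC) on right ideals, stated generically:
% every ascending chain of right ideals eventually stabilises.
\[
  I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots
  \quad\Longrightarrow\quad
  \exists\, n \in \mathbb{N} \ \text{such that} \ I_m = I_n \ \text{for all } m \geq n .
\]
```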