Split and Rephrase
We propose a new sentence simplification task (Split-and-Rephrase) where the
aim is to split a complex sentence into a meaning-preserving sequence of
shorter sentences. Like sentence simplification, splitting-and-rephrasing has
the potential to benefit both natural language processing and societal
applications. Because shorter sentences are generally easier for NLP systems
to process, it could be used as a preprocessing step that facilitates and
improves the performance of parsers, semantic role labellers and machine
translation systems. It should also be useful for people with reading
disabilities, since it converts longer sentences into shorter
ones. This paper makes two contributions towards this new task. First, we
create and make available a benchmark consisting of 1,066,115 tuples mapping a
single complex sentence to a sequence of sentences expressing the same meaning.
Second, we propose five models (ranging from vanilla sequence-to-sequence to
semantically motivated models) to understand the difficulty of the proposed task.
Comment: 11 pages, EMNLP 2017
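To make the data format concrete, each benchmark entry can be thought of as a complex sentence paired with a list of shorter sentences that jointly express the same meaning. The snippet below is a minimal illustrative sketch in Python; the field names and sentences are hypothetical and not taken from the released benchmark.

```python
# Minimal sketch of a Split-and-Rephrase tuple: one complex sentence mapped
# to a meaning-preserving sequence of shorter sentences. The field names and
# example sentences are hypothetical, not the benchmark's actual format.
example = {
    "complex": "Alan Bean, who was born in Wheeler, Texas, served as a test pilot.",
    "simple": [
        "Alan Bean was born in Wheeler, Texas.",
        "Alan Bean served as a test pilot.",
    ],
}

# A model for this task reads example["complex"] and should generate
# something equivalent to the sentences in example["simple"].
for sentence in example["simple"]:
    print(sentence)
```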
Small but Mighty: New Benchmarks for Split and Rephrase
Split and Rephrase is a text simplification task of rewriting a complex
sentence into simpler ones. As the task is relatively new, it is paramount to ensure
the soundness of its evaluation benchmark and metric. We find that the widely
used benchmark dataset universally contains easily exploitable syntactic cues
caused by its automatic generation process. Taking advantage of such cues, we
show that even a simple rule-based model can perform on par with the
state-of-the-art model. To remedy such limitations, we collect and release two
crowdsourced benchmark datasets. We not only make sure that they contain
significantly more diverse syntax, but also carefully control for their quality
according to a well-defined set of criteria. While no satisfactory automatic
metric exists, we apply fine-grained manual evaluation based on these criteria
using crowdsourcing, showing that our datasets better represent the task and
are significantly more challenging for the models.
Comment: In EMNLP 2020
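As a rough illustration of how shallow syntactic cues can be exploited, the sketch below splits a sentence at a few hard-coded clause boundaries. The specific regular expression is a hypothetical toy heuristic, not the rule-based model analysed in the paper.

```python
import re

# A toy illustration of how shallow syntactic cues could drive a rule-based
# splitter. The regex cues below are hypothetical examples, not the rules
# analysed in the paper.
def naive_split(sentence: str) -> list[str]:
    clauses = re.split(r",\s*(?:and|which|who)\s+", sentence.rstrip("."))
    result = []
    for clause in clauses:
        clause = clause.strip()
        if clause:
            result.append(clause[0].upper() + clause[1:] + ".")
    return result

print(naive_split("The film was directed by John Smith, and it was released in 1999."))
# -> ['The film was directed by John Smith.', 'It was released in 1999.']
```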
Fact-aware Sentence Split and Rephrase with Permutation Invariant Training
Sentence Split and Rephrase aims to break down a complex sentence into
several simple sentences with its meaning preserved. Previous studies tend to
address the issue by seq2seq learning from parallel sentence pairs, which takes
a complex sentence as input and sequentially generates a series of simple
sentences. However, the conventional seq2seq learning has two limitations for
this task: (1) it does not take into account the facts stated in the long
sentence; as a result, the generated simple sentences may miss or inaccurately
state the facts in the original sentence. (2) The order variance of the simple
sentences to be generated may confuse the seq2seq model during training because
the simple sentences derived from the long source sentence could be in any
order.
To overcome the challenges, we first propose the Fact-aware Sentence
Encoding, which enables the model to learn facts from the long sentence and
thus improves the precision of sentence splitting; then we introduce Permutation
Invariant Training to alleviate the effects of order variance in seq2seq
learning for this task. Experiments on the WebSplit-v1.0 benchmark dataset show
that our approaches substantially improve performance over previous seq2seq
learning approaches. Moreover, an extrinsic evaluation on oie-benchmark
verifies the effectiveness of our approaches: splitting long sentences with
our state-of-the-art model as a preprocessing step helps improve OpenIE performance.
Comment: AAAI 2020
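A minimal sketch of the idea behind permutation-invariant training, assuming it amounts to scoring the model output against every ordering of the reference simple sentences and optimising the smallest loss; the function names and the toy loss below are placeholders, not the paper's implementation.

```python
from itertools import permutations
from typing import Callable, Sequence

# Minimal sketch of a permutation-invariant loss: score the predicted simple
# sentences against every ordering of the references and keep the smallest
# loss, so a valid but differently ordered split is not penalised.
# `sequence_loss` is a placeholder for whatever seq2seq loss is used.
def permutation_invariant_loss(
    predicted: Sequence[str],
    references: Sequence[str],
    sequence_loss: Callable[[Sequence[str], Sequence[str]], float],
) -> float:
    return min(
        sequence_loss(predicted, list(order)) for order in permutations(references)
    )

# Toy usage with a per-sentence mismatch count standing in for the real loss.
toy_loss = lambda pred, ref: sum(p != r for p, r in zip(pred, ref))
print(permutation_invariant_loss(["B.", "A."], ["A.", "B."], toy_loss))  # -> 0
```

Because the number of orderings grows factorially, this formulation is only practical when each complex sentence splits into a small number of simple sentences.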
BLEU is Not Suitable for the Evaluation of Text Simplification
BLEU is widely considered to be an informative metric for text-to-text
generation, including Text Simplification (TS). TS includes both lexical and
structural aspects. In this paper we show that BLEU is not suitable for the
evaluation of sentence splitting, the major structural simplification
operation. We manually compiled a sentence splitting gold standard corpus
containing multiple structural paraphrases, and performed a correlation
analysis with human judgments. We find low or no correlation between BLEU and
the grammaticality and meaning preservation parameters where sentence splitting
is involved. Moreover, BLEU often negatively correlates with simplicity,
essentially penalizing simpler sentences.
Comment: Accepted to EMNLP 2018 (Short papers)
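To see the structural issue concretely, one can compare a split output against a single unsplit reference using n-gram overlap: the split output loses the n-grams that straddle the original clause boundary, while trivially copying the input scores perfectly. The example below uses NLTK's sentence-level BLEU on made-up sentences; it illustrates the general behaviour of n-gram metrics rather than reproducing the paper's analysis.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Made-up sentences (not from the paper's corpus) showing the structural
# problem: the split output loses the n-grams that straddle the original
# clause boundary, while copying the input back gets a perfect score.
reference = "the film was directed by john smith and it was released in 1999 .".split()
split_output = "the film was directed by john smith . it was released in 1999 .".split()

smooth = SmoothingFunction().method1
print("BLEU, sentence split:", sentence_bleu([reference], split_output, smoothing_function=smooth))
print("BLEU, input copied  :", sentence_bleu([reference], reference, smoothing_function=smooth))
```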
Finiteness and orbifold Vertex Operator Algebras
In this paper, I investigate the ascending chain condition of right ideals in
the case of vertex operator algebras satisfying a finiteness and/or a
simplicity condition. Possible applications to the study of the finiteness of
orbifold VOAs are discussed.
Comment: 12 pages, comments are welcome
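For reference, the chain condition in question is the standard one: every ascending chain of (here, right) ideals must stabilise. The LaTeX below states it generically; the precise notion of a right ideal of a vertex operator algebra is the one defined in the paper.

```latex
% Ascending chain condition (ACC) on right ideals, stated generically:
% every ascending chain of right ideals eventually stabilises.
\[
  I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots
  \quad\Longrightarrow\quad
  \exists\, n \in \mathbb{N} \ \text{such that} \ I_m = I_n \ \text{for all } m \geq n .
\]
```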