Discourse Level Factors for Sentence Deletion in Text Simplification
This paper presents a data-driven study focusing on analyzing and predicting
sentence deletion -- a prevalent but understudied phenomenon in document
simplification -- on a large English text simplification corpus. We inspect
various document and discourse factors associated with sentence deletion, using
a new manually annotated sentence alignment corpus we collected. We reveal that
professional editors utilize different strategies to meet readability standards
of elementary and middle schools. To predict whether a sentence will be deleted
during simplification to a certain level, we harness automatically aligned data
to train a classification model. Evaluated on our manually annotated data, our
best models reached F1 scores of 65.2 and 59.7 for this task at the levels of
elementary and middle school, respectively. We find that discourse level
factors contribute to the challenging task of predicting sentence deletion for
simplification.
Comment: Accepted in AAAI 2020. Adding more details on manual data annotation
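For readers unfamiliar with the metric, the F1 scores quoted in this abstract can be illustrated with a minimal sketch. It assumes that deletion prediction is treated as binary classification and that F1 is computed for the positive "deleted" class; the abstract does not state the exact averaging used, and the function name `f1_for_deleted` and the toy labels are hypothetical. The function returns values in [0, 1], whereas the abstract reports scores on a 0 to 100 scale.

```python
def f1_for_deleted(gold, pred):
    """Return (precision, recall, F1) for the positive 'deleted' class.

    gold, pred: equal-length sequences of booleans, True = sentence deleted.
    Treating 'deleted' as the positive class is an assumption of this sketch.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels for five sentences (illustrative only, not data from the paper).
gold = [True, True, False, False, True]
pred = [True, False, True, False, True]
precision, recall, f1 = f1_for_deleted(gold, pred)
```

Multiplying the returned F1 by 100 gives a number on the same scale as the 65.2 and 59.7 reported above.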
Controllable Text Simplification with Explicit Paraphrasing
Text Simplification improves the readability of sentences through several
rewriting transformations, such as lexical paraphrasing, deletion, and
splitting. Current simplification systems are predominantly
sequence-to-sequence models that are trained end-to-end to perform all these
operations simultaneously. However, such systems limit themselves to mostly
deleting words and cannot easily adapt to the requirements of different target
audiences. In this paper, we propose a novel hybrid approach that leverages
linguistically-motivated rules for splitting and deletion, and couples them
with a neural paraphrasing model to produce varied rewriting styles. We
introduce a new data augmentation method to improve the paraphrasing capability
of our model. Through automatic and manual evaluations, we show that our
proposed model establishes a new state-of-the-art for the task, paraphrasing
more often than the existing systems, and can control the degree of each
simplification operation applied to the input texts.
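The hybrid design described in this abstract (rules for splitting and deletion, followed by a paraphrasing model, with each operation controllable) can be pictured with a toy sketch. Everything below is a hypothetical stand-in and not the authors' system: `split_rule`, `delete_rule`, and `toy_paraphrase` are invented names, the rules are deliberately trivial, and the lexicon lookup merely occupies the place where a neural paraphraser would sit.

```python
import re
from typing import Callable, List


def split_rule(sentence: str) -> List[str]:
    # Toy splitting rule (assumption): break the sentence at semicolons.
    return [part.strip() for part in sentence.split(";") if part.strip()]


def delete_rule(sentence: str) -> str:
    # Toy deletion rule (assumption): drop parenthetical asides.
    return re.sub(r"\s*\([^)]*\)", "", sentence)


def toy_paraphrase(sentence: str) -> str:
    # Placeholder for a neural paraphrasing model: a tiny word-level lexicon.
    lexicon = {"utilize": "use", "commence": "start"}
    return " ".join(lexicon.get(word, word) for word in sentence.split())


def simplify(
    sentence: str,
    do_split: bool = True,
    do_delete: bool = True,
    paraphrase: Callable[[str], str] = toy_paraphrase,
) -> List[str]:
    # Rule-based operations run first; the flags control which ones apply.
    candidates = split_rule(sentence) if do_split else [sentence]
    if do_delete:
        candidates = [delete_rule(c) for c in candidates]
    # The paraphraser then rewrites each resulting candidate sentence.
    return [paraphrase(c) for c in candidates]


source = "Editors utilize rules (mostly simple ones); models commence later"
simplified = simplify(source)
unsplit = simplify(source, do_split=False, do_delete=False)
```

Keeping the rule stages separate from the paraphraser is what makes each operation individually switchable in this sketch; how the actual system controls the degree of each operation is not specified in the abstract.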