An online text simplification system for English (Un sistema de simplificación de textos on-line para el inglés)
Text simplification is the task of reducing the lexical and syntactic complexity of documents in order to improve their readability and understandability. This paper presents a web-based demonstration of a text simplification system that performs state-of-the-art lexical and syntactic simplification of English texts. The core simplification technology used for this demonstration is modular and highly customizable, making it suitable for different types of users. This work was funded by the ABLE-TO-INCLUDE project (European Commission Competitiveness and Innovation Framework Programme under Grant Agreement No. 621055) and project SKATER-UPF-TALN (TIN2012-38584-C06-03) from Ministerio de Economía y Competitividad, Secretaría de Estado de Investigación, Desarrollo e Innovación, Spain.
Multilingual Unsupervised Sentence Simplification
Progress in Sentence Simplification has been hindered by the lack of
supervised data, particularly in languages other than English. Previous work
has aligned sentences from original and simplified corpora such as English
Wikipedia and Simple English Wikipedia, but this limits corpus size, domain,
and language. In this work, we propose using unsupervised mining techniques to
automatically create training corpora for simplification in multiple languages
from raw Common Crawl web data. When coupled with a controllable generation
mechanism that can flexibly adjust attributes such as length and lexical
complexity, these mined paraphrase corpora can be used to train simplification
systems in any language. We further incorporate multilingual unsupervised
pretraining methods to create even stronger models and show that by training on
mined data rather than supervised corpora, we outperform the previous best
results. We evaluate our approach on English, French, and Spanish
simplification benchmarks and reach state-of-the-art performance with a totally
unsupervised approach. We will release our models and code to mine the data in
any language included in Common Crawl.
Automated text simplification as a preprocessing step for machine translation into an under-resourced language
In this work, we investigate the possibility of applying a fully automatic text simplification system to the English source in machine translation (MT) to improve its translation into an under-resourced language. We use a state-of-the-art automatic text simplification (ATS) system to lexically and syntactically simplify source sentences, which are then translated with two state-of-the-art English-to-Serbian MT systems: phrase-based MT (PBMT) and neural MT (NMT). We explore three different scenarios for using the ATS in MT: (1) using the raw output of the ATS; (2) automatically filtering out the sentences with low grammaticality and meaning preservation scores; and (3) performing a minimal manual correction of the ATS output. Our results show an improvement in the fluency of the translation regardless of the chosen scenario, and differences in how well the three scenarios improve translation fluency and reduce post-editing effort depending on the MT approach used (PBMT or NMT).