12 research outputs found
CoCo: A tool for automatically assessing conceptual complexity of texts
Traditional text complexity assessment usually takes into account only syntactic and lexical complexity. The task of automatically assessing conceptual text complexity, important for maintaining the reader's interest and for adapting texts for struggling readers, has only recently been proposed. In this paper, we present CoCo, a tool for automatic assessment of conceptual text complexity based on the current state-of-the-art unsupervised approach. We make the code and API freely available for research purposes, and describe the code and the possibilities for its personalization and adaptation in detail. We compare the current implementation with the state of the art, discussing the influence of the choice of entity linker on the performance of the tool. Finally, we present results obtained on two widely used text simplification corpora, discussing the full potential of the tool.
Controllable Text Simplification with Explicit Paraphrasing
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting. Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously. However, such systems limit themselves to mostly deleting words and cannot easily adapt to the requirements of different target audiences. In this paper, we propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles. We introduce a new data augmentation method to improve the paraphrasing capability of our model. Through automatic and manual evaluations, we show that our proposed model establishes a new state-of-the-art for the task, paraphrasing more often than the existing systems, and can control the degree of each simplification operation applied to the input texts.
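To give a flavor of the rule-based splitting component described in this abstract, here is a toy sketch (the pattern and function name are my own illustration, not the authors' actual rules): a sentence is split at a comma followed by a coordinating conjunction or relative pronoun, one common linguistically motivated transformation in hybrid simplification pipelines.

```python
import re

# Toy boundary pattern: ", and", ", but", or ", which" joining two clauses.
# Real systems use syntactic parses rather than surface patterns like this.
SPLIT_PATTERN = re.compile(r",\s*(and|but|which)\s+")

def split_sentence(sentence):
    """Split at the first clause boundary matched by SPLIT_PATTERN, if any."""
    m = SPLIT_PATTERN.search(sentence)
    if not m:
        return [sentence]
    first = sentence[:m.start()].rstrip() + "."
    rest = sentence[m.end():].strip()
    # Capitalize the second sentence produced by the split.
    rest = rest[0].upper() + rest[1:] if rest else rest
    return [first, rest]

print(split_sentence("The bill passed, but several senators objected."))
# → ['The bill passed.', 'Several senators objected.']
```

In the paper's hybrid design, rule-based steps like this handle splitting and deletion deterministically, while a neural model handles the open-ended paraphrasing.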
Reference-less Quality Estimation of Text Simplification Systems
The evaluation of text simplification (TS) systems remains an open challenge. As the task has common points with machine translation (MT), TS is often evaluated using MT metrics such as BLEU. However, such metrics require high quality reference data, which is rarely available for TS. TS has the advantage over MT of being a monolingual task, which allows for direct comparisons to be made between the simplified text and its original version. In this paper, we compare multiple approaches to reference-less quality estimation of sentence-level text simplification systems, based on the dataset used for the QATS 2016 shared task. We distinguish three different dimensions: grammaticality, meaning preservation and simplicity. We show that n-gram-based MT metrics such as BLEU and METEOR correlate the most with human judgment of grammaticality and meaning preservation, whereas simplicity is best evaluated by basic length-based metrics.
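The reference-less setting described above can be sketched in a few lines (function names and the specific signals are my own illustration, not the paper's metrics): the simplified output is compared directly against its original using clipped n-gram precision as a rough meaning-preservation signal, and a compression ratio as a basic length-based simplicity signal.

```python
from collections import Counter
from math import exp, log

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against a single reference."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

def referenceless_scores(original, simplified, max_n=2):
    """Score a simplified sentence directly against its original:
    geometric mean of n-gram precisions (meaning preservation)
    and a word-count compression ratio (length-based simplicity)."""
    precisions = [ngram_precision(simplified, original, n) for n in range(1, max_n + 1)]
    overlap = exp(sum(log(p) for p in precisions) / max_n) if all(precisions) else 0.0
    compression = len(simplified.split()) / max(len(original.split()), 1)
    return {"meaning_overlap": overlap, "compression_ratio": compression}

orig = "the committee decided to postpone the vote until next week"
simp = "the committee decided to delay the vote"
print(referenceless_scores(orig, simp))
```

Because the comparison is against the original rather than a gold reference, no manually simplified corpus is needed, which is exactly the monolingual advantage the abstract points out.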
A Deeper Exploration of the Standard PB-SMT Approach to Text Simplification and its Evaluation
Paper presented at the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 26-31 July 2015, Beijing, China. In the last few years, there has been a growing number of studies addressing the Text Simplification (TS) task as a monolingual machine translation (MT) problem which translates from ‘original’ to ‘simple’ language. Motivated by those results, we investigate the influence of quality vs. quantity of the training data on the effectiveness of such an MT approach to text simplification. We conduct 40 experiments on the aligned sentences from English Wikipedia and Simple English Wikipedia, controlling for: (1) the similarity between the original and simplified sentences in the training and development datasets, and (2) the sizes of those datasets. The results suggest that in the standard PB-SMT approach to text simplification the quality of the datasets has a greater impact on system performance. Additionally, we point out several important differences between cross-lingual MT and the monolingual MT used in text simplification, and show that BLEU is not a good measure of system performance in the text simplification task. The research described in this paper was partially funded by the project SKATER-UPFTALN (TIN2012-38584-C06-03), Ministerio de Economía y Competitividad, Secretaría de Estado de Investigación, Desarrollo e Innovación, Spain, and the project ABLE-TO-INCLUDE (CIP-ICTPSP-2013-7/621055). Hannah Béchara is supported by the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013 under REA grant agreement no. 31747