7 research outputs found

    Text Readability Classification of Textbooks of a Low-Resource Language

    Pratique de la lecture en thaï et hindi en L2 : classification automatique de textes par progression lexicale

    This article looks at the automatic creation of teaching and learning resources for less commonly taught languages, which often lack teaching materials, from authentic (unsimplified) texts. The inspiration for this study comes from Ghadirian (2002) and the associated computer program TextLadder. The program classifies a series of texts by their lexical similarity, introducing target vocabulary incrementally and thus making reading easier for the learner. This kind of automated text sequencing can be used to select sequences of texts appropriate to the level of lexical competence of the L2 reader, whether for independent readers or for creating teaching material for classroom use. The method is particularly suitable for classifying texts with a similar topic or theme.
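The idea of sequencing texts so that new vocabulary arrives incrementally can be illustrated with a minimal greedy sketch. This is not TextLadder's actual algorithm, which the abstract does not specify; the function names `tokenize` and `sequence_texts` are hypothetical, and whitespace tokenization is an assumption that would not hold for Thai, which needs a word segmenter.

```python
def tokenize(text):
    """Return the set of lowercased word types (assumes whitespace-separated words)."""
    return set(text.lower().split())


def sequence_texts(texts, known=frozenset()):
    """Greedily order texts so each next text adds the fewest unseen word types.

    Returns the indices of `texts` in suggested reading order.
    Illustrative only: not the method used by TextLadder.
    """
    remaining = {i: tokenize(t) for i, t in enumerate(texts)}
    known = set(known)
    order = []
    while remaining:
        # Pick the text with the fewest unknown word types; ties go to the lowest index.
        best = min(remaining, key=lambda i: (len(remaining[i] - known), i))
        order.append(best)
        # Everything in the chosen text now counts as known vocabulary.
        known |= remaining.pop(best)
    return order


texts = [
    "the cat sat on the mat with a red hat",
    "the cat sat",
    "the cat sat on the mat",
]
print(sequence_texts(texts))
```

A greedy choice keeps the sketch short; it does not guarantee the globally smoothest vocabulary progression.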

    Age Recommendation from Texts and Sentences for Children

    Children have less text understanding capability than adults. Moreover, this capability differs among children of different ages. Hence, automatically predicting a recommended age for texts or sentences would be of great benefit, both for proposing suitable texts to children and for helping authors write in the most appropriate way. This paper presents our recent advances on the age recommendation task. We treat age recommendation as a regression task, discuss the need for appropriate evaluation metrics, study the use of state-of-the-art machine learning models, namely Transformers, and compare them to different models from the literature. Our results are also compared with recommendations made by experts. Further, this paper addresses preliminary explainability of the age prediction model by analyzing various linguistic features. We conduct the experiments on a dataset of 3,673 French texts (132K sentences, 2.5M words). To recommend age at the text level and sentence level, our best models achieve MAE scores of 0.98 and 1.83 respectively on the test set. Also, compared to the recommendations made by experts, our sentence-level recommendation model gets a similar score to the experts, while the text-level recommendation model outperforms the experts by an MAE score of 1.48.
    Comment: 26 pages (incl. 4 pages for appendices), 4 figures, 20 tables
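The MAE (mean absolute error) figures quoted above measure the average gap, in years, between predicted and reference ages. A minimal sketch of the metric follows; the function name and the sample ages are hypothetical and are not taken from the paper's dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical reference ages and model predictions for three texts.
reference_ages = [6, 8, 10]
predicted_ages = [7, 8, 12]
print(mean_absolute_error(reference_ages, predicted_ages))
```

A lower value means recommendations sit closer to the reference ages, so an MAE of 0.98 corresponds to being off by about one year on average.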

    An Automatic Modern Standard Arabic Text Simplification System: A Corpus-Based Approach

    This thesis brings together an overview of Text Readability (TR) and Text Simplification (TS) with an application of both to Modern Standard Arabic (MSA). It presents our findings on using automatic TR and TS tools to teach MSA, along with challenges, limitations, and recommendations for enhancing the TR and TS models. Reading is one of the most vital tasks that provide language input for communication and comprehension skills. Long sentences, connected sentences, embedded phrases, passive voice, non-standard word orders, and infrequent words have been shown to increase text difficulty for people with low literacy levels, as well as for second language learners. The thesis compares the use of sentence embeddings of different types (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. The 3-way CEFR (Common European Framework of Reference for Languages proficiency levels) classification reaches F-1 of 0.80 and 0.75 for Arabic-BERT and XLM-R, respectively, and the regression task reaches a Spearman correlation of 0.71. The binary difficulty classifier reaches F-1 0.94, and the sentence-pair semantic similarity classifier reaches F-1 0.98. TS is an NLP task aiming to reduce the linguistic complexity of a text while maintaining its meaning and original information (Siddharthan, 2002; Camacho Collados, 2013; Saggion, 2017). The simplification study experimented with two approaches: (i) a classification approach and (ii) a generative approach. It then evaluated the effectiveness of these methods using the BERTScore (Zhang et al., 2020) evaluation metric. The simple sentences produced by the mT5 model achieved P 0.72, R 0.68 and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieved P 0.97, R 0.97 and F-1 0.97.
    To reiterate, this research demonstrated the effectiveness of a corpus-based method combined with the extraction of extensive linguistic features via the latest NLP techniques. It provided insights that can be of use in various Arabic corpus studies and NLP tasks such as translation for educational purposes.
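As a quick consistency check on the precision, recall and F-1 figures quoted above, F-1 can be computed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name is hypothetical, and BERTScore is normally computed per sentence and then averaged, so the harmonic mean of averaged P and R need not match a reported F-1 exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract: mT5, then Arabic-BERT combined with fastText.
print(round(f1_score(0.72, 0.68), 2))
print(round(f1_score(0.97, 0.97), 2))
```

Both rounded values agree with the F-1 scores given in the abstract (0.70 and 0.97).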