The Role of Cognate Vocabulary in CEFR-based Word-level Readability Assessment

Authors: Piet Desmet, Cédrick Fairon, Thomas François, Anaïs Tack
Venue: Vocab@Leuven International Conference
Publication date: 1 January 2019

Abstract

Cognate vocabulary is known to facilitate foreign language (L2) lexical development (de Groot and Keijzer, 2000; Elgort, 2013). Because of their cross-lingual semiotic transparency, cognates are known to be easier to comprehend and learn. As a result, cognate status has been considered an important feature when modeling L2 vocabulary learning (Willis and Ohashi, 2012) or when assessing L2 lexical readability (Beinborn et al., 2014). Although the latter readability-focused user study has shown a positive effect of cognates on decontextualized word comprehension, few studies seem to have examined how cognate vocabulary is distributed in reading texts of different L2 levels, such as reading materials found in textbooks graded along the CEFR (Common European Framework of Reference) scale (Council of Europe, 2001). Our aim is therefore to examine whether the presupposed increasing difficulty of the lexical stock attested in such texts is related to cognate density. To this end, we will focus on French and Dutch as L2 and will use two lexical databases, viz. FLELex (François et al., 2014) and NT2Lex (Tack et al., 2018), respectively. These resources have been compiled from a corpus of L2 reading materials, each targeted towards a specific CEFR level, including expert-written texts found in textbooks or readers. The lexicons thus describe word frequency distributions observed along the CEFR scale and therefore inform us about the lexical stock that should be understood a priori at a given level. In these CEFR-graded word distributions, cognate vocabulary in Dutch and French will be automatically identified, drawing on recent machine translation methods (Beinborn et al., 2013; Mitkov et al., 2007). As a parallel reference dataset, we will use the Dutch-French alignments of the Dutch Parallel Corpus (Paulussen et al., 2006).
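To make the notion of cognate density per CEFR level concrete, the following is a minimal sketch in Python. It is not the authors' method: the abstract refers to machine translation methods for cognate identification, whereas this sketch swaps in a plain normalized edit-distance similarity as a stand-in. The function names, the 0.6 threshold, and the four French-Dutch word pairs with their level labels are all hypothetical and are not taken from FLELex, NT2Lex, or the Dutch Parallel Corpus. Density is computed here over word types, ignoring the frequency information that the lexicons provide.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Orthographic similarity in [0, 1]; 1.0 means identical spelling."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def is_cognate(word: str, translation: str, threshold: float = 0.6) -> bool:
    """Assumption: a pair counts as cognate if its similarity reaches the threshold."""
    return similarity(word, translation) >= threshold


def cognate_density(entries, threshold: float = 0.6) -> dict:
    """Share of cognate word types per level.

    entries: iterable of (cefr_level, word, translation) tuples.
    """
    totals = defaultdict(int)
    cognates = defaultdict(int)
    for level, word, translation in entries:
        totals[level] += 1
        if is_cognate(word, translation, threshold):
            cognates[level] += 1
    return {level: cognates[level] / totals[level] for level in totals}


# Hypothetical toy data: (level, French word, Dutch translation).
toy_entries = [
    ("A1", "banane", "banaan"),
    ("A1", "maison", "huis"),
    ("B1", "situation", "situatie"),
    ("B1", "environnement", "milieu"),
]

if __name__ == "__main__":
    for level, density in sorted(cognate_density(toy_entries).items()):
        print(level, density)
```

A purely orthographic threshold of this kind misses cognates with larger spelling shifts and accepts false friends, which is one reason a study like the one described would rely on translation alignments rather than string similarity alone.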
