
    Accurate Measurement of Lexical Sophistication in ESL with Reference to Learner Data

    One commonly used measure of lexical sophistication is the Advanced Guiraud (AG; [9]), whose formula requires frequency band counts (e.g., COCA; [13]). However, the accuracy of this measure is affected by the particular 2000-word frequency list selected as the basis for its calculations [27]. For example, issues can arise when frequency lists based solely on native speaker corpora are used as a target for second language (L2) learners (e.g., [8]), because the exposure frequencies for L2 learners may differ from those of native speakers. Such differences may be due to first language (L1) culture, home country teaching materials, or the text types that L2 learners commonly encounter. This paper addresses the problem by validating an English as a Second Language (ESL) frequency list. Our validation draws on two sources: (1) the New General Service List (NGSL; [4]), which is based on the Cambridge English Corpus (CEC), and (2) written data from the 4.2 million-word Pitt English Language Institute Corpus (PELIC). Using open-source data science tools and natural language processing technologies, the paper demonstrates that differences in lexical sophistication across proficiency levels are more clearly discernible when learner-oriented frequency lists, rather than general corpus frequency lists, are used as part of a lexical measure such as AG. The results will be useful in teaching contexts where lexical proficiency is measured or assessed, and for materials and test developers who rely on such lists as being representative of known vocabulary at different levels of proficiency. This research applies data-driven exploration of learner corpora to vocabulary acquisition and pedagogy, thus closing a loop between educational data mining and classroom applications.
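As a rough illustration of why the choice of frequency list matters, AG is commonly defined as the number of advanced word types (types not on the basic frequency list) divided by the square root of the total number of tokens. The sketch below is a minimal Python version under stated assumptions: the tokenizer is a simple regular expression, no lemmatization is applied, and `basic_words` is a hypothetical stand-in for a 2000-word list such as the NGSL. It is not the pipeline used in the paper.

```python
import math
import re


def advanced_guiraud(text, basic_words):
    """Advanced Guiraud: advanced types / sqrt(total tokens).

    `basic_words` is a set standing in for the chosen high-frequency
    list (an assumption for illustration; real lists hold ~2000 items).
    """
    # Naive tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Advanced types are the distinct words missing from the basic list.
    advanced_types = {tok for tok in tokens if tok not in basic_words}
    return len(advanced_types) / math.sqrt(len(tokens))


# Toy word list and sentence, invented purely for this example.
toy_basic = {"the", "on", "mat", "cat", "sat"}
score = advanced_guiraud("The feline reclined on the mat", toy_basic)
```

Swapping `toy_basic` for a different list changes which types count as advanced, and therefore the score, which is the sensitivity the abstract describes.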

    Does Lexical Frequency affect rater judgement of essays? An experimental design using quantitative and qualitative data

    Many correlational studies show a positive relation between written assessments of language and the use of more diverse vocabulary (Lexical Diversity) and more infrequent words (Lexical Frequency). However, no experimental studies have isolated the effects of Lexical Frequency from those of Lexical Diversity. In the present study, 14 raters judged two versions of the same essay that differed only in Lexical Frequency. A paired t-test showed no difference in mean scores between the essays (t(13) = .396, p = .70) when the Lexical Frequency of 23.5% of the Content Words was changed in a 347-word essay. Comments explaining the scores given to the essays showed that features other than vocabulary had a far greater influence on rater judgement. It is possible that the Lexical Frequency manipulations were not great enough to affect rater judgement, whether subliminally or consciously. Implications of these results for standardized language proficiency tests and future research in vocabulary are discussed.
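For readers unfamiliar with the test reported above, a paired t-test works on the per-rater score differences: t is the mean difference divided by its standard error, with n - 1 degrees of freedom (13 for 14 raters, matching the reported t(13)). The following is a minimal standard-library sketch; the function name and the scores are invented for illustration and are not the study's data.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    # Each rater contributes one difference between the two versions.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Made-up scores from four hypothetical raters, for illustration only.
t_stat, df = paired_t([1, 2, 3, 4], [0, 2, 2, 4])
```

Turning the statistic into a p-value requires the t distribution with `df` degrees of freedom, which a statistics package would normally supply.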

    Statistical modelling of lexical and syntactic complexity of postgraduate academic writing: a genre and corpus-based study of EFL, ESL, and English L1 M.A. dissertations

    This research is an interdisciplinary study that adopts the principles of corpus linguistics and the methods of quantitative linguistics and statistical modelling to analyse the rhetorical sections of MA dissertations written by EFL, ESL, and English L1 postgraduate students. A discipline-specific corpus was analysed for 22 lexical and 11 syntactic complexity measures using three natural language processing tools [LCA-AW, TAALED, Coh-Metrix] to identify differences between academic texts written by English L1 and L2 students and to investigate the relationship between these linguistic indices. Structural factor analyses, together with two statistical modelling methods (linear mixed-effects modelling and supervised machine learning predictive classification modelling), were then employed to verify the existing classification of the complexity indices, to explore their further dimensions, to investigate the effects of English language background and rhetorical section on the production of lexically and syntactically complex texts, and finally to identify the models that best classify group membership and rhetorical-section membership based on the values of these measures. This investigation resulted in more than 20 specific findings with important implications for academic writing assessment of English L1 vs. L2, for academic writing research on rhetorical sections of English academic texts, for academic writing instruction (especially materials development and syllabus design in EFL contexts and academic immersion programmes), for measure-testing and selection processes, and for methodological aspects of statistical modelling in corpus-based academic studies.