9 research outputs found

    Building a Corpus of 2L English for Automatic Assessment: the CLEC Corpus

    In this paper we describe the CLEC corpus, an ongoing project set up at the University of Cádiz with the purpose of building a large corpus of English as a 2L, classified according to CEFR proficiency levels and designed to train statistical models for automatic proficiency assessment. The goal of this corpus is twofold: on the one hand, it will be used as a data resource for the development of automatic text classification systems; on the other, it has been used as a means of teaching innovation techniques.

    Towards an NLP-based approach for measuring syntactic complexity: preliminary experiments with Italian texts from different registers

    In this paper, we explore how NLP can be used to automatically identify relevant syntactic complexity features in texts, with the aim of assessing their correlation with specific linguistic registers. Our final goal is twofold. On the one hand, we demonstrate that automatic morpho-syntactic and syntactic annotation of texts provides sufficiently accurate output for use in the automatic extraction and measurement of syntactic complexity features. On the other hand, we identify the set of syntactic features that correlate strongly with the linguistic registers considered.

    Evaluating Stages of Development in Second Language French: A Machine-Learning Approach

    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 73-80

    An Evaluation of POS Taggers for the CHILDES Corpus

    This project evaluates four mainstream taggers on a representative collection of child-adult dialogues from the Child Language Data Exchange System (CHILDES). The nine children's files from the Valian corpus and part of the Eve corpus were manually labeled and rewritten with the LARC tagset. They served as gold-standard corpora in the training and testing process. Four taggers (the CLAN MOR tagger, the ACOPOST trigram tagger, the Stanford parser, and version 1.14 of the Brill tagger) were tested with 10-fold cross-validation. By analyzing which assumptions about category assignment led each tagger to fail, we identify several problematic tagging cases. By comparing the average error rate of each tagger, we found that both the size of the training data set and the length of the utterance affect tagging accuracy.

    Il ruolo delle tecnologie del linguaggio nel monitoraggio dell’evoluzione delle abilità di scrittura: primi risultati (The role of language technologies in monitoring the evolution of writing skills: first results)

    Over the last decade, the use of language technologies to study learning processes has become established internationally. This paper reports the first, promising results of an interdisciplinary study that drew on methods and analysis techniques from computational linguistics, linguistics and experimental pedagogy. The study, aimed at monitoring the evolution of the process of learning the Italian language, was based on written productions by lower secondary school students, used automatic linguistic annotation and knowledge extraction tools, and led to the identification of a set of features that characterize the language learning process.

    Are better communicators better readers? : an exploration of the connections between narrative language and reading comprehension

    The association between receptive language skills and reading comprehension has been established in the research literature. Even though the importance of receptive skills for reading comprehension has been strongly supported, in practice lower levels of these skills tend to go unnoticed in typically developing children. A potentially more visible modality of language, expressive skills assessed through speech samples, has rarely been examined despite the longitudinal links between speech and later reading development, and the connections between language and reading impairments. Even fewer reading studies have examined expressive skills using a subgroup of speech samples – narrative samples – which are closer to the kind of language practitioners can observe in their classrooms, and are also a rich source of linguistic and discourse-level data in school-aged children. This thesis presents a study examining the relationship between expressive language skills in narrative samples and reading comprehension after the first two years of formal reading instruction, with considerable attention given to methodological and developmental issues. To address the main methodological issues surrounding the identification of the optimal linguistic indices in terms of reliability and the existence of developmental patterns, two studies of language development in oral narratives were carried out. The first of the narrative language studies drew data from an existing corpus, while the other analysed primary data collected specifically for this purpose. Having identified the optimal narrative indices in two different samples, the main study examined the relationships between these expressive narrative measures, along with receptive standardised measures, and reading comprehension in a monolingual sample of eighty 7- and 8-year-old children attending Year 3 in the UK.
    Both receptive and expressive oral language skills were assessed at three different levels: vocabulary, grammar and discourse. Regression analyses indicated that, when considering expressive narrative variables on their own, expressive grammar and vocabulary, in that order, together explained over a fifth of the variance in reading comprehension in typically developing children. When controlling for receptive language, however, expressive skills did not account for significant unique variance in the outcome measure. Nonetheless, mediation analyses revealed that receptive vocabulary and grammar played a mediating role in the relationship between expressive skills from narratives and reading comprehension. Results and further research directions are discussed in the context of this study’s methodological considerations.

    Automatic Measurement of Syntactic Development in Child Language

    To facilitate the use of syntactic information in the study of child language acquisition, a coding scheme for Grammatical Relations (GRs) in transcripts of parent-child dialogs has been proposed by Sagae, MacWhinney and Lavie (2004). We discuss the use of current NLP techniques to produce the GRs in this annotation scheme. By using a statistical parser (Charniak, 2000) and memory-based learning tools for classification (Daelemans et al., 2004), we obtain high precision and recall for several GRs. We demonstrate the usefulness of this approach by performing automatic measurements of syntactic development with the Index of Productive Syntax (Scarborough, 1990) at levels similar to what child language researchers compute manually.
