
    Building a Corpus of 2L English for Automatic Assessment: the CLEC Corpus

    In this paper we describe the CLEC corpus, built within an ongoing project at the University of Cádiz. The project aims to assemble a large corpus of English as a 2L (second language), classified according to CEFR proficiency levels and designed to train statistical models for automatic proficiency assessment. The goal of this corpus is twofold. On the one hand, it will be used as a data resource for developing automatic text classification systems. On the other, it has been used as a vehicle for teaching innovation.

    Readers’ cognitive processes during IELTS reading tests: evidence from eye tracking

    The research described in this report investigates readers' mental processes as they complete onscreen IELTS (International English Language Testing System) reading test items. It uses up-to-date eye-tracking technology to record readers' eye movements. Among other things, it aims to contribute to an understanding of the cognitive validity of reading test items (Glaser, 1991; Field, forthcoming). Participants were a group of Malaysian undergraduates (n=71) taking an onscreen test consisting of two IELTS reading passages with a total of 11 test items. The eye movements of a random sample of these participants (n=38) were tracked. Questionnaire and stimulated-recall interview data were also collected and proved important for interpreting and explaining the eye-tracking data. The findings show significant differences between successful and unsuccessful test-takers on a number of dimensions. These include their ability to read expeditiously (Khalifa and Weir, 2009) and their focus on particular aspects of the test items and the reading texts. This demonstrates the potential of eye tracking, combined with post-hoc interview and questionnaire data, to offer new insights into the cognitive processes of successful and unsuccessful candidates in a reading test. It also offers unprecedented insights into the cognitive processing of successful and unsuccessful readers taking language tests. The findings should therefore be of value to teachers and learners and to examination boards seeking to validate and prepare reading tests. They should also interest psycholinguists and others who study the cognitive processes of readers.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in the area since the last ARIST chapter on the topic in 1996 (Haas, 1996). These activities include: (i) natural language text processing systems, such as text summarization, information extraction and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the World Wide Web and digital libraries; and (iv) evaluation of NLP systems.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon, and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved. We show how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies. We also show how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    A Large-Scale Comparison of Historical Text Normalization Systems

    There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder-decoder models. However, studies have used different datasets and different evaluation methods and have come to different conclusions. This paper presents the largest study of historical text normalization to date. We critically survey the existing literature and report experiments on eight languages. These experiments compare systems spanning all categories of proposed normalization techniques, analyse the effect of training data quantity, and use different evaluation methods. The datasets and scripts are publicly available. Comment: Accepted at NAACL 201
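    To make the "distance metrics" category concrete, here is a minimal sketch, assuming a small modern-spelling lexicon and plain Levenshtein (edit) distance. Each historical token is mapped to its nearest lexicon entry. The lexicon, the example spellings and the function names are hypothetical illustrations. This is not one of the systems evaluated in the paper.

```python
# Hypothetical sketch of a distance-metric baseline for historical text
# normalization: map each historical token to the modern lexicon entry
# with the smallest Levenshtein (edit) distance. The lexicon and example
# words below are illustrative assumptions, not data from the paper.

def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def normalize(token: str, lexicon: list[str]) -> str:
    """Return token if it is already modern, else its nearest lexicon word.

    Ties are broken by lexicon order, because min() returns the first
    minimal element.
    """
    if token in lexicon:
        return token
    return min(lexicon, key=lambda word: levenshtein(token, word))


if __name__ == "__main__":
    modern = ["have", "been", "would", "year", "love"]  # hypothetical lexicon
    for old in ["haue", "beene", "woulde", "yeere"]:
        print(old, "->", normalize(old, modern))
```

    A baseline like this needs no training data, but it can only output words already in the lexicon. It also treats every character edit as equally costly, which is one reason learned approaches such as character-based SMT or encoder-decoder models are compared against it.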