Building a Corpus of 2L English for Automatic Assessment: the CLEC Corpus
In this paper we describe the CLEC corpus, an ongoing project set up at the University of Cádiz with the purpose of building a large corpus of English as a second language, classified according to CEFR proficiency levels and designed to train statistical models for automatic proficiency assessment. The goal of this corpus is twofold: on the one hand, it will serve as a data resource for the development of automatic text classification systems; on the other, it has been used as a vehicle for teaching innovation techniques.
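The kind of statistical proficiency model the abstract mentions can be sketched, under assumptions, as a text classifier over learner essays labelled with CEFR levels. The sketch below uses a tiny multinomial Naive Bayes classifier in pure Python; the labels and example texts are invented illustrations, not CLEC material, and the actual models trained on the corpus may differ.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Train a multinomial Naive Bayes model.

    samples: list of (text, cefr_label) pairs, e.g. ("my dog is big", "A1").
    """
    word_counts = defaultdict(Counter)  # per-label word frequencies
    label_counts = Counter()            # label priors
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest smoothed log-posterior."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior plus add-one-smoothed log likelihoods
        score = math.log(label_counts[label] / total)
        n = sum(word_counts[label].values())
        for tok in text.lower().split():
            score += math.log((word_counts[label][tok] + 1) / (n + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (hypothetical, for illustration only)
samples = [
    ("i like my dog", "A1"),
    ("my dog is big", "A1"),
    ("the committee deliberated extensively before reaching consensus", "C1"),
    ("her argument was nuanced and rigorously substantiated", "C1"),
]
model = train_nb(samples)
print(predict_nb(model, "i like my big dog"))  # → A1
```

In practice one would use richer features (lexical sophistication, syntactic complexity, error counts) rather than raw word counts, but the pipeline shape — labelled corpus in, per-level model out — is the same.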
Readers’ cognitive processes during IELTS reading tests: evidence from eye tracking
The research described in this report investigates readers' mental processes as they complete onscreen IELTS (International English Language Testing System) reading test items. It employs up-to-date eye tracking technology to record readers' eye movements and aims, among other things, to contribute to an understanding of the cognitive validity of reading test items (Glaser, 1991; Field, forthcoming).
Participants were a group of Malaysian undergraduates (n=71) taking an onscreen test consisting of two IELTS reading passages with a total of 11 test items. The eye movements of a random sample of these participants (n=38) were tracked. Questionnaire and stimulated recall interview data were also collected, and were important in order to interpret and explain the eye tracking data.
Findings demonstrated significant differences between successful and unsuccessful test-takers on a number of dimensions, including their ability to read expeditiously (Khalifa and Weir, 2009) and their focus on particular aspects of the test items and the reading texts. This demonstrates the potential of eye tracking, in combination with post-hoc interview and questionnaire data, to offer unprecedented insights into the cognitive processes of successful and unsuccessful candidates taking a reading test.
As a consequence, the findings should be of value to teachers and learners, and also to examination boards seeking to validate and prepare reading tests, as well as psycholinguists and others interested in the cognitive processes of readers.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the World Wide Web and digital libraries; and (iv) evaluation of NLP systems.
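One of the text processing tasks the chapter surveys, extractive summarization, can be illustrated with a minimal frequency-based sketch: score each sentence by the average corpus frequency of its words and keep the top-scoring ones. This is a generic textbook illustration, not an algorithm from the chapter itself.

```python
from collections import Counter

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences by average word frequency."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Document-wide word frequencies, with simple punctuation stripping
    freqs = Counter(w.strip(".,") for w in text.lower().split())

    def score(sentence):
        toks = sentence.lower().split()
        return sum(freqs[t.strip(".,")] for t in toks) / len(toks)

    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:n_sentences]

text = "Cats sleep a lot. Cats eat fish. Dogs bark."
print(summarize(text))  # → ['Cats eat fish']
```

Real summarization systems of the period added stop-word filtering, positional heuristics, and cue phrases on top of this basic frequency signal.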
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.

Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201
A Large-Scale Comparison of Historical Text Normalization Systems
There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder-decoder models, but studies have used different datasets and different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization to date. We critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available.

Comment: Accepted at NAACL 201