Cross-Lingual Lexico-Semantic Transfer in Language Learning
Lexico-semantic knowledge of our native language provides an initial foundation for second language learning. In this paper, we investigate whether and to what extent the lexico-semantic models of the native language (L1) are transferred to the second language (L2). Specifically, we focus on the problem of lexical choice and investigate it in the context of three typologically diverse languages: Russian, Spanish and English. We show that a statistical semantic model learned from L1 data improves automatic error detection in L2 for speakers of the respective L1. Finally, we investigate whether the semantic model learned from a particular L1 is portable to other, typologically related languages. Ekaterina Kochmar’s research is supported by Cambridge English Language Assessment via the ALTA Institute. Ekaterina Shutova’s research is supported by the Leverhulme Trust Early Career Fellowship.
Comparative judgments are more consistent than binary classification for labelling word complexity
© 2019 Association for Computational Linguistics. Lexical simplification systems replace complex words with simpler ones based on a model of which words are complex in context. We explore how users can help train complex word identification models by labelling data more efficiently and reliably. We show that an interface in which annotators make comparative rather than binary judgments yields more reliable and consistent labels, and we explore whether comparative judgments may provide a faster way of collecting labels.
Detecting learner errors in the choice of content words using compositional distributional semantics
We describe a novel approach to error detection in adjective-noun combinations. We present and release a new dataset of annotated errors in which the examples are extracted from learner texts and annotated with error types. We show how compositional distributional semantic approaches can be applied to discriminate between correct and incorrect word combinations in learner data. Finally, we show how the output of the compositional distributional semantic models can be used as features in a classifier, yielding good precision and accuracy. We are grateful to Cambridge English Language Assessment and Cambridge University Press for supporting this research and for granting us access to the CLC for research purposes.
‘Calling on the classical phone’: a distributional model of adjective-noun errors in learners’ English
In this paper, we discuss three key points related to error detection (ED) in learners’ English. We focus on content word ED as one of the most challenging tasks in this area, illustrating our claims on adjective–noun (AN) combinations. In particular, we (1) investigate the role of context in accurately capturing semantic anomalies and implement a system based on distributional topic coherence, which achieves state-of-the-art accuracy on a standard test set; (2) thoroughly investigate our system’s performance across individual adjective classes, concluding that a class-dependent approach is beneficial to the task; and (3) discuss the data size bottleneck in this area and highlight the challenges of automatic error generation for content words. Ekaterina Kochmar’s research is supported by Cambridge English Language Assessment via the ALTA Institute. Aurélie Herbelot’s contribution to this paper was similarly supported by ALTA.
Classification of Twitter accounts into automated agents and human users
© 2017 Association for Computing Machinery. Online social networks (OSNs) have seen a remarkable rise in the presence of surreptitious automated accounts. The massive human user base and business-supportive operating model of social networks such as Twitter facilitate the creation of automated agents. In this paper, we outline a systematic methodology and train a classifier to categorise Twitter accounts as ‘automated’ or ‘human’ users. To improve classification accuracy, we employ a set of novel steps. First, we divide the dataset into four popularity bands to compensate for differences in types of accounts. Second, we create a large ground-truth dataset using human annotations and extract relevant features from raw tweets. To judge the accuracy of the procedure, we calculate agreement among human annotators as well as with a bot detection research tool. We then apply a Random Forests classifier that achieves an accuracy close to human agreement. Finally, we perform tests to measure the efficacy of our results.
Grammatical error correction using hybrid systems and type filtering
This paper describes our submission to the CoNLL 2014 shared task on grammatical error correction, which uses a hybrid approach combining a rule-based system and an SMT system augmented by a large web-based language model. Furthermore, we demonstrate that correction type estimation can be used to remove unnecessary corrections, improving precision without harming recall. Our best hybrid system achieves state-of-the-art results, ranking first on the original test set and second on the test set with alternative annotations. We would like to thank Cambridge English Language Assessment, a division of Cambridge Assessment, for supporting this research.
Capturing anomalies in the choice of content words in compositional distributional semantic space
In this work, we present a new task for testing compositional distributional semantic models. Recently, there has been a spate of research into how distributional representations of individual words can be combined to represent the meaning of phrases. Vecchi et al. (2011) have shown that some compositional models, including the additive and multiplicative models of Mitchell and Lapata (2008; 2010) and the linear map-based model of Baroni and Zamparelli (2010), can be applied to detect semantically anomalous adjective-noun combinations. We extend their experiments and apply these models to combinations extracted from texts written by learners of English. Our work contributes to the field of compositional distributional semantics by introducing a new test paradigm for semantic models, and it shows how these models can be used for error detection in language learners' content word combinations. We are grateful to Cambridge ESOL, a division of Cambridge Assessment, and Cambridge University Press for supporting this research and for granting us access to the CLC for research purposes.
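The three composition models named in this abstract build a phrase vector from word vectors: the additive model sums the adjective and noun vectors element-wise, the multiplicative model takes their element-wise product, and the linear map-based model treats the adjective as a matrix applied to the noun vector. The following is a minimal Python sketch, assuming dense vectors stored as plain lists. The function names and toy vectors are hypothetical, and the two anomaly indicators shown (length of the phrase vector and cosine between the phrase and noun vectors) are assumed examples of measures of this kind, not necessarily the exact measures used in the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def additive(adj: Vector, noun: Vector) -> Vector:
    """Additive composition: p = u + v (element-wise sum)."""
    return [a + n for a, n in zip(adj, noun)]


def multiplicative(adj: Vector, noun: Vector) -> Vector:
    """Multiplicative composition: p = u * v (element-wise product)."""
    return [a * n for a, n in zip(adj, noun)]


def linear_map(adj_matrix: Matrix, noun: Vector) -> Vector:
    """Linear map composition: the adjective is a matrix, p = A n."""
    return [sum(w * n for w, n in zip(row, noun)) for row in adj_matrix]


def vector_length(p: Vector) -> float:
    """Euclidean norm of the phrase vector (one assumed anomaly cue)."""
    return math.sqrt(sum(x * x for x in p))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = vector_length(u), vector_length(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


if __name__ == "__main__":
    # Hypothetical toy vectors over three context dimensions.
    adj = [1.0, 0.0, 2.0]
    noun = [2.0, 1.0, 0.0]
    for name, phrase in [("additive", additive(adj, noun)),
                         ("multiplicative", multiplicative(adj, noun))]:
        # A phrase vector that drifts far from its head noun may signal
        # an anomalous (e.g. erroneous) adjective-noun combination.
        print(name, phrase, round(cosine(phrase, noun), 3))
```

Under this sketch, a classifier or threshold would then operate on such indicators; how the paper turns them into error decisions is not stated in the abstract.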
Syndromes of behavioral and speech disorders associated with benign epileptiform discharges of childhood on electroencephalogram
Objective: to assess the role and significance of benign epileptiform discharges of childhood (BEDC) on the electroencephalogram (EEG) in the development of speech and behavioral disorders in children. Materials and methods: Ninety children aged 3–7 years were included in the study: 30 were healthy, 30 had attention deficit hyperactivity disorder (ADHD), and 30 had expressive language disorder (ELD). We analyzed the role of persistent epileptiform activity (BEDC type) on EEG, as well as of frontal intermittent rhythmic delta activity, in the development of certain neuropsychiatric and speech disorders in children. Results: We propose distinguishing a special variant of ADHD, epileptiform disintegration of behavior, and propose strategies for its therapeutic correction. Conclusion: Detection of epileptiform activity (BEDC type) on EEG in children with ELD is a predictor of the development of cognitive disorders and requires therapeutic correction aimed at stimulating brain maturation. Detection of frontal intermittent rhythmic delta activity in children with ELD requires neuroimaging before a treatment strategy is determined.