3,221 research outputs found

    Learner Corpus Research Meets Chinese as a Second Language Acquisition: Achievements and Challenges

    The article sheds light on Chinese Learner Corpus Research (CLCR), emphasizing advances and gaps in this field. First, the paper describes the potential of learner corpora for investigating learner language. The specificity of learner corpus data compared to learner data in Second Language Acquisition (SLA) studies is also analyzed. Second, it provides an overview of Chinese learner corpus-based research and reviews existing L2 Chinese learner corpora. The paper highlights the lack of L2 Chinese learner corpora collecting data from Italian learners and discusses the challenges and needs of compiling L2 Chinese corpora to study the acquisition of L2 Chinese by learners whose L1 is neither English nor an Asian language. This issue is addressed by taking into account recent projects integrating the LCR methodology with L2 Chinese studies for Italian-speaking learners. Finally, the paper encourages a concrete integration of the methodological framework of LCR with the theoretical interpretation of data from SLA research in the design of Chinese acquisitional studies.

    What Can SLA Learn from Contrastive Corpus Linguistics? The Case of Passive Constructions in Chinese Learner English

    This article seeks to demonstrate the predictive and diagnostic power of an integrated approach that combines contrastive corpus linguistics with interlanguage analysis in second language acquisition research, via a case study of passive constructions in Chinese learner English. The type of corpora used in contrastive corpus linguistics is discussed first, followed by a summary of the findings from a published contrastive study of passive constructions in English and Chinese based on comparable corpora of the two languages. These findings are in turn used to predict and diagnose the performance of Chinese learners of English in their use of English passives, as mirrored in a sizeable Chinese learner English corpus in comparison with a comparable native English corpus.
    Keywords: contrastive analysis, corpus, learner English, passive construction, Chinese

    The COPLE2 Corpus: a Learner Corpus for Portuguese

    We present the COPLE2 corpus, a learner corpus of Portuguese that includes written and spoken texts produced by learners of Portuguese as a second or foreign language. The corpus currently contains a total of 182,474 tokens and 978 texts, classified according to the CEFR scales. The original handwritten productions are transcribed in TEI-compliant XML format and keep a record of all the original information, such as reformulations, insertions and corrections made by the teacher, while the recordings are transcribed and aligned with EXMARaLDA. The TEITOK environment enables different views of the same document (XML, student version, corrected version), a CQP-based search interface, and POS tagging, lemmatization and normalization of the tokens, and will soon be used for error annotation in stand-off format. The corpus has already been a source of data for phonological, lexical and syntactic interlanguage studies and will be used for a data-informed selection of language features for each proficiency level.

    An Analysis of Source-Side Grammatical Errors in NMT

    The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise. We present the first large-scale study of state-of-the-art English-to-German NMT on real grammatical noise, by evaluating on several Grammar Correction corpora. We present methods for evaluating NMT robustness without true references, and we use them for extensive analysis of the effects that different grammatical errors have on the NMT output. We also introduce a technique for visualizing the divergence distribution caused by a source-side error, which allows for additional insights.
    Comment: Accepted and to be presented at BlackboxNLP 201