14,047 research outputs found

    Explicit vs. Implicit L2 grammar knowledge in written error correction

    Error correction is undoubtedly an important part of the process of drafting and producing written texts. The aim of the paper is to analyse learners’ ability to correct grammatical errors in relation to the type of knowledge they employ in this task. Green and Hecht (1992), in an often-quoted study, found a low correlation between L2 learners’ knowledge of explicit grammar rules and their ability to correct errors. They interpret this as suggesting that in error correction, learners rely primarily on their implicit knowledge. However, certain design features of their study might have caused the subjects to simply guess the correct forms, which, in turn, as DeKeyser (2003) suggests, may have led to an overestimation of implicit knowledge. This paper reports the results of an experiment in which 150 Polish learners of English were administered a corpus-based error correction task whose design differed from that of Green and Hecht (1992). These alterations revealed a much closer link between the subjects’ knowledge of rules and their ability to correct grammatical errors.

    The frequency and variability of conjunctive adjuncts in the Estonian–English Interlanguage Corpus

    The aim of this master's thesis was to create Estonia's first Estonian–English interlanguage corpus and to present the principles behind its compilation and study. More specifically, the thesis examined the variability and frequency of conjunctive adjuncts. The results were analysed and then compared with a corpus of learners who are native speakers of English, the Michigan Corpus of Upper–level Student Papers. The thesis consisted of four parts. The first and second parts focused on the principles of corpus compilation and also introduced the structure of the corpus study. They discussed the importance of aspects such as quantity, quality, documentation, and simplicity. Each aspect was analysed, highlighting its strengths, weaknesses, and possible bottlenecks. The empirical part of the thesis (the third and fourth parts) was carried out with the freeware tool AntConc, which made it possible to generate statistical data that were subsequently analysed. The results showed that Estonian students use a variety of conjunctive adjuncts belonging to five categories according to the classification of Halliday and Hasan (1976). The results indicate that Estonian students are consistent in their use of conjunctive adjuncts such as firstly, secondly, in conclusion, and to sum up. The study identified overuse of the conjunctions but and and. The use of but can be considered problematic, because students repeatedly made errors with it (placing the conjunction at the beginning of a sentence). In conclusion, teaching Estonian students a wider variety of conjunctive adjuncts would help ensure coherence in argumentative writing. Borrowing conjunctive adjuncts from the native-speaker learner corpus would be helpful, since overall variability there was greater than in the Estonian–English interlanguage corpus.

    Native language identification of fluent and advanced non-native writers

    This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in April 2020, available online: https://doi.org/10.1145/3383202 The accepted version of the publication may differ from the final published version. Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require learner corpora. This article performs NLI in the challenging context of user-generated content (UGC), where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on content-specific or social-network features and may not generalize to other domains and datasets, (ii) are unable to capture the variations of language-usage patterns within a text sample, and (iii) are not associated with any outlier-handling mechanism. Moreover, since a sizable number of people have acquired non-English second languages due to economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on our feature space, we present a solution that mitigates the effect of outliers in the data and helps capture the variations of language-usage patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply a probabilistic k-nearest-neighbors classifier to the identified top-k SSTs to predict the native languages of the authors. 
To conduct experiments, we create three new corpora, each written in a different language: English, French, and German. Our experimental studies show that our solution outperforms competitive methods and reports more than 80% accuracy across languages. Research funded by Higher Education Commission, and Grants for Development of New Faculty Staff at Chulalongkorn University | Digital Economy Promotion Agency (# MP-62-0003) | Thailand Research Funds (MRG6180266 and MRG6280175). Published version.
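
The final classification step mentioned in the abstract above, a probabilistic k-nearest-neighbors vote over the most similar samples, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the abstract does not specify the feature space or the point-set similarity measure, so plain Euclidean distance between hypothetical fixed-length feature vectors stands in for it, and the function name `knn_probabilities` is invented.

```python
import math
from collections import Counter


def knn_probabilities(query, samples, k=3):
    """Estimate class probabilities from the k nearest labelled samples.

    query   -- feature vector of the text to classify (hypothetical features)
    samples -- list of (feature_vector, native_language_label) pairs
    k       -- number of nearest samples that take part in the vote

    Assumption: Euclidean distance replaces the paper's stylistic
    similarity between point sets, which the abstract does not define.
    """
    # Rank all labelled samples by distance to the query, closest first.
    ranked = sorted(samples, key=lambda s: math.dist(query, s[0]))
    top_k = ranked[:k]
    # Each class probability is its share of the top-k neighbours.
    counts = Counter(label for _, label in top_k)
    return {label: n / len(top_k) for label, n in counts.items()}


# Toy data: two-dimensional feature vectors with made-up labels.
samples = [
    ((0.0, 0.0), "fr"),
    ((0.1, 0.0), "fr"),
    ((1.0, 1.0), "de"),
    ((5.0, 5.0), "de"),
]
probs = knn_probabilities((0.0, 0.1), samples, k=3)
predicted = max(probs, key=probs.get)
```

Returning a probability per label, rather than only the majority label, lets a caller apply a confidence threshold before accepting a prediction.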

    Examining Scientific Writing Styles from the Perspective of Linguistic Complexity

    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactic and lexical complexity. Comment: 6 figures
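
Three of the measures named in the abstract above (sentence length, lexical diversity, and lexical density) can be approximated with a short sketch. This is not the authors' pipeline: the regex-based tokenizer and sentence splitter, the tiny `FUNCTION_WORDS` stop list, the use of type-token ratio for lexical diversity, and the function name `complexity_profile` are all simplifying assumptions made for illustration.

```python
import re

# Assumption: a tiny illustrative function-word list; a real study would
# use part-of-speech tagging or a much fuller list.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "or", "to", "is", "are",
    "was", "were", "it", "that", "this", "with", "for", "by", "as", "at",
}


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (assumption, not robust)."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]


def tokenize(text):
    """Lowercase word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def complexity_profile(text):
    """Return three simple complexity measures for a non-empty text."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one word")
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        # Syntactic proxy: average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Lexical diversity as type-token ratio (distinct / total tokens).
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Lexical density: share of tokens that are not function words.
        "lexical_density": len(content_words) / len(tokens),
    }


profile = complexity_profile("The cat sat. The cat ran fast.")
```

Type-token ratio is sensitive to text length, so comparing documents of very different sizes would call for a length-normalized variant.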