Explicit vs. Implicit L2 grammar knowledge in written error correction
Error correction is undoubtedly an important part of the process of drafting and producing written texts. The aim of this paper is to analyse learners’ ability to correct grammatical errors in relation to the type of knowledge they employ in this task. Green and Hecht (1992), in an often-quoted study, found a low correlation between L2 learners’ knowledge of explicit grammar rules and their ability to correct errors. They interpret this as suggesting that, in error correction, learners rely primarily on their implicit knowledge. However, certain design features of their study might have caused the subjects to simply guess the correct forms, which in turn, as DeKeyser (2003) suggests, may have led to an overestimation of implicit knowledge. This paper reports the results of an experiment in which 150 Polish learners of English were administered a corpus-based error correction task whose design differed from that of Green and Hecht (1992). These alterations revealed a much closer link between the subjects’ knowledge of rules and their ability to correct grammatical errors.
The frequency and variability of conjunctive adjuncts in the Estonian–English Interlanguage Corpus
The aim of this master's thesis was to create Estonia's first Estonian–English interlanguage
corpus and to introduce the principles behind its creation and study. More narrowly, the
thesis examined the variability and frequency of conjunctive adjuncts. The results were
analysed and then compared with a corpus of learners who are native speakers of English,
the Michigan Corpus of Upper–level Student Papers.
The thesis consisted of four parts. The first and second parts focused on the principles
of corpus creation and also introduced the structure of the corpus study. The importance of
aspects such as quantity, quality, documentation and simplicity was discussed. Each
aspect was analysed, highlighting its strengths, weaknesses and possible bottlenecks.
The empirical part of the thesis (the third and fourth parts) was carried out with the
freeware tool AntConc, which made it possible to produce statistical data that were
subsequently analysed. The results showed that Estonian students use a range of conjunctive
adjuncts belonging to the five categories of Halliday and Hasan's (1976) classification.
The results show that Estonian students are consistent in their use of conjunctive
adjuncts such as firstly, secondly, in conclusion and to sum up. The study also
identified overuse of two conjunctions: but and and. The use of but can be
considered problematic, because students repeatedly made errors in using it (placing
the conjunction at the beginning of the sentence).
In conclusion, teaching Estonian students a greater variety of conjunctive adjuncts
would help to ensure coherence in argumentative writing. Borrowing conjunctive adjuncts
from the native-speaker learner corpus would be helpful, because overall variability
there was greater than in the Estonian–English interlanguage corpus.
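The kind of frequency comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's procedure (the thesis used AntConc); the adjunct list, the function names and the normalisation per 1,000 tokens are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical adjunct list, taken only from the examples named in the abstract.
ADJUNCTS = ["firstly", "secondly", "in conclusion", "to sum up", "but", "and"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def adjunct_frequencies(text, adjuncts=ADJUNCTS, per=1000):
    """Return occurrences of each adjunct per `per` tokens (assumed normalisation)."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {adjunct: 0.0 for adjunct in adjuncts}
    counts = Counter()
    for adjunct in adjuncts:
        parts = adjunct.split()
        n = len(parts)
        # Count every position where the (possibly multi-word) adjunct matches.
        counts[adjunct] = sum(
            1 for i in range(total - n + 1) if tokens[i:i + n] == parts
        )
    return {adjunct: counts[adjunct] * per / total for adjunct in adjuncts}


def sentence_initial_count(text, word="but"):
    """Count sentences whose first token is `word`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return sum(1 for s in sentences if tokenize(s)[:1] == [word])
```

Running `adjunct_frequencies` on a learner corpus and on a reference corpus gives normalised rates that can be compared directly, which is one simple way to flag possible overuse of a conjunction.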
Native language identification of fluent and advanced non-native writers
This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in April 2020, available online: https://doi.org/10.1145/3383202
The accepted version of the publication may differ from the final published version. Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing text samples they have written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require learner corpora. This article performs NLI in the challenging context of user-generated content (UGC), where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on content-specific or social-network features and may not be generalizable to other domains and datasets, (ii) are unable to capture the variations of language-usage patterns within a text sample, and (iii) are not associated with any outlier handling mechanism. Moreover, since a sizable number of people have acquired non-English second languages as a result of economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on our feature space, we present a solution that mitigates the effect of outliers in the data and helps capture the variations of language-usage patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply the probabilistic k-nearest-neighbors classifier to the identified top-k SSTs to predict the native languages of the authors. To conduct experiments, we create three new corpora, each written in a different language, namely English, French, and German.
Our experimental studies show that our solution outperforms competitive methods and reports more than 80% accuracy across languages. Research funded by Higher Education Commission, and Grants for Development of New Faculty Staff at Chulalongkorn University | Digital Economy Promotion Agency (# MP-62-0003) | Thailand Research Funds (MRG6180266 and MRG6280175). Published versio
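The final classification step mentioned in this abstract, a probabilistic k-nearest-neighbors vote over the most similar samples, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the authors' method: it assumes each text is a single feature vector rather than a point set, uses Euclidean distance as a stand-in for stylistic similarity, and all function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_probabilities(query, samples, k=3):
    """Estimate label probabilities from the k nearest labelled samples.

    `samples` is a list of (feature_vector, label) pairs; the probability of a
    label is the share of the k nearest neighbours that carry it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    nearest = sorted(samples, key=lambda s: euclidean(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


def predict(query, samples, k=3):
    """Return the label with the highest estimated probability."""
    probs = knn_probabilities(query, samples, k)
    return max(probs, key=probs.get)
```

Returning a probability per label, rather than only the winning label, makes it possible to inspect how confident the vote is for a given text sample.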
Examining Scientific Writing Styles from the Perspective of Linguistic Complexity
Publishing articles in high-impact English journals is difficult for scholars
around the world, especially for non-native English-speaking scholars (NNESs),
most of whom struggle with proficiency in English. In order to uncover the
differences in English scientific writing between native English-speaking
scholars (NESs) and NNESs, we collected a large-scale data set containing more
than 150,000 full-text articles published in PLoS between 2006 and 2015. We
divided these articles into three groups according to the ethnic backgrounds of
the first and corresponding authors, obtained by Ethnea, and examined the
scientific writing styles in English from a two-fold perspective of linguistic
complexity: (1) syntactic complexity, including measurements of sentence length
and sentence complexity; and (2) lexical complexity, including measurements of
lexical diversity, lexical density, and lexical sophistication. The
observations suggest marginal differences between groups in syntactic and
lexical complexity. Comment: 6 figure
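To make the measures named in this abstract concrete, here is a minimal Python sketch of three of them: mean sentence length, lexical diversity as a type-token ratio, and lexical density as the share of non-function words. These definitions, the tiny function-word list and the function names are assumptions for illustration; the abstract does not specify which formulas the study used.

```python
import re

# Hypothetical, deliberately tiny function-word list; a real analysis would use
# a part-of-speech tagger or a much fuller list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "is", "are", "was", "were", "be", "it", "this", "that", "with", "for",
}


def split_sentences(text):
    """Split on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mean_sentence_length(text):
    """Average number of word tokens per sentence."""
    sents = split_sentences(text)
    if not sents:
        return 0.0
    return sum(len(words(s)) for s in sents) / len(sents)


def type_token_ratio(text):
    """Lexical diversity: distinct word forms divided by total tokens."""
    toks = words(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def lexical_density(text):
    """Share of tokens that are not in the function-word list."""
    toks = words(text)
    if not toks:
        return 0.0
    return sum(1 for t in toks if t not in FUNCTION_WORDS) / len(toks)
```

Note that the raw type-token ratio falls as texts get longer, so comparisons between groups are only meaningful on samples of similar length or with a length-corrected variant.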