
9,917 research outputs found

GenERRate: generating errors for use in grammatical error detection

Author: Andersen Øistein E.
Foster Jennifer
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2009

This paper explores the issue of automatically generated ungrammatical data and its use in error detection, with a focus on the task of classifying a sentence as grammatical or ungrammatical. We present an error generation tool called GenERRate and show how GenERRate can be used to improve the performance of a classifier on learner data. We describe initial attempts to replicate Cambridge Learner Corpus errors using GenERRate.

CiteSeerX

Irish Universities

DCU Online Research Access Service
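
The idea of generating ungrammatical data from well-formed text, as described in the GenERRate abstract above, can be illustrated with a minimal sketch. The three edit operations and all function names below are assumptions chosen for illustration; the abstract does not say which operations GenERRate actually supports, and a random edit is not guaranteed to yield an ungrammatical sentence.

```python
import random

def delete_word(tokens, index):
    """Return a copy of tokens with the word at index removed."""
    return tokens[:index] + tokens[index + 1:]

def duplicate_word(tokens, index):
    """Return a copy of tokens with the word at index repeated once."""
    return tokens[:index + 1] + tokens[index:]

def swap_adjacent(tokens, index):
    """Return a copy of tokens with the words at index and index + 1 swapped."""
    swapped = list(tokens)
    swapped[index], swapped[index + 1] = swapped[index + 1], swapped[index]
    return swapped

def make_ungrammatical(sentence, rng):
    """Apply one randomly chosen edit (hypothetical operations) to a sentence."""
    tokens = sentence.split()
    if len(tokens) < 2:
        return sentence  # too short to edit safely
    operation = rng.choice([delete_word, duplicate_word, swap_adjacent])
    index = rng.randrange(len(tokens) - 1)  # leaves room for index + 1
    return " ".join(operation(tokens, index))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(make_ungrammatical("the cat sat on the mat", rng))
```

Pairs of original and edited sentences produced this way could then serve as grammatical and ungrammatical training examples for a classifier.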

A comparative evaluation of deep and shallow approaches to the automatic detection of common grammatical errors

Author: Foster Jennifer
van Genabith Josef
Wagner Joachim
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2007

This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed. The deep processing approach uses the XLE LFG parser and English grammar: two versions are presented, one which uses the XLE directly to perform the classification, and another one which uses a decision tree trained on features consisting of the XLE’s output statistics. The shallow processing approach predicts grammaticality based on n-gram frequency statistics: we present two versions, one which uses frequency thresholds and one which uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence. We find that the use of a decision tree improves on the basic approach only for the deep parser-based approach. We also show that combining both the shallow and deep decision tree features is effective. Our evaluation is carried out using a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting grammatical errors into well-formed BNC sentences.

DCU Online Research Access Service
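
The shallow, threshold-based variant mentioned in the abstract above might look roughly like the following sketch. It assumes a sentence is flagged when its rarest n-gram occurs fewer than `threshold` times in a reference corpus. The function names, whitespace tokenisation, and the toy corpus are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(corpus, n):
    """Count n-gram frequencies over a list of reference sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts

def looks_ungrammatical(sentence, counts, n, threshold):
    """Flag a sentence whose rarest n-gram is seen fewer than threshold times."""
    grams = ngrams(sentence.lower().split(), n)
    if not grams:
        return False  # sentence shorter than n tokens: no evidence either way
    rarest = min(counts[g] for g in grams)  # unseen n-grams count as 0
    return rarest < threshold

if __name__ == "__main__":
    # Toy reference corpus (assumption); a real system would use a large corpus.
    reference = ["the cat sat on the mat", "the dog sat on the rug"]
    bigram_counts = count_ngrams(reference, 2)
    print(looks_ungrammatical("the cat sat on the rug", bigram_counts, 2, 1))
    print(looks_ungrammatical("the cat on sat the mat", bigram_counts, 2, 1))
```

The decision-tree variant described in the abstract would instead pass the frequencies of the rarest n-grams to a learned classifier as features rather than comparing them to a fixed threshold.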

On the Similarities Between Native, Non-native and Translated Texts

Author: Nisioi Sergiu
Ordan Noam
Rabinovich Ella
Wintner Shuly
Publication date: 01/01/2016

We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than each of them is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, reflecting, perhaps, unified principles that similarly affect translations and non-native language. Comment: ACL 2016, 12 pages

arXiv.org e-Print Archive

Crossref

Treebanks gone bad: generating a treebank of ungrammatical English

Author: Foster Jennifer
Publication date: 01/01/2007

This paper describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people can), and can be used to induce a grammar capable of analysing such sentences. This paper also demonstrates the first of these uses.

DCU Online Research Access Service

Proceedings of the LREC workshop on partial parsing : between chunk parsing and deep parsing

Author: Kübler Sandra
Piskorski Jakub
Przepiorkowski Adam
Publication date: 03/11/2008

Hochschulschriftenserver - Universität Frankfurt am Main

Spoken language 'grammatical error correction'

Author: Gales MJF
Lu Y
Wang Y
Publication venue: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication date: 01/01/2020

Spoken language ‘grammatical error correction’ (GEC) is an important mechanism to help learners of a foreign language, here English, improve their spoken grammar. GEC is challenging for non-native spoken language due to interruptions from disfluent speech events such as repetitions and false starts and issues in strictly defining what is acceptable in spoken language. Furthermore there is little labelled data to train models. One way to mitigate the impact of speech events is to use a disfluency detection (DD) model. Removing the detected disfluencies converts the speech transcript to be closer to written language, which has significantly more labelled training data. This paper considers two types of approaches to leveraging DD models to boost spoken GEC performance. One is sequential, a separately trained DD model acts as a pre-processing module providing a more structured input to the GEC model. The second approach is to train DD and GEC models in an end-to-end fashion, simultaneously optimising both modules. Embeddings enable end-to-end models to have a richer information flow. Experimental results show that DD effectively regulates GEC input; end-to-end training works well when fine-tuned on limited labelled in-domain data; and improving DD by incorporating acoustic information helps improve spoken GEC.

Crossref

Apollo (Cambridge)
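
The sequential approach described in the abstract above, where disfluency detection pre-processes the transcript before GEC, can be sketched as a two-stage pipeline. Everything here is a hypothetical stand-in: the rule-based detector (filled pauses and immediate repetitions) replaces the trained DD model, and `gec_model` is any callable supplied by the caller, not the paper's system.

```python
# Hypothetical list of filled pauses; a trained DD model would learn these.
FILLED_PAUSES = {"um", "uh", "er"}

def detect_disfluencies(tokens):
    """Toy stand-in for a DD model: one True/False flag per token."""
    flags = []
    for i, token in enumerate(tokens):
        is_filler = token.lower() in FILLED_PAUSES
        # Flag the first copy of an immediately repeated word.
        is_repeat = i + 1 < len(tokens) and token.lower() == tokens[i + 1].lower()
        flags.append(is_filler or is_repeat)
    return flags

def remove_disfluencies(tokens):
    """Drop every token flagged as disfluent."""
    flags = detect_disfluencies(tokens)
    return [token for token, flagged in zip(tokens, flags) if not flagged]

def sequential_pipeline(transcript, gec_model):
    """Run DD as pre-processing, then pass the cleaned text to the GEC model."""
    fluent_tokens = remove_disfluencies(transcript.split())
    return gec_model(" ".join(fluent_tokens))

if __name__ == "__main__":
    # Placeholder GEC model (assumption): returns its input unchanged.
    print(sequential_pipeline("i i um went to to the shop", lambda text: text))
```

In the end-to-end alternative from the abstract, the two stages would be trained jointly rather than chained as separate functions.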

On the automaticity of language processing

Author: Hartsuiker Robert
Moors Agnes
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2017

People speak and listen to language all the time. Given this high frequency of use, it is often suggested that at least some aspects of language processing are highly overlearned and therefore occur “automatically”. Here we critically examine this suggestion. We first sketch a framework that views automaticity as a set of interrelated features of mental processes and a matter of degree rather than a single feature that is all-or-none. We then apply this framework to language processing. To do so, we carve up the processes involved in language use according to (a) whether language processing takes place in monologue or dialogue, (b) whether the individual is comprehending or producing language, (c) whether the spoken or written modality is used, and (d) the linguistic processing level at which they occur, that is, phonology, the lexicon, syntax, or conceptual processes. This exercise suggests that while conceptual processes are relatively non-automatic (as is usually assumed), there is also considerable evidence that syntactic and lexical lower-level processes are not fully automatic. We close by discussing entrenchment as a set of mechanisms underlying automatization.

Crossref

Ghent University Academic Bibliography