
    GenERRate: generating errors for use in grammatical error detection

    This paper explores the issue of automatically generated ungrammatical data and its use in error detection, with a focus on the task of classifying a sentence as grammatical or ungrammatical. We present an error generation tool called GenERRate and show how it can be used to improve the performance of a classifier on learner data. We describe initial attempts to replicate Cambridge Learner Corpus errors using GenERRate.
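    As a rough illustration of the idea behind GenERRate, the sketch below injects a single synthetic error into a well-formed sentence by deleting, duplicating, or swapping words. The operation set and function names are illustrative assumptions, not the tool's actual error taxonomy.

    import random

    def inject_error(tokens, rng=random.Random(0)):
        """Apply one synthetic error to a well-formed token list (len >= 2).

        A toy stand-in for corpus-driven error generation; GenERRate's
        real operations are richer and driven by learner-corpus analysis.
        """
        op = rng.choice(["delete", "duplicate", "swap"])
        out = list(tokens)
        if op == "delete":
            del out[rng.randrange(len(out))]          # missing-word error
        elif op == "duplicate":
            i = rng.randrange(len(out))
            out.insert(i, out[i])                     # repeated-word error
        else:                                         # swap adjacent words
            i = rng.randrange(len(out) - 1)
            out[i], out[i + 1] = out[i + 1], out[i]   # word-order error
        return out

    print(" ".join(inject_error("the cat sat on the mat".split())))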

    Automated Detection of Usage Errors in non-native English Writing

    In an investigation of the use of a novelty detection algorithm for identifying inappropriate word combinations in a raw English corpus, we employ an unsupervised detection algorithm based on one-class support vector machines (OC-SVMs) and extract sentences containing word sequences whose frequency of appearance is significantly low in native English writing. Combined with n-gram language models and document categorization techniques, the OC-SVM classifier assigns given sentences to two groups: sentences containing errors and those without errors. Accuracies are 79.30% with the bigram model, 86.63% with the trigram model, and 34.34% with the four-gram model.
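    A minimal sketch of this novelty-detection setup, using scikit-learn's OneClassSVM over word n-gram counts. The toy training sentences, feature configuration, and nu value are assumptions for illustration; the paper's actual features and corpora are far richer.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import OneClassSVM

    # Train only on native (assumed error-free) sentences; at test time,
    # sentences with a novel n-gram profile are flagged as suspect.
    native = [
        "the committee has approved the proposal",
        "she has been working here for three years",
        "we discussed the results in detail",
    ]
    vec = CountVectorizer(ngram_range=(1, 3))
    X = vec.fit_transform(native)

    clf = OneClassSVM(kernel="linear", nu=0.1).fit(X)

    test = ["he discussed about the results", "she has been working here"]
    pred = clf.predict(vec.transform(test))  # +1 = looks native, -1 = novel
    print(list(zip(test, pred)))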

    A comparative evaluation of deep and shallow approaches to the automatic detection of common grammatical errors

    This paper compares a deep and a shallow processing approach to the problem of classifying a sentence as grammatically well-formed or ill-formed. The deep processing approach uses the XLE LFG parser and English grammar: two versions are presented, one which uses the XLE directly to perform the classification, and another which uses a decision tree trained on features consisting of the XLE's output statistics. The shallow processing approach predicts grammaticality based on n-gram frequency statistics: we present two versions, one which uses frequency thresholds and one which uses a decision tree trained on the frequencies of the rarest n-grams in the input sentence. We find that the use of a decision tree improves on the basic approach only for the deep parser-based approach. We also show that combining both the shallow and deep decision tree features is effective. Our evaluation is carried out using a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting grammatical errors into well-formed BNC sentences.
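    The frequency-threshold variant of the shallow approach can be sketched as follows: look up every n-gram of the input in a reference corpus and flag the sentence if its rarest n-gram falls below a threshold. The toy reference counts below stand in for corpus statistics; the names and threshold are illustrative assumptions.

    from collections import Counter

    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    # Toy reference counts standing in for corpus n-gram frequencies.
    reference = Counter()
    for sent in ["the cat sat on the mat", "the dog sat on the rug"]:
        for n in (2, 3):
            reference.update(ngrams(sent.split(), n))

    def rarest_ngram_count(sentence, n=3):
        """Frequency of the sentence's least-frequent n-gram."""
        return min((reference[g] for g in ngrams(sentence.split(), n)),
                   default=0)

    THRESHOLD = 1  # flag the sentence if its rarest trigram is unseen
    sent = "the cat sat in on the mat"
    print("ill-formed" if rarest_ngram_count(sent) < THRESHOLD
          else "well-formed")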

    Judging grammaticality: experiments in sentence classification

    A classifier which is capable of distinguishing a syntactically well-formed sentence from a syntactically ill-formed one has the potential to be useful in an L2 language-learning context. In this article, we describe a classifier which classifies English sentences as either well-formed or ill-formed using information gleaned from three different natural language processing techniques. We describe the issues involved in acquiring data to train such a classifier and present experimental results for this classifier on a variety of ill-formed sentences. We demonstrate that (a) the combination of information from a variety of linguistic sources is helpful, (b) the trade-off between accuracy on well-formed sentences and accuracy on ill-formed sentences can be fine-tuned by training multiple classifiers in a voting scheme, and (c) the performance of the classifier varies, with better performance on transcribed spoken sentences produced by less advanced language learners.
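    Point (b) can be illustrated with a simple voting rule: flag a sentence as ill-formed only when at least k of the component classifiers agree. Raising k favours accuracy on well-formed sentences; lowering it favours accuracy on ill-formed ones. The sketch below is an assumed, minimal version of such a scheme, not the paper's exact setup.

    def flag_ill_formed(votes, k):
        """Flag a sentence only if at least k classifiers vote ill-formed."""
        return sum(votes) >= k

    votes = [1, 0, 1]  # hypothetical outputs of three classifiers (1 = ill-formed)
    for k in (1, 2, 3):
        print(k, flag_ill_formed(votes, k))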

    Impact of ASR performance on spoken grammatical error detection

    Computer assisted language learning (CALL) systems help learners monitor their progress by providing scoring and feedback on language assessment tasks. Free speaking tests allow assessment of what a learner has said, as well as how they said it. For these tasks, Automatic Speech Recognition (ASR) is required to generate transcriptions of a candidate's responses; the quality of these transcriptions is crucial to provide reliable feedback in downstream processes. This paper considers the impact of ASR performance on Grammatical Error Detection (GED) for free speaking tasks, as an example of providing feedback on a learner's use of English. The performance of an advanced deep-learning based GED system, initially trained on written corpora, is used to evaluate the influence of ASR errors. One consequence of these errors is that grammatical errors can result from incorrect transcriptions as well as learner errors, which may yield confusing feedback. To mitigate the effect of these errors, and reduce erroneous feedback, ASR confidence scores are incorporated into the GED system. By additionally adapting the written-text GED system to the speech domain, using ASR transcriptions, significant gains in performance can be achieved. Analysis of GED performance for different grammatical error types and across grades is also presented.
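    One way to read the confidence-score mitigation: only surface a detected grammatical error where the recogniser was confident in the underlying words. The paper incorporates confidence scores into the GED system itself; the sketch below merely gates GED output as an illustrative approximation, with made-up tokens, flags, and scores.

    def gate_feedback(ged_flags, asr_conf, min_conf=0.8):
        """Suppress error flags on low-confidence ASR tokens so that
        mistranscriptions are less likely to surface as learner errors."""
        return [flag and conf >= min_conf
                for flag, conf in zip(ged_flags, asr_conf)]

    tokens = ["he", "go", "to", "scold", "yesterday"]  # "scold" is an ASR error
    flags  = [False, True, False, True, True]          # per-token GED output
    conf   = [0.95, 0.92, 0.97, 0.41, 0.88]            # per-token ASR confidence
    print(list(zip(tokens, gate_feedback(flags, conf))))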

    Spoken language 'grammatical error correction'

    Spoken language 'grammatical error correction' (GEC) is an important mechanism to help learners of a foreign language, here English, improve their spoken grammar. GEC is challenging for non-native spoken language due to interruptions from disfluent speech events such as repetitions and false starts, and issues in strictly defining what is acceptable in spoken language. Furthermore, there is little labelled data to train models. One way to mitigate the impact of speech events is to use a disfluency detection (DD) model. Removing the detected disfluencies converts the speech transcript to be closer to written language, which has significantly more labelled training data. This paper considers two types of approaches to leveraging DD models to boost spoken GEC performance. One is sequential: a separately trained DD model acts as a pre-processing module, providing a more structured input to the GEC model. The second approach is to train DD and GEC models in an end-to-end fashion, simultaneously optimising both modules. Embeddings enable end-to-end models to have a richer information flow. Experimental results show that DD effectively regulates GEC input; end-to-end training works well when fine-tuned on limited labelled in-domain data; and improving DD by incorporating acoustic information helps improve spoken GEC.
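    The sequential approach can be sketched as a simple pre-processing step: tokens the DD model labels as disfluent are removed before the transcript reaches the GEC model. The token sequence and labels below are invented for illustration; the actual DD and GEC components are trained neural models.

    def remove_disfluencies(tokens, dd_labels):
        """Drop tokens labelled disfluent so the GEC model sees input
        closer to the written text it was trained on."""
        return [t for t, d in zip(tokens, dd_labels) if not d]

    tokens    = ["i", "uh", "i", "want", "want", "to", "goes", "home"]
    dd_labels = [True, True, False, True, False, False, False, False]
    print(" ".join(remove_disfluencies(tokens, dd_labels)))
    # -> "i want to goes home"; the remaining error "goes" is left for GEC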

    Impact of ASR performance on free speaking language assessment

    In free speaking tests candidates respond in spontaneous speech to prompts. This form of test allows the spoken language proficiency of a non-native speaker of English to be assessed more fully than read-aloud tests. As the candidate's responses are unscripted, transcription by automatic speech recognition (ASR) is essential for automated assessment. ASR will never be 100% accurate, so any assessment system must seek to minimise and mitigate ASR errors. This paper considers the impact of ASR errors on the performance of free speaking test auto-marking systems. Firstly, rich linguistically related features, based on part-of-speech tags from statistical parse trees, are investigated for assessment. Then, the impact of ASR errors on how well the system can detect whether a learner's answer is relevant to the question asked is evaluated. Finally, the impact that these errors may have on the ability of the system to provide detailed feedback to the learner is analysed. In particular, pronunciation and grammatical errors are considered as these are important in helping a learner to make progress. As feedback resulting from an ASR error would be highly confusing, an approach to mitigate this problem using confidence scores is also analysed.
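    As a rough illustration of part-of-speech based assessment features, the sketch below builds a bag of POS tags with NLTK, a simple stand-in for the paper's richer features derived from statistical parse trees. The tagger choice and feature representation are assumptions, not the system's actual pipeline.

    from collections import Counter
    import nltk  # one-time setup: nltk.download("averaged_perceptron_tagger")

    def pos_features(sentence):
        """Bag of POS tags as crude proficiency-assessment features."""
        tags = [tag for _, tag in nltk.pos_tag(sentence.split())]
        return Counter(tags)

    print(pos_features("the candidate answered the question fluently"))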