4,099 research outputs found

A Nested Attention Neural Hybrid Model for Grammatical Error Correction

Authors: Gao Jianfeng, Gong Yongen, Ji Jianshu, Toutanova Kristina, Truong Steven, Wang Qinlong
Publication date: 01/01/2017

Grammatical error correction (GEC) systems strive to correct both global errors in word order and usage, and local errors in spelling and inflection. Building on recent work on neural machine translation, we propose a new hybrid neural model with nested attention layers for GEC. Experiments show that the new model can effectively correct errors of both types by incorporating word and character-level information, and that the model significantly outperforms previous neural models for GEC as measured on the standard CoNLL-14 benchmark dataset. Further analysis also shows that the superiority of the proposed model can be largely attributed to the use of the nested attention mechanism, which has proven particularly effective in correcting local errors that involve small edits in orthography.

arXiv.org e-Print Archive

Crossref

A large list of confusion sets for spellchecking assessed against a corpus of real-word errors

Authors: Mitton Roger, Pedler Jennifer
Publication date: 01/01/2010

One of the methods that has been proposed for dealing with real-word errors (errors that occur when a correctly spelled word is substituted for the one intended) is the "confusion-set" approach - a confusion set being a small group of words that are likely to be confused with one another. Using a list of confusion sets drawn up in advance, a spellchecker, on finding one of these words in a text, can assess whether one of the other members of its set would be a better fit and, if it appears to be so, propose that word as a correction. Much of the research using this approach has suffered from two weaknesses. The first is the small number of confusion sets used. The second is that systems have largely been tested on artificial errors. In this paper we address these two weaknesses. We describe the creation of a realistically sized list of confusion sets, then the assembling of a corpus of real-word errors, and then we assess the potential of that list in relation to that corpus.
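
The confusion-set check described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the three confusion sets, the invented bigram counts that stand in for a real language model, the `fit_score` and `suggest_corrections` helper names, and the `margin` threshold.

```python
from collections import Counter

# Hypothetical confusion sets; a realistic list would be far larger.
CONFUSION_SETS = [
    {"their", "there"},
    {"quiet", "quite"},
    {"form", "from"},
]

# Invented bigram counts standing in for a real language model.
BIGRAM_COUNTS = Counter({
    ("over", "there"): 50,
    ("over", "their"): 2,
    ("their", "house"): 40,
    ("there", "house"): 1,
    ("letter", "from"): 30,
    ("letter", "form"): 3,
})


def fit_score(prev_word, word, next_word):
    """Score how well `word` fits between its neighbours (unseen bigrams count 0)."""
    return BIGRAM_COUNTS[(prev_word, word)] + BIGRAM_COUNTS[(word, next_word)]


def suggest_corrections(tokens, margin=5):
    """Return (index, original, suggestion) for words whose set-mate fits better."""
    suggestions = []
    for i, word in enumerate(tokens):
        for conf_set in CONFUSION_SETS:
            if word not in conf_set:
                continue
            prev_word = tokens[i - 1] if i > 0 else "<s>"
            next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            current = fit_score(prev_word, word, next_word)
            # Pick the best-fitting alternative from the same confusion set.
            best = max(sorted(conf_set - {word}),
                       key=lambda w: fit_score(prev_word, w, next_word))
            # Propose it only if it beats the written word by a clear margin.
            if fit_score(prev_word, best, next_word) > current + margin:
                suggestions.append((i, word, best))
    return suggestions


print(suggest_corrections("we walked over their yesterday".split()))
print(suggest_corrections("a letter from home".split()))
```

The margin keeps the checker from flagging a word when the alternative is only marginally more likely, which is one simple way to limit false alarms on correctly used words.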

CiteSeerX

Birkbeck Institutional Research Online

Misspelling Oblivious Word Embeddings

Authors: Bojanowski Piotr, Edizel Bora, Ferreira Rui, Grave Edouard, Piktus Aleksandra, Silvestri Fabrizio
Publication date: 01/01/2019

In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Understanding Attainment Disparity: The Case for a Corpus-Driven Analysis of the Language used in Written Feedback Information to Students of Different Backgrounds

Authors: Alsop Sian, Gardner Sheena
Publication venue: 'The WAC Clearinghouse'
Publication date: 01/01/2019

Coventry University Pure Portal