Grammatical Error Correction: A Survey of the State of the Art
Grammatical Error Correction (GEC) is the task of automatically detecting and
correcting errors in text. The task not only includes the correction of
grammatical errors, such as missing prepositions and mismatched subject-verb
agreement, but also orthographic and semantic errors, such as misspellings and
word choice errors respectively. The field has seen significant progress in the
last decade, motivated in part by a series of five shared tasks, which drove
the development of rule-based methods, statistical classifiers, statistical
machine translation, and finally neural machine translation systems which
represent the current dominant state of the art. In this survey paper, we
condense the field into a single article and first outline some of the
linguistic challenges of the task, introduce the most popular datasets that are
available to researchers (for both English and other languages), and summarise
the various methods and techniques that have been developed with a particular
focus on artificial error generation. We next describe the many different
approaches to evaluation as well as concerns surrounding metric reliability,
especially in relation to subjective human judgements, before concluding with
an overview of recent progress and suggestions for future work and remaining
challenges. We hope that this survey will serve as a comprehensive resource for
researchers who are new to the field or who want to be kept apprised of recent
developments.
Automatic annotation of error types for grammatical error correction
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting
grammatical errors in text. Although previous work has focused on developing systems that
target specific error types, the current state of the art uses machine translation to correct all error
types simultaneously. A significant disadvantage of this approach is that machine translation
does not produce annotated output and so error type information is lost. This means we can only
evaluate a system in terms of overall performance and cannot carry out a more detailed analysis
of different aspects of system performance.
In this thesis, I develop a system to automatically annotate parallel original and corrected
sentence pairs with explicit edits and error types. In particular, I first extend the Damerau-Levenshtein alignment algorithm to make use of linguistic information when aligning parallel
sentences, and supplement this alignment with a set of merging rules to handle multi-token
edits. The output from this algorithm surpasses other edit extraction approaches in terms of
approximating human edit annotations and is the current state of the art. Having extracted the
edits, I next classify them according to a new rule-based error type framework that depends only
on automatically obtained linguistic properties of the data, such as part-of-speech tags. This
framework was inspired by existing frameworks, and human judges rated the appropriateness
of the predicted error types as ‘Good’ (85%) or ‘Acceptable’ (10%) in a random sample of 200
edits. The whole system is called the ERRor ANnotation Toolkit (ERRANT) and is the first
toolkit capable of automatically annotating parallel sentences with error types.
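The edit-extraction step can be illustrated in miniature. The sketch below is not the ERRANT algorithm: it uses Python's standard `difflib` token alignment in place of the linguistically informed Damerau-Levenshtein alignment and merging rules described above, but it shows the same input/output shape — parallel original and corrected sentences in, explicit edits out.

```python
# Simplified illustration of edit extraction from a parallel
# original/corrected sentence pair. This is NOT ERRANT itself:
# difflib's plain token alignment stands in for the linguistically
# informed Damerau-Levenshtein alignment and merging rules.
from difflib import SequenceMatcher

def extract_edits(orig_tokens, corr_tokens):
    """Return (original_span, corrected_span) pairs for each edit."""
    edits = []
    matcher = SequenceMatcher(None, orig_tokens, corr_tokens)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            edits.append((" ".join(orig_tokens[i1:i2]),
                          " ".join(corr_tokens[j1:j2])))
    return edits

orig = "He have a dog and two cat .".split()
corr = "He has a dog and two cats .".split()
print(extract_edits(orig, corr))
# → [('have', 'has'), ('cat', 'cats')]
```

A classifier such as ERRANT's rule-based framework would then assign each extracted edit an error type from its automatically obtained linguistic properties (e.g. POS tags), which this sketch omits.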
I demonstrate the value of ERRANT by applying it to the system output produced by the participants of the CoNLL-2014 shared task, and carry out a detailed error type analysis of
system performance for the first time. I also develop a simple language-model-based approach
to GEC that does not require annotated training data, and show how it can be improved using
ERRANT error types.
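A language-model approach of this kind can be sketched as follows. The corpus, confusion sets, and scoring function below are illustrative stand-ins, not the thesis's resources: candidate corrections are generated by substituting confusable words, and the candidate a toy bigram model scores highest is kept.

```python
# Minimal sketch of LM-based GEC: generate candidates via confusion
# sets, keep the one the language model prefers. Corpus and confusion
# sets are toy examples, not the thesis's actual resources.
from collections import Counter
from itertools import product

CORPUS = "he has a dog . she has a cat . he is tall .".split()
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
UNIGRAMS = Counter(CORPUS)
CONFUSIONS = {"have": {"have", "has"}, "is": {"is", "are"}}  # toy sets

def score(tokens):
    # Sums add-one smoothed bigram probabilities; a real system would
    # use log-probabilities from a large language model.
    s = 0.0
    for a, b in zip(tokens, tokens[1:]):
        s += (BIGRAMS[(a, b)] + 1) / (UNIGRAMS[a] + len(UNIGRAMS))
    return s

def correct(tokens):
    # Enumerate every combination of confusable substitutions.
    options = [sorted(CONFUSIONS.get(t, {t})) for t in tokens]
    return max((list(c) for c in product(*options)), key=score)

print(" ".join(correct("he have a dog .".split())))
# → "he has a dog ."
```

Enumerating all substitution combinations is exponential in the number of confusable tokens; real systems prune candidates or correct iteratively.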
Findings of the 2016 Conference on Machine Translation (WMT16)
This paper presents the results of the WMT16 shared tasks, which included five machine translation (MT) tasks (standard news, IT-domain, biomedical, multimodal, pronoun), three evaluation tasks (metrics, tuning, run-time estimation of MT quality), an automatic post-editing task, and a bilingual document alignment task. This year, 102 MT systems from 24 institutions (plus 36 anonymized online systems) were submitted to the 12 translation directions in the news translation task. The IT-domain task received 31 submissions from 12 institutions in 7 directions, and the biomedical task received 15 submissions from 5 institutions. Evaluation was both automatic and manual (relative ranking and 100-point scale assessments).
The quality estimation task had three subtasks, with a total of 14 teams submitting 39 entries. The automatic post-editing task had a total of 6 teams submitting 11 entries.
Human Feedback in Statistical Machine Translation
The thesis addresses the challenge of improving Statistical Machine Translation (SMT) systems via feedback given by humans on translation quality.
The amount of human feedback available to systems is inherently low due to cost and time limitations. One of our goals is to simulate such information by automatically generating pseudo-human feedback.
This is performed using Quality Estimation (QE) models. QE is a technique for predicting the quality of automatic translations without comparing them to oracle (human) translations, traditionally at the sentence or word levels.
QE models are trained on a small collection of automatic translations manually labelled for quality, and then can predict the quality of any number of unseen translations.
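The supervised setup just described can be sketched in a toy form. Everything below is invented for illustration: the two features (length ratio and punctuation difference), the linear model, and the labelled triples are hypothetical stand-ins for the far richer features, models, and data real QE systems use.

```python
# Toy sketch of quality estimation as supervised regression: fit a
# model on a small set of translations labelled with a quality score,
# then predict quality for unseen translations. Features, model, and
# data are all invented for illustration.

def features(src, mt):
    """Two toy features: length ratio and punctuation-count difference."""
    ratio = len(mt.split()) / max(len(src.split()), 1)
    punct = abs(mt.count(".") - src.count("."))
    return [1.0, ratio, punct]  # leading 1.0 is the bias term

def train(data, lr=0.05, epochs=500):
    """Fit a linear model by stochastic gradient descent on squared error."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for src, mt, y in data:
            x = features(src, mt)
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def predict(w, src, mt):
    return sum(wi * xi for wi, xi in zip(w, features(src, mt)))

# Invented triples: (source, machine translation, quality label in [0, 1]).
train_data = [
    ("the cat sat .", "le chat était assis .", 0.9),
    ("the cat sat .", "chat .", 0.2),
    ("a big dog ran .", "un gros chien courait .", 0.8),
    ("a big dog ran .", "chien chien chien chien chien chien", 0.1),
]
w = train(train_data)
good = predict(w, "the dog sat .", "le chien était assis .")
bad = predict(w, "the dog sat .", "chien")
print(good > bad)  # the fuller unseen translation should score higher
# → True
```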
We propose a number of improvements for QE models in order to increase the reliability of pseudo-human feedback.
These include strategies to artificially generate instances for settings where QE training data is scarce.
We also introduce a new level of granularity for QE: the level of phrases. This level aims to improve the quality of QE predictions by better modelling inter-dependencies among errors at word level, and in ways that are tailored to phrase-based SMT, where the basic unit of translation is a phrase. This can thus facilitate work on incorporating human feedback during the translation process.
Finally, we introduce approaches to incorporate pseudo-human feedback in the form of QE predictions in SMT systems. More specifically, we use quality predictions to select the best translation from a number of alternative suggestions produced by SMT systems, and integrate QE predictions into an SMT system decoder in order to guide the translation generation process.
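The first of these two uses — selecting the best translation from an n-best list by predicted quality — can be sketched as below. The QE scorer here is a hypothetical stand-in: any model mapping a (source, translation) pair to a quality score fits the same slot.

```python
# Sketch of using QE predictions as pseudo-human feedback to pick the
# best of several SMT outputs. The scorer is a hypothetical stand-in
# for a trained QE model.

def toy_qe_score(source, translation):
    """Hypothetical QE model: prefer translations whose length is
    close to the source's (a classic, if weak, QE feature)."""
    diff = abs(len(translation.split()) - len(source.split()))
    return 1.0 / (1.0 + diff)

def select_best(source, candidates, qe_score=toy_qe_score):
    """Rerank an n-best list of candidate translations by QE score."""
    return max(candidates, key=lambda t: qe_score(source, t))

source = "the cat sat on the mat ."
nbest = [
    "le chat",
    "le chat était assis sur le tapis .",
    "le chat le chat le chat le chat le chat le chat le chat",
]
print(select_best(source, nbest))
# → "le chat était assis sur le tapis ."
```

The decoder-integration variant the abstract also mentions would instead add the QE prediction as a feature inside the SMT decoder's search, which a post-hoc reranker like this cannot capture.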
Automatic Proficiency Evaluation of Spoken English by Japanese Learners for Dialogue-Based Language Learning System Based on Deep Learning
Tohoku University, Akinori Ito