59 research outputs found
An introduction to statistical methods in machine translation
The intention of this article is to provide a concise introduction to the basic mathematical concepts of statistical translation models as they were introduced by Brown et al. (1993) in their groundbreaking work The Mathematics of Statistical Machine Translation: Parameter Estimation. We concentrate on a simplified description of the first two translation models, known as IBM Models 1 and 2. One major aim of this work is to serve as tutorial material for students of computational linguistics, mathematics or computer science; therefore many comments, additional examples and step-by-step explanations are given, augmenting the original formulas by Brown et al. (1993). For both discussed models the calculations for a small parallel corpus are described in detail.
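Since these formulas are the article's focus, it may help to recall the central Model 1 equation from Brown et al. (1993): after summing over all alignments, the probability of a foreign sentence f = f_1 ... f_m given an English sentence e = e_0 ... e_l (with e_0 the empty/NULL word) reduces to

    P(f \mid e) = \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

where the t(f_j | e_i) are lexical translation probabilities estimated with EM. Model 2 replaces the uniform alignment factor 1/(l+1) with learned alignment probabilities a(i | j, m, l).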
A finite-state model of German compounds
This paper summarizes the results of my Master's thesis and the main points of a talk I presented at the seminar of the Department of Applied Logic at the Adam Mickiewicz University in Poznań. It gives a short overview of the structure of German compounds and of newer research concerning the role of the so-called interfixes. After an introduction to the concept of finite-state transducers, the construction of a transducer used for naive compound segmentation is described. Tag-based finite-state methods for the further analysis of the found segments are presented and discussed. Distributional transducer rules, whose construction assumes the existence of local and global morphological contexts, are proposed as a means of disambiguating the naive segmentation results.
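As a rough illustration of the naive segmentation step, here is a minimal Python sketch (the toy lexicon and the interfix list are illustrative assumptions; the thesis implements this step as a finite-state transducer, not as Python code):

def segment(word, lexicon, interfixes=("", "s", "es", "n", "en")):
    # Return every way to split `word` into lexicon entries, where
    # consecutive entries may be joined by an interfix (Fugenelement).
    if word == "":
        return [[]]
    analyses = []
    for i in range(1, len(word) + 1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        for fix in interfixes:
            if rest.startswith(fix):
                for tail in segment(rest[len(fix):], lexicon, interfixes):
                    analyses.append([head] + tail)
    return analyses

lexicon = {"schiff", "fahrt", "kapitän"}  # toy lexicon, lowercased
print(segment("schifffahrtskapitän", lexicon))
# -> [['schiff', 'fahrt', 'kapitän']]; the 's' is consumed as an interfix

A naive segmenter of this kind deliberately overgenerates on larger lexicons, which is exactly why the distributional disambiguation rules are needed afterwards.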
Target-Side Context for Discriminative Models in Statistical Machine Translation
Discriminative translation models utilizing source context have been shown to
help statistical machine translation performance. We propose a novel extension
of this work using target context information. Surprisingly, we show that this
model can be efficiently integrated directly in the decoding process. Our
approach scales to large training data sizes and results in consistent
improvements in translation quality on four language pairs. We also provide an
analysis comparing the strengths of the baseline source-context model with our
extended source-context and target-context model, and we show that our extension
allows us to better capture morphological coherence. Our work is freely
available as part of Moses. Comment: Accepted as a long paper for ACL 2016.
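A minimal sketch of how such a discriminative score can be queried during decoding (the feature templates, weights and PhrasePair structure here are hypothetical; the paper's actual classifier and its integration into Moses are more involved):

from collections import namedtuple

PhrasePair = namedtuple("PhrasePair", "source target")

def score_option(weights, src_context, tgt_context, option):
    # Linear model over features that pair the candidate target phrase
    # with words from the source context AND with the target words
    # already generated by the partial hypothesis.
    feats = [("src", w, option.target) for w in src_context]
    feats += [("tgt", w, option.target) for w in tgt_context]  # target-side extension
    return sum(weights.get(f, 0.0) for f in feats)

weights = {("tgt", "der", "Hund"): 0.7}  # hypothetical learned weight
print(score_option(weights, ["the", "dog"], ["der"], PhrasePair("dog", "Hund")))  # 0.7

Because the target context consists only of words the decoder has already produced, the score is available at hypothesis-expansion time, which is what makes direct integration into decoding (rather than n-best reranking) possible.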
Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation
We combine two of the most popular approaches to automated Grammatical Error
Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC
based on Neural Machine Translation (NMT). The hybrid system achieves new
state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks. This GEC
system preserves the accuracy of SMT output and, at the same time, generates
more fluent sentences, as is typical for NMT. Our analysis shows that the
created systems are closer to reaching human-level performance than any other
GEC system reported so far. Comment: Accepted for oral presentation, research
track, short papers, at NAACL 2018.
An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
In this work, we explore multiple neural architectures adapted for the task
of automatic post-editing of machine translation output. We focus on neural
end-to-end models that combine both inputs, mt (raw MT output) and src
(source language input), in a single neural architecture, modeling
{mt, src} -> pe directly. Apart from that, we investigate the influence of
hard-attention models which seem to be well-suited for monolingual tasks, as
well as combinations of both ideas. We report results on data sets provided
during the WMT-2016 shared task on automatic post-editing and can demonstrate
that dual-attention models that incorporate all available data in the APE
scenario in a single model improve on the best shared task system and on all
other published results after the shared task. Dual-attention models that are
combined with hard attention remain competitive despite applying fewer changes
to the input. Comment: Accepted for presentation at IJCNLP 2017.
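A toy sketch of the dual-attention idea: two soft-attention reads, one over the source encoder states and one over the raw-MT encoder states, whose contexts are concatenated before the post-editing decoder step (the dot-product scoring and the dimensions are illustrative assumptions, not the paper's exact architecture):

import numpy as np

def soft_attention(query, states):
    # Dot-product attention: softmax weights over `states`,
    # weighted sum of the states as the returned context vector.
    scores = states @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ states

rng = np.random.default_rng(0)
d = 4
src_states = rng.normal(size=(6, d))  # encoded source sentence (src)
mt_states = rng.normal(size=(5, d))   # encoded raw MT output (mt)
query = rng.normal(size=d)            # decoder state at the current pe step

# Dual attention: one context per input, concatenated for the decoder.
context = np.concatenate([soft_attention(query, src_states),
                          soft_attention(query, mt_states)])
print(context.shape)  # (8,)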
On-the-Fly Fusion of Large Language Models and Machine Translation
We propose the on-the-fly ensembling of a machine translation model with an
LLM, prompted on the same task and input. We perform experiments on 4 language
pairs (both directions) with varying data amounts. We find that a slightly
weaker-at-translation LLM can improve translations of an NMT model, and
ensembling with an LLM can produce better translations than ensembling two
stronger MT models. We combine our method with various techniques from LLM
prompting, such as in-context learning and translation context.
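The core of such on-the-fly ensembling can be sketched as a token-level interpolation of the two models' next-token distributions (a minimal sketch under the assumptions of a shared vocabulary and a fixed interpolation weight; real systems must also reconcile differing tokenizers):

import numpy as np

def ensemble_step(p_nmt, p_llm, lam=0.5):
    # Interpolate the next-token distributions of the NMT model and the
    # LLM, both conditioned on the same prefix, and renormalize.
    p = lam * p_nmt + (1.0 - lam) * p_llm
    return p / p.sum()

# toy 5-token vocabulary
p_nmt = np.array([0.60, 0.20, 0.10, 0.05, 0.05])
p_llm = np.array([0.30, 0.40, 0.15, 0.10, 0.05])
print(ensemble_step(p_nmt, p_llm))  # argmax gives the next token at this step

At each beam-search step both models would be queried on the same partial hypothesis, so the fusion happens during decoding rather than by reranking finished translations.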