7,024 research outputs found
The Operation Sequence Model-combining N-gram-based and Phrase-based Statistical Machine Translation
An augmented three-pass system combination framework: DCU combination system for WMT 2010
This paper describes the augmented threepass
system combination framework of
the Dublin City University (DCU) MT
group for the WMT 2010 system combination
task. The basic three-pass framework
includes building individual confusion
networks (CNs), a super network, and
a modified Minimum Bayes-risk (mCon-
MBR) decoder. The augmented parts for
WMT2010 tasks include 1) a rescoring
component which is used to re-rank the
N-best lists generated from the individual
CNs and the super network, 2) a new hypothesis
alignment metric â TERp â that
is used to carry out English-targeted hypothesis
alignment, and 3) more different
backbone-based CNs which are employed
to increase the diversity of the
mConMBR decoding phase. We took
part in the combination tasks of Englishto-
Czech and French-to-English. Experimental
results show that our proposed
combination framework achieved 2.17 absolute
points (13.36 relative points) and
1.52 absolute points (5.37 relative points)
in terms of BLEU score on English-to-
Czech and French-to-English tasks respectively
than the best single system. We
also achieved better performance on human
evaluation
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Machine translation evaluation resources and methods: a survey
We introduce the Machine Translation (MT) evaluation survey that contains both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include the intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteriea, etc. We classify the automatic evaluation methods into two categories, including lexical similarity scenario and linguistic features application. The lexical similarity methods contain edit distance, precision, recall, F-measure, and word order. The linguistic features can be divided into syntactic features and semantic features respectively. The syntactic features include part of speech tag, phrase types and sentence structures, and the semantic features include named entity, synonyms, textual entailment, paraphrase, semantic roles, and language models. The deep learning models for evaluation are very newly proposed. Subsequently, we also introduce the evaluation methods for MT evaluation including different correlation scores, and the recent quality estimation (QE) tasks for MT.
This paper differs from the existing works\cite {GALEprogram2009, EuroMatrixProject2007} from several aspects, by introducing some recent development of MT evaluation measures, the different classifications from manual to automatic evaluation measures, the introduction of recent QE tasks of MT, and the concise construction of the content
Source-side context-informed hypothesis alignment for combining outputs from machine translation systems
This paper presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. Traditional hypothesis alignment algorithms such
as TER, HMM and IHMM do not directly utilise the context information of the source side but rather address the alignment issues via the output data itself. In this paper, a source-side context-informed (SSCI) hypothesis alignment method is proposed to carry out the word alignment and word reordering issues. First of all, the sourceâtarget word alignment links are produced as the hidden variables by exporting source phrase spans during the translation decoding process. Secondly, a mapping strategy and normalisation model are employed to acquire the 1-
to-1 alignment links and build the confusion network (CN). The source-side context-based method outperforms the state-of-the-art TERbased alignment model in our experiments
on the WMT09 English-to-French and NIST Chinese-to-English data sets respectively. Experimental results demonstrate that our proposed approach scores consistently among the
best results across different data and language pair conditions
- âŠ