1,327 research outputs found
Improved phrase-based SMT with syntactic reordering patterns learned from lattice scoring
In this paper, we present a novel approach to incorporate source-side syntactic reordering patterns into phrase-based SMT. The main contribution of this work is to use the lattice scoring approach to exploit and utilize reordering
information that is favoured by the baseline PBSMT system. By referring to the parse trees of the training corpus, we represent the observed reorderings with source-side
syntactic patterns. The extracted patterns are then used to convert the parsed inputs into word lattices, which contain both the original source sentences and their potential reorderings. Weights of the word lattices are estimated from the observations of the syntactic reordering patterns in the training corpus. Finally, the PBSMT system is tuned
and tested on the generated word lattices to show the benefits of adding potential sourceside reorderings in the inputs. We confirmed the effectiveness of our proposed method on a medium-sized corpus for Chinese-English
machine translation task. Our method outperformed the baseline system by 1.67% relative on a randomly selected testset and 8.56% relative on the NIST 2008 testset in terms of BLEU score
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Source-side syntactic reordering patterns with functional words for improved phrase-based SMT
Inspired by previous source-side syntactic reordering methods for SMT, this paper focuses on using automatically learned syntactic reordering patterns with functional words which indicate structural reorderings between the source and target language. This approach takes advantage of phrase alignments and source-side parse trees for pattern extraction, and then filters out those patterns without functional words. Word lattices transformed by the generated patterns are fed into PBSMT systems to incorporate potential reorderings from the inputs. Experiments are carried out on a medium-sized corpus for a ChineseâEnglish SMT task. The proposed method outperforms the baseline system by 1.38% relative on a randomly selected testset and 10.45% relative on the NIST 2008 testset in terms of BLEU score. Furthermore, a system with just 61.88% of the patterns filtered by functional words obtains a comparable performance with the unfiltered one on the randomly selected testset, and achieves 1.74% relative improvements on the NIST 2008 testset
The impact of source-side syntactic reordering on hierarchical phrase-based SMT
Syntactic reordering has been demonstrated
to be helpful and effective for handling
different word orders between source
and target languages in SMT. However, in
terms of hierarchial PB-SMT (HPB), does
the syntactic reordering still has a significant
impact on its performance? This
paper introduces a reordering approach
which explores the { (DE) grammatical
structure in Chinese. We employ
the Stanford DE classifier to recognise
the DE structures in both training and
test sentences of Chinese, and then perform
word reordering to make the Chinese
sentences better match the word order
of English. The annotated and reordered
training data and test data are applied
to a re-implemented HPB system and
the impact of the DE construction is examined.
The experiments are conducted
on the NIST 2008 evaluation data and experimental
results show that the BLEU
and METEOR scores are significantly improved
by 1.83/8.91 and 1.17/2.73 absolute/
relative points respectively
A discriminative latent variable-based "DE" classifier for ChineseâEnglish SMT
Syntactic reordering on the source-side
is an effective way of handling word order
differences. The (DE) construction
is a flexible and ubiquitous syntactic
structure in Chinese which is a major
source of error in translation quality.
In this paper, we propose a new classifier
model â discriminative latent variable
model (DPLVM) â to classify the
DE construction to improve the accuracy
of the classification and hence the translation
quality. We also propose a new feature
which can automatically learn the reordering
rules to a certain extent. The experimental
results show that the MT systems
using the data reordered by our proposed
model outperform the baseline systems
by 6.42% and 3.08% relative points
in terms of the BLEU score on PB-SMT
and hierarchical phrase-based MT respectively.
In addition, we analyse the impact
of DE annotation on word alignment and
on the SMT phrase table
Combining data-driven MT systems for improved sign language translation
In this paper, we investigate the feasibility of combining two data-driven machine translation (MT) systems for the translation of sign languages (SLs). We take the MT systems of two prominent data-driven research groups, the MaTrEx system developed at DCU and the Statistical Machine
Translation (SMT) system developed at RWTH Aachen University, and apply their respective approaches to the task of translating Irish Sign Language and German Sign Language into English and German. In a set of experiments supported by automatic evaluation results, we show that
there is a definite value to the prospective merging of MaTrExâs Example-Based MT chunks and distortion limit increase with RWTHâs constraint reordering
Using F-structures in machine translation evaluation
Despite a growing interest in automatic evaluation methods for Machine Translation (MT) quality, most existing automatic metrics are still limited to surface comparison of translation and reference strings. In this paper we
show how Lexical-Functional Grammar (LFG) labelled dependencies obtained from an automatic parse can be used to assess the quality of MT on a deeper linguistic level, giving as a result higher correlations with human judgements
- âŚ