Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques
Two of the most popular Machine Translation (MT) paradigms are rule-based (RBMT) and corpus-based, the latter including statistical systems (SMT). When only scarce parallel corpora are available, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair.
This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system taking advantage of available resources such as parallel corpora that are used to extract dictionaries and lexical and structural transfer rules.
The final system is freely available online and open source. Although its performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT system's coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. This RBMT system is available to the general public, can be further enhanced, and opens up the possibility of creating future hybrid MT systems. Peer Reviewed. Postprint (author's final draft).
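The abstract mentions extracting dictionaries from parallel corpora. As a rough illustration of that step only, here is a minimal frequency-based sketch, assuming the corpus has already been word-aligned by some external tool. The function name `extract_dictionary`, the `(i, j)` alignment format and the `min_count` threshold are hypothetical; this is not the authors' pipeline, which also relies on human annotation and extracts transfer rules not shown here.

```python
from collections import Counter, defaultdict


def extract_dictionary(aligned_corpus, min_count=2):
    """Map each source word to its most frequent aligned target word.

    aligned_corpus: iterable of (src_tokens, tgt_tokens, alignment), where
    alignment is a list of (i, j) pairs linking src_tokens[i] to tgt_tokens[j].
    This input format is an assumption, e.g. the output of an external aligner.
    """
    counts = defaultdict(Counter)
    for src, tgt, alignment in aligned_corpus:
        for i, j in alignment:
            counts[src[i]][tgt[j]] += 1

    dictionary = {}
    for word, candidates in counts.items():
        best, freq = candidates.most_common(1)[0]
        if freq >= min_count:  # drop rare pairs, which are often alignment noise
            dictionary[word] = best
    return dictionary


# Toy word-aligned Chinese-Spanish corpus (illustrative data only).
corpus = [
    (["我", "喜欢", "猫"], ["me", "gustan", "los", "gatos"], [(0, 0), (1, 1), (2, 3)]),
    (["猫", "很", "小"], ["los", "gatos", "son", "pequeños"], [(0, 1), (1, 2), (2, 3)]),
]
print(extract_dictionary(corpus))  # {'猫': 'gatos'}
print(extract_dictionary(corpus, min_count=1))
```

Raising `min_count` trades coverage for precision; in a hybrid setup, low-frequency entries are natural candidates for human review rather than automatic inclusion.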
The impact of source-side syntactic reordering on hierarchical phrase-based SMT
Syntactic reordering has been demonstrated to be helpful and effective for handling different word orders between source and target languages in SMT. However, in terms of hierarchical PB-SMT (HPB), does syntactic reordering still have a significant impact on performance? This paper introduces a reordering approach which explores the 的 (DE) grammatical structure in Chinese. We employ the Stanford DE classifier to recognise the DE structures in both the training and test sentences of Chinese, and then perform word reordering to make the Chinese sentences better match the word order of English. The annotated and reordered training and test data are applied to a re-implemented HPB system, and the impact of the DE construction is examined. The experiments are conducted on the NIST 2008 evaluation data, and the experimental results show that the BLEU and METEOR scores are significantly improved by 1.83/8.91 and 1.17/2.73 absolute/relative points, respectively.
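The reordering step described above (classify each DE construction, then rearrange the Chinese words toward English order) can be sketched as follows. This is a minimal illustration assuming a simplified phrase of the form A 的 B and two hypothetical labels, `"keep"` and `"swap"`; the Stanford DE classifier uses its own, finer-grained label set, which is not reproduced here, and `reorder_de` is an illustrative name.

```python
DE = "的"


def reorder_de(tokens, label):
    """Reorder a Chinese phrase of the form  A 的 B  according to a DE label.

    label is the (hypothetical) output of a DE classifier:
      "keep": English keeps A before B, e.g. 我 的 朋友 -> "my friend"
      "swap": English puts the head B first, e.g. 我 买 的 书 -> "the book I bought"
    Only the first 的 is handled; nested DE constructions are out of scope here.
    """
    tokens = list(tokens)
    if DE not in tokens:
        return tokens
    k = tokens.index(DE)
    a, b = tokens[:k], tokens[k + 1:]
    if label == "swap":
        return b + [DE] + a  # B 的 A, closer to English "B that/of A" order
    return tokens


print(" ".join(reorder_de("我 买 的 书".split(), "swap")))  # 书 的 我 买
print(" ".join(reorder_de("我 的 朋友".split(), "keep")))  # 我 的 朋友
```

Because the same transformation is applied to both training and test sentences, the translation model learns from source text whose word order already resembles the target side.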
Learning Parse and Translation Decisions From Examples With Rich Context
We present a knowledge- and context-based system for parsing and translating
natural language and evaluate it on sentences from the Wall Street Journal.
Applying machine learning techniques, the system uses parse action examples
acquired under supervision to generate a deterministic shift-reduce parser in
the form of a decision structure. It relies heavily on context, as encoded in
features which describe the morphological, syntactic, semantic and other
aspects of a given parse state. Comment: 8 pages, LaTeX, 3 postscript figures, uses aclap.st
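A deterministic shift-reduce parser picks exactly one action per step from the current parse state. The sketch below shows that control loop, assuming a tiny hand-written decision function (`decide`) as a stand-in for the learned decision structure; the tag set, rules and helper names are hypothetical, and the context used here is far poorer than the morphological, syntactic and semantic features the abstract describes.

```python
def decide(stack, buffer):
    """Hand-written stand-in for a learned decision structure.

    The context used here is only the labels of the top two stack items and
    whether input remains; the original system uses far richer features.
    """
    top = [node[0] for node in stack[-2:]]
    if top == ["DT", "NN"]:
        return ("REDUCE", "NP", 2)
    if top[-1:] == ["VBZ"] and not buffer:
        return ("REDUCE", "VP", 1)
    if top == ["NP", "VP"] and not buffer:
        return ("REDUCE", "S", 2)
    if buffer:
        return ("SHIFT",)
    return ("DONE",)


def parse(tagged_words, policy=decide):
    """Deterministic shift-reduce parsing: one action per step, no backtracking."""
    stack, buffer = [], list(tagged_words)
    while True:
        action = policy(stack, buffer)
        if action[0] == "SHIFT":
            word, tag = buffer.pop(0)
            stack.append((tag, word))  # leaf node: (tag, word)
        elif action[0] == "REDUCE":
            _, label, n = action
            children = stack[-n:]
            del stack[-n:]
            stack.append((label, children))  # internal node: (label, [children])
        else:
            return stack


def bracket(node):
    """Render a node as a bracketed tree string."""
    label, content = node
    if isinstance(content, str):
        return f"({label} {content})"
    return f"({label} " + " ".join(bracket(c) for c in content) + ")"


result = parse([("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")])
print(bracket(result[0]))  # (S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))
```

Swapping `decide` for a trained classifier over parse-state features leaves `parse` unchanged, which is the appeal of separating the control loop from the decision structure.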
Towards String-to-Tree Neural Machine Translation
We present a simple method to incorporate syntactic information about the
target language in a neural machine translation system by translating into
linearized, lexicalized constituency trees. An experiment on the WMT16
German-English news translation task resulted in an improved BLEU score when
compared to a syntax-agnostic NMT baseline trained on the same dataset. An
analysis of the translations from the syntax-aware system shows that it
performs more reordering during translation in comparison to the baseline. A
small-scale human evaluation also showed an advantage for the syntax-aware
system. Comment: Accepted as a short paper in ACL 201
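Translating into linearized, lexicalized constituency trees means the decoder emits bracket tokens interleaved with target words. The sketch below shows one plausible linearization (labeled opening and closing brackets, no part-of-speech nodes), plus helpers to recover the plain translation and to check bracket balance. The exact token format and the function names are assumptions for illustration, and a real system would also need to escape words that themselves begin with a bracket.

```python
def linearize(node):
    """Flatten a tree (label, [children]) with string leaves into tokens."""
    if isinstance(node, str):
        return [node]
    label, children = node
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens


def surface(tokens):
    """Drop bracket tokens to recover the plain target sentence."""
    return [t for t in tokens if not (t.startswith("(") or t.startswith(")"))]


def is_well_formed(tokens):
    """Check that every opening bracket is closed by a matching label."""
    stack = []
    for t in tokens:
        if t.startswith("("):
            stack.append(t[1:])
        elif t.startswith(")"):
            if not stack or stack.pop() != t[1:]:
                return False
    return not stack


tree = ("S", [("NP", ["John"]), ("VP", ["loves", ("NP", ["Mary"])])])
tokens = linearize(tree)
print(" ".join(tokens))  # (S (NP John )NP (VP loves (NP Mary )NP )VP )S
print(" ".join(surface(tokens)))  # John loves Mary
```

A check like `is_well_formed` is useful at inference time because nothing forces a sequence model to emit balanced brackets.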
A discriminative latent variable-based "DE" classifier for Chinese-English SMT
Syntactic reordering on the source side is an effective way of handling word order differences. The 的 (DE) construction is a flexible and ubiquitous syntactic structure in Chinese and a major source of translation errors. In this paper, we propose a new classifier model, the discriminative latent variable model (DPLVM), to classify the DE construction, improving classification accuracy and hence translation quality. We also propose a new feature which can, to a certain extent, automatically learn the reordering rules. The experimental results show that MT systems using data reordered by our proposed model outperform the baseline systems by 6.42% and 3.08% relative in terms of BLEU score on PB-SMT and hierarchical phrase-based MT, respectively. In addition, we analyse the impact of DE annotation on word alignment and on the SMT phrase table.
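The core idea of a discriminative latent variable classifier is that each class owns a set of hidden states, and the class probability sums over them. The sketch below shows only that inference step for a single, non-sequential DE instance, assuming log-linear scores; it is not the authors' DPLVM implementation, training (maximizing the marginal log-likelihood of the labels) is omitted, and the labels, latent states, feature names and weights are all hypothetical.

```python
import math


def latent_class_probs(features, weights, latent_sets):
    """Class posteriors for a log-linear model with label-specific latent states.

    features:    dict feature_name -> value for one DE instance
    weights:     dict latent_state -> dict feature_name -> weight
    latent_sets: dict label -> list of latent states (disjoint across labels)
    Returns p(y | x) = sum_{h in H_y} exp(score(h, x)) / Z.
    """
    scores = {
        h: sum(weights.get(h, {}).get(f, 0.0) * v for f, v in features.items())
        for states in latent_sets.values()
        for h in states
    }
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {
        y: sum(math.exp(scores[h] - m) for h in states) / z
        for y, states in latent_sets.items()
    }


# Hypothetical labels, latent states, features and weights (untrained, illustrative).
latent_sets = {"keep": ["keep_0", "keep_1"], "swap": ["swap_0", "swap_1"]}
weights = {
    "keep_0": {"bias": 0.5},
    "keep_1": {"A_ends_with_noun": 1.0},
    "swap_0": {"A_ends_with_verb": 2.0},
    "swap_1": {"bias": 0.1},
}
features = {"bias": 1.0, "A_ends_with_verb": 1.0}

probs = latent_class_probs(features, weights, latent_sets)
print(max(probs, key=probs.get))  # swap
```

Giving each label several latent states lets one class cover distinct sub-patterns (for example, different kinds of constructions that all require swapping) without forcing them to share a single weight vector.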