Refinements in hierarchical phrase-based translation systems
The hierarchical phrase-based translation model for statistical machine
translation (SMT), proposed relatively recently, has achieved state-of-the-art
performance in numerous translation evaluations. Hierarchical phrase-based
systems comprise a pipeline of modules with complex interactions. In
this thesis, we propose refinements to the hierarchical phrase-based model
as well as improvements and analyses in various modules for hierarchical
phrase-based systems.
We take advantage of the increasing amounts of training data available
for machine translation, as well as existing frameworks for distributed computing,
to build better infrastructure for the extraction, estimation and
retrieval of hierarchical phrase-based grammars. We design and implement
grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting
grammar using the HFile format, which offers competitive trade-offs
in terms of efficiency and simplicity. We demonstrate improvements over two
alternative solutions used in machine translation.
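The extraction-as-MapReduce design can be illustrated with a toy in-memory map and reduce phase. This is a sketch only: the actual jobs extract hierarchical rules from word-aligned parallel text and run on a Hadoop cluster, whereas here each aligned word pair simply stands in for a rule.

```python
from collections import defaultdict

def map_phase(sentence_pairs):
    """Map: emit (rule, 1) for each 'rule' found in a sentence pair.
    Real extraction walks the word alignment; here each co-indexed
    word pair stands in for an extracted rule (illustrative only)."""
    for src, tgt in sentence_pairs:
        for s, t in zip(src.split(), tgt.split()):
            yield (s, t), 1

def reduce_phase(mapped):
    """Reduce: sum the counts per rule, as a Hadoop reducer would
    after the shuffle groups identical keys together."""
    counts = defaultdict(int)
    for rule, count in mapped:
        counts[rule] += count
    return dict(counts)

corpus = [("le chat", "the cat"), ("le chien", "the dog")]
counts = reduce_phase(map_phase(corpus))
# ("le", "the") is emitted by both sentence pairs, so its count is 2
```

The appeal of this decomposition is that the map and reduce phases shard trivially across machines, and the final counts can be written out in a sorted on-disk format such as HFile for random-access retrieval at decoding time.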
The modular nature of the SMT pipeline, while allowing individual improvements,
has the disadvantage that errors committed by one module are
propagated to the next. This thesis alleviates this issue between the word
alignment module and the grammar extraction and estimation module by
considering richer statistics from word alignment models in extraction. We
use alignment link and alignment phrase pair posterior probabilities for grammar
extraction and estimation and demonstrate translation improvements in
Chinese to English translation.
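The idea of consuming richer alignment statistics can be sketched as thresholding link posterior probabilities rather than committing to a single Viterbi alignment. The function name and threshold below are illustrative, not the thesis's actual procedure.

```python
def links_from_posteriors(posteriors, threshold=0.5):
    """Keep every alignment link whose posterior probability exceeds a
    threshold, instead of only the links in the single best (Viterbi)
    alignment. `posteriors` maps (src_index, tgt_index) pairs to
    P(link | sentence pair) under the alignment model."""
    return {link for link, p in posteriors.items() if p > threshold}

# Hypothetical posteriors for a 2-word source, 3-word target sentence pair
post = {(0, 0): 0.95, (1, 1): 0.60, (1, 2): 0.40}
links = links_from_posteriors(post)   # keeps (0, 0) and (1, 1)
```

Downstream, extraction can also weight each extracted rule by the posterior probability of the phrase pair it came from, rather than counting every occurrence equally.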
This thesis also proposes refinements in grammar and language modelling
both in the context of domain adaptation and in the context of the interaction
between first-pass decoding and lattice rescoring. We analyse alternative
strategies for grammar and language model cross-domain adaptation. We
also study interactions between the first-pass and second-pass language models in terms of size and n-gram order. Finally, we analyse two smoothing methods
for large 5-gram language model rescoring.
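Cross-domain language model adaptation is commonly realised by linear interpolation of an in-domain and an out-of-domain model; a minimal sketch, in which the interpolation weight `lam` would in practice be tuned on held-out in-domain data:

```python
def interpolate_lm(p_in, p_out, lam):
    """Linear interpolation of in-domain and out-of-domain language
    model probabilities for the same word and history:
        p(w | h) = lam * p_in(w | h) + (1 - lam) * p_out(w | h)
    """
    return lam * p_in + (1 - lam) * p_out

# With lam = 0.7, the in-domain estimate dominates the mixture
p = interpolate_lm(0.2, 0.1, 0.7)
```

The same scheme applies at the grammar level by interpolating translation probabilities estimated from in-domain and out-of-domain parallel text.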
The last two chapters are devoted to the application of phrase-based
grammars to the string regeneration task, which we consider as a means to
study the fluency of machine translation output. We design and implement a
monolingual phrase-based decoder for string regeneration and achieve state-of-the-art
performance on this task. By applying our decoder to the output
of a hierarchical phrase-based translation system, we are able to recover the
same level of translation quality as the translation system.
Hierarchical Phrase-Based Translation with Suffix Arrays
A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. Hierarchical phrase-based translation introduces the added wrinkle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply and the best approximate pattern matching algorithms are much too slow, taking several minutes per sentence. We describe new lookup algorithms for hierarchical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible for source phrases with gaps.
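For contiguous phrases, suffix-array lookup reduces to two binary searches over the sorted suffixes; a compact Python sketch follows. Real implementations compare suffixes in place rather than materialising slices, and the gap-handling algorithms of the paper go well beyond this contiguous case.

```python
import bisect

def build_suffix_array(tokens):
    """Sort all suffix start positions by the suffix that begins there."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_occurrences(tokens, sa, phrase):
    """Binary-search the suffix array for every position where `phrase`
    occurs contiguously in the indexed training text. Because the suffix
    array is lexicographically sorted, all matches form one contiguous
    band of the array."""
    n = len(phrase)
    prefixes = [tokens[i:i + n] for i in sa]   # sketch: slices, not in-place
    lo = bisect.bisect_left(prefixes, phrase)
    hi = bisect.bisect_right(prefixes, phrase)
    return sorted(sa[lo:hi])

text = "the cat sat on the mat".split()
sa = build_suffix_array(text)
positions = find_occurrences(text, sa, ["the"])   # token positions 0 and 4
```

Once the occurrences are found, translation rules are extracted on the fly from the aligned target side at just those positions, so no precomputed rule table needs to be stored.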
Non-linear Learning for Statistical Machine Translation
Modern statistical machine translation (SMT) systems usually use a linear
combination of features to model the quality of each translation hypothesis.
The linear combination assumes that all the features are in a linear
relationship and constrains each feature to interact with the others
only linearly, which might limit the expressive power of the model and
lead to an under-fitted model on the current data. In this paper, we propose a
non-linear model of the quality of translation hypotheses based on neural
networks, which allows more complex interactions between features. A learning
framework is presented for training the non-linear models. We also discuss
possible heuristics in designing the network structure which may improve the
non-linear learning performance. Experimental results show that with the basic
features of a hierarchical phrase-based machine translation system, our method
produces translations that are better than those of a linear model.
Comment: submitted to a conference
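The contrast between a linear feature combination and a non-linear one can be sketched with a one-hidden-layer network. The weights below are arbitrary placeholders; the paper trains such parameters within its learning framework.

```python
import math

def linear_score(features, weights):
    """Standard SMT model score: a linear combination (dot product)
    of the hypothesis's feature values."""
    return sum(w * f for w, f in zip(weights, features))

def mlp_score(features, W1, b1, w2, b2):
    """One hidden layer of tanh units lets features interact
    non-linearly before the final scalar score is produced."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

features = [1.0, 2.0]                    # e.g. LM score, phrase penalty
lin = linear_score(features, [0.5, 0.5])
nonlin = mlp_score(features,
                   [[0.5, 0.0], [0.0, 0.5]],  # hidden-layer weights
                   [0.0, 0.0],                # hidden biases
                   [1.0, 1.0],                # output weights
                   0.0)                       # output bias
```

With a single hidden unit and the identity activation, `mlp_score` collapses back to `linear_score`, which is why the non-linear model strictly generalises the linear one.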
Top-Rank Enhanced Listwise Optimization for Statistical Machine Translation
Pairwise ranking methods are the basis of many widely used discriminative
training approaches for structure prediction problems in natural language
processing (NLP). Decomposing the problem of ranking hypotheses into pairwise
comparisons enables simple and efficient solutions. However, neglecting the
global ordering of the hypothesis list may hinder learning. We propose a
listwise learning framework for structure prediction problems such as machine
translation. Our framework directly models the entire translation list's
ordering to learn parameters which may better fit the given listwise samples.
Furthermore, we propose top-rank enhanced loss functions, which are more
sensitive to ranking errors at higher positions. Experiments on a large-scale
Chinese-English translation task show that both our listwise learning framework
and top-rank enhanced listwise losses lead to significant improvements in
translation quality.
Comment: Accepted to CoNLL 201
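A position-weighted listwise loss in this spirit can be sketched as a ListMLE-style Plackett-Luce likelihood whose per-position terms are scaled by weights emphasising the top ranks. The weighting scheme below is hypothetical, not the paper's exact loss.

```python
import math

def weighted_listmle_loss(scores, position_weights):
    """ListMLE-style loss over a hypothesis list that is already sorted
    best-first by the gold evaluation metric. Each position's
    Plackett-Luce term is scaled by a weight, so ranking errors near
    the top of the list cost more (illustrative weighting)."""
    loss = 0.0
    for i, w in enumerate(position_weights):
        log_denom = math.log(sum(math.exp(s) for s in scores[i:]))
        loss += -w * (scores[i] - log_denom)
    return loss

weights = [2.0, 1.0, 1.0]   # double weight on the top position
well_ranked = weighted_listmle_loss([2.0, 1.0, 0.0], weights)
badly_ranked = weighted_listmle_loss([0.0, 1.0, 2.0], weights)
# scoring the best hypothesis lowest incurs a much larger loss
```

Because the whole list enters each denominator, the loss sees the global ordering that pairwise decompositions discard.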
Compositional Morphology for Word Representations and Language Modelling
This paper presents a scalable method for integrating compositional
morphological representations into a vector-based probabilistic language model.
Our approach is evaluated in the context of log-bilinear language models,
rendered suitably efficient for implementation inside a machine translation
decoder by factoring the vocabulary. We perform both intrinsic and extrinsic
evaluations, presenting results on a range of languages which demonstrate that
our model learns morphological representations that both perform well on word
similarity tasks and lead to substantial reductions in perplexity. When used
for translation into morphologically rich languages with large vocabularies,
our models obtain improvements of up to 1.2 BLEU points relative to a baseline
system using back-off n-gram models.
Comment: Proceedings of the 31st International Conference on Machine Learning (ICML)
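The core compositional idea can be sketched as additive composition of morpheme vectors. The morpheme segmentation and vectors below are made up for illustration; the paper additionally learns word-specific vectors and embeds the result in a log-bilinear language model.

```python
def compose_word_vector(morphemes, morph_vecs):
    """Additive composition: a word's representation is the sum of the
    vectors of its morphemes, so rare inflected forms share parameters
    with the stems and affixes they contain."""
    dim = len(next(iter(morph_vecs.values())))
    vec = [0.0] * dim
    for m in morphemes:
        for i, v in enumerate(morph_vecs[m]):
            vec[i] += v
    return vec

# Hypothetical 2-dimensional morpheme vectors
morph_vecs = {"imperfect": [1.0, 0.0], "ion": [0.0, 1.0]}
v = compose_word_vector(["imperfect", "ion"], morph_vecs)   # [1.0, 1.0]
```

Parameter sharing of this kind is what drives the reported perplexity reductions on morphologically rich languages, where most inflected forms are individually rare.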