314 research outputs found
Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation
Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. In fact, the optimal trade-off between accuracy and complexity of decoding is nowadays reached by harshly limiting the input permutation space. We propose a method to dynamically shape this space and, thus, capture long-range word movements without hurting translation quality or decoding time. The space defined by loose reordering constraints is dynamically pruned through a binary classifier that predicts whether a given input word should be translated right after another. The integration of this model into a phrase-based decoder improves a strong Arabic-English baseline already including state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.
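The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `follow_score` callback standing in for the binary classifier, the `threshold` value, and the rule that the monotone continuation is always kept are all assumptions made for illustration.

```python
from typing import Callable, List, Set


def allowed_next_positions(
    last: int,
    covered: Set[int],
    sentence_length: int,
    distortion_limit: int,
    follow_score: Callable[[int, int], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return input positions that may be translated right after `last`.

    `follow_score(i, j)` is a hypothetical stand-in for the binary
    classifier: it scores how likely word j is translated right after word i.
    """
    candidates = []
    for j in range(sentence_length):
        if j in covered:
            continue  # already translated
        # Loose reordering constraint: bound the jump from the monotone position.
        if abs(j - (last + 1)) > distortion_limit:
            continue
        # Assumption: the monotone continuation is never pruned, so the
        # search cannot dead-end because of the classifier alone.
        if j == last + 1 or follow_score(last, j) >= threshold:
            candidates.append(j)
    return candidates


def toy_score(i: int, j: int) -> float:
    """Toy classifier: only the jump from position 0 to position 5 is likely."""
    return 0.9 if (i, j) == (0, 5) else 0.1


# With a high distortion limit, the classifier keeps only one long jump.
print(allowed_next_positions(0, {0}, 8, 10, toy_score))  # [1, 5]
```

With a loose limit of 10, the classifier leaves only the monotone step and one long jump, which is how a wide permutation space can stay cheap to search.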
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
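The remark that paraphrasing can be seen as bidirectional textual entailment can be written down directly. The sketch below is only an illustration: `is_paraphrase` and `toy_entails` are hypothetical names, and the word-subset test is a deliberately crude placeholder for a real entailment recognizer, not a method from the survey.

```python
from typing import Callable


def is_paraphrase(a: str, b: str, entails: Callable[[str, str], bool]) -> bool:
    """Treat a and b as paraphrases when entailment holds in both directions."""
    return entails(a, b) and entails(b, a)


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder recognizer: every hypothesis word must occur in the premise."""
    return set(hypothesis.lower().split()) <= set(premise.lower().split())


# One-directional entailment is not enough for a paraphrase.
print(toy_entails("the black cat sat", "the cat sat"))                  # True
print(is_paraphrase("the black cat sat", "the cat sat", toy_entails))   # False
```

Any recognizer with the same two-argument signature can be passed in place of the toy one.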
Syntax-based machine translation using dependency grammars and discriminative machine learning
Machine translation has improved greatly since the groundbreaking
introduction of statistical methods in the early 2000s, going from very
domain-specific systems that still performed relatively poorly despite the
painstaking crafting of thousands of ad-hoc rules, to general-purpose
systems automatically trained on large collections of bilingual texts which
manage to deliver understandable translations that convey the general
meaning of the original input.
These approaches, however, still perform well below the level of human
translators, typically failing to convey detailed meaning and register, and
producing translations that, while readable, are often ungrammatical and
unidiomatic.
This quality gap, which is considerably larger than in most other
natural language processing tasks, has been the focus of research in
recent years, with the development of increasingly sophisticated models that
attempt to exploit the syntactic structure of human languages, leveraging
the technology of statistical parsers, as well as advanced machine learning
methods such as margin-based structured prediction algorithms and neural
networks.
The translation software itself has become more complex in order to accommodate
the sophistication of these advanced models: the main translation
engine (the decoder) is now often combined with a pre-processor which
reorders the words of the source sentences to a target language word order, or
with a post-processor that ranks and selects a translation according
to a fine model from a list of candidate translations generated by a coarse
model.
In this thesis we investigate the statistical machine translation problem
from various angles, focusing on translation from non-analytic languages
whose syntax is best described by fluid non-projective dependency grammars
rather than the relatively strict phrase-structure grammars or projective
dependency grammars which are most commonly used in the literature.
We propose a framework for modeling word reordering phenomena
between language pairs as transitions on non-projective source dependency
parse graphs. We quantitatively characterize reordering phenomena for the
German-to-English language pair as captured by this framework, specifically
investigating the incidence and effects of the non-projectivity of source
syntax and the non-locality of word movement w.r.t. the graph structure.
We evaluated several variants of hand-coded pre-ordering rules in order to
assess the impact of these phenomena on translation quality.
We propose a class of dependency-based source pre-ordering approaches
that reorder sentences based on flexible models trained by SVMs and
several recurrent neural network architectures.
We also propose a class of translation reranking models, both syntax-free
and source dependency-based, which make use of graph echo state networks,
a type of neural network that is highly flexible and requires
very few training resources, overcoming one of the main limitations
of neural network models for natural language processing tasks.
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common, as systems were much more tailored towards the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.
EPSRC grant EP/L027623/1
EPSRC Tier-2 capital grant EP/P020259/
Probabilistic Modelling of Morphologically Rich Languages
This thesis investigates how the sub-structure of words can be accounted for
in probabilistic models of language. Such models play an important role in
natural language processing tasks such as translation or speech recognition,
but often rely on the simplistic assumption that words are opaque symbols. This
assumption does not fit morphologically complex languages well, where words can
have rich internal structure and sub-word elements are shared across distinct
word forms.
Our approach is to encode basic notions of morphology into the assumptions of
three different types of language models, with the intention that leveraging
shared sub-word structure can improve model performance and help overcome data
sparsity that arises from morphological processes.
In the context of n-gram language modelling, we formulate a new Bayesian
model that relies on the decomposition of compound words to attain better
smoothing, and we develop a new distributed language model that learns vector
representations of morphemes and leverages them to link together
morphologically related words. In both cases, we show that accounting for word
sub-structure improves the models' intrinsic performance and provides benefits
when applied to other tasks, including machine translation.
We then shift the focus beyond the modelling of word sequences and consider
models that automatically learn what the sub-word elements of a given language
are, given an unannotated list of words. We formulate a novel model that can
learn discontiguous morphemes in addition to the more conventional contiguous
morphemes that most previous models are limited to. This approach is
demonstrated on Semitic languages, and we find that modelling discontiguous
sub-word structures leads to improvements in the task of segmenting words into
their contiguous morphemes.
Comment: DPhil thesis, University of Oxford, submitted and accepted 2014.
http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c
Exact and Approximate Methods for Machine Translation Decoding
Statistical methods have been the major force driving the advance of machine translation in recent years. Complex models are designed to improve translation performance, but the added complexity also makes decoding more challenging. In this thesis, we focus on designing exact and approximate algorithms for machine translation decoding. More specifically, we will discuss the decoding problems for phrase-based translation models and bidirectional word alignment.
The techniques explored in this thesis are Lagrangian relaxation and local search. Lagrangian relaxation based algorithms give us exact methods that have formal guarantees while being efficient in practice. We study extensions to Lagrangian relaxation that improve the convergence rate on machine translation decoding problems. The extensions include a tightening technique that adds constraints incrementally, optimality-preserving pruning to manage the search space size and utilizing the bounding properties of Lagrangian relaxation to develop an exact beam search algorithm. In addition to having the potential to improve translation accuracy, exact decoding deepens our understanding of the model that we are using, since it separates model errors from optimization errors.
This leads to the question of designing models that improve the translation quality. We design a syntactic phrase-based model that incorporates a dependency language model to evaluate the fluency level of the target language. By employing local search, an approximate method, to decode this richer model, we discuss the trade-off between the complexity of a model and the efficiency of decoding with it.
Translation-based Ranking in Cross-Language Information Retrieval
The amount of user-generated, multilingual textual data available today creates a need for information processing
systems in which cross-linguality, i.e., the ability to work on more than one
language, is fully integrated into the underlying models. In the particular
context of Information Retrieval (IR), this amounts to ranking and retrieving relevant
documents from a large repository in language A, given a user's information
need expressed in a query in language B. This kind of application is commonly
termed a Cross-Language Information Retrieval (CLIR) system. Such
CLIR systems typically involve a translation component of varying complexity,
which is responsible for translating the user input into the document
language. Using query translations from modern, phrase-based Statistical
Machine Translation (SMT) systems, and subsequently retrieving monolingually
is thus a straightforward choice. However, the amount of work devoted to
integrating such SMT models into CLIR, or even to jointly modeling translation and
retrieval, is rather small.
In this thesis, I focus on the shared aspect of ranking in translation-based
CLIR: both translation and retrieval models induce rankings over a set of
candidate structures through assignment of scores. The subject of this thesis
is to exploit this commonality in three different ranking tasks: (1) "Mate-ranking" refers to the
task of mining comparable data for SMT domain adaptation through translation-based
CLIR. "Cross-lingual mates" are direct or close translations of the query.
I will show that such a CLIR system is able to find
in-domain comparable data from noisy user-generated corpora and improves
in-domain translation performance of an SMT system. Conversely, the CLIR system
itself relies on a translation model that is tailored for retrieval. This
leads to the second direction of research, in which I develop two ways to
optimize an SMT model for retrieval, namely (2) by SMT parameter optimization
towards a retrieval objective ("translation ranking"), and (3) by presenting
a joint model of translation and retrieval for "document ranking". The latter
abandons the common architecture of modeling both components separately. The
former task refers to optimizing for preference of
translation candidates that work well for retrieval. In the core task of "document ranking" for CLIR, I present a model that directly ranks documents using an SMT decoder. I present substantial improvements
over state-of-the-art translation-based CLIR baseline systems, indicating that
a joint model of translation and retrieval is a promising direction of
research in the field of CLIR.