An empirical analysis of phrase-based and neural machine translation
Two popular approaches to machine translation (MT) are phrase-based and neural
machine translation systems. Both types of systems are composed of
multiple complex models or layers. Each of these models and layers learns
different linguistic aspects of the source language. However, for some of these
models and layers, it is not clear which linguistic phenomena are learned or
how this information is learned. For phrase-based MT systems, it is often clear
what information is learned by each model, and the question is rather how this
information is learned, especially for its phrase reordering model. For neural
machine translation systems, the situation is even more complex, since for many
cases it is not exactly clear what information is learned and how it is
learned.
To shed light on what linguistic phenomena are captured by MT systems, we
analyze the behavior of important models in both phrase-based and neural MT
systems. We consider phrase reordering models from phrase-based MT systems to
investigate which words from inside of a phrase have the biggest impact on
defining the phrase reordering behavior. Additionally, to contribute to the
interpretability of neural MT systems, we study the behavior of the attention
model, which is a key component of neural MT systems and the closest model in
functionality to the phrase reordering models of phrase-based systems. The
attention model and the encoder hidden state representations form the main
components encoding source-side linguistic information in neural MT. To
this end, we also analyze the information captured in the encoder hidden state
representations of a neural MT system. We investigate the extent to which
syntactic and lexical-semantic information from the source side is captured by
hidden state representations of different neural MT architectures.
PhD thesis, University of Amsterdam, October 2020.
https://pure.uva.nl/ws/files/51388868/Thesis.pd
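A common way to carry out the hidden-state analysis described in the abstract above is a probing (diagnostic) classifier: a simple model trained to predict a linguistic property from frozen encoder states. Below is a minimal sketch of that setup, assuming the states and labels are already extracted; the array shapes and the use of POS tags are illustrative assumptions, not details from the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one 512-dim encoder hidden state per token,
# paired with that token's POS tag (a stand-in for any syntactic
# or lexical-semantic property one might probe for).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(10_000, 512))   # frozen NMT encoder states
pos_tags = rng.integers(0, 17, size=10_000)      # e.g. 17 Universal POS tags

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, pos_tags, test_size=0.2, random_state=0)

# A deliberately simple probe: if a linear model can recover the
# property, the information is (linearly) encoded in the states.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"probing accuracy: {probe.score(X_test, y_test):.3f}")
```

The point of keeping the probe this simple is methodological: any accuracy above chance must come from information already present in the representations, not from the probe itself.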
Novel statistical approaches to text classification, machine translation and computer-assisted translation
This thesis presents several contributions to the fields of automatic text
classification, machine translation, and computer-assisted translation
within the statistical framework.
In automatic text classification, a new application called bilingual text
classification is proposed, together with a series of models aimed at
capturing this bilingual information. To this end, two approaches to this
application are presented: the first is based on a naive assumption of
independence between the two languages involved, while the second, more
sophisticated, considers the existence of a correlation between words in
different languages. The first approach led to the development of five
models based on unigram models and smoothed n-gram models. These models
were evaluated on three tasks of increasing complexity, the most complex of
which was analyzed from the point of view of a document indexing assistance
system. The second approach is characterized by translation models capable
of capturing correlations between words in different languages. In our
case, the chosen translation model was the M1 model together with a unigram
model. This model was evaluated on two of the simplest tasks, outperforming
the naive approach, which assumes independence between words in different
languages drawn from bilingual texts.
In machine translation, the word-based statistical translation models M1,
M2, and HMM are extended within the mixture-modeling framework, with the
aim of defining context-dependent translation models. Likewise, an
iterative dynamic-programming search algorithm, originally designed for the
M2 model, is extended to the case of mixtures of M2 models.
Civera Saiz, J. (2008). Novel statistical approaches to text classification, machine translation and computer-assisted translation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2502
Substring-based Machine Translation
Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.
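The preprocessing step implied by this character-based view is worth seeing: every character becomes a token, so the aligner and decoder work over substrings instead of words. A minimal sketch follows; the underscore space marker is a common convention assumed here, not necessarily the paper's.

```python
def to_char_tokens(sentence: str) -> str:
    """Turn a sentence into space-separated character tokens,
    marking original spaces with '_' so they can be restored."""
    return " ".join(ch if ch != " " else "_" for ch in sentence)

def from_char_tokens(tokens: str) -> str:
    """Invert to_char_tokens after decoding."""
    return "".join(" " if t == "_" else t for t in tokens.split(" "))

line = "machine translation"
chars = to_char_tokens(line)
print(chars)   # m a c h i n e _ t r a n s l a t i o n
assert from_char_tokens(chars) == line
```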
Statistical approaches for natural language modelling and monotone statistical machine translation
This thesis brings together several contributions to statistical pattern recognition and, more specifically, to several natural language processing tasks. Several well-known statistical techniques are revisited in this thesis, namely: parameter estimation, loss function design, and statistical modeling. These techniques are applied to several natural language processing tasks such as document classification, natural language modeling,
and statistical machine translation.
Regarding parameter estimation, we address the smoothing problem by proposing a new constrained-domain maximum likelihood estimation technique (CDMLE). The CDMLE technique avoids the need for the smoothing step, which causes the loss of the properties of the maximum likelihood estimator. This technique is applied to document classification with the Naive Bayes classifier. The CDMLE technique is then extended to leaving-one-out maximum likelihood estimation and applied to language model smoothing. The results obtained on several natural language modeling tasks show an improvement in terms of perplexity.
Regarding the loss function, the design of loss functions other than the 0-1 loss is carefully studied. The study focuses on loss functions that, while retaining a decoding complexity similar to that of the 0-1 loss, provide greater flexibility. We analyze and present several loss functions on several machine translation tasks and with several translation models. We also analyze some translation rules that stand out for practical reasons, such as the direct translation rule, and we deepen the understanding of log-linear models, which are in fact particular cases of loss functions.
Finally, several monotone translation models based on statistical modeling techniques are proposed.
Andrés Ferrer, J. (2010). Statistical approaches for natural language modelling and monotone statistical machine translation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/7109
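As background for the loss-function discussion above, the general Bayes decision rule and its 0-1 special case can be written as follows (standard textbook form, not a formula quoted from the thesis):

```latex
\hat{y} \;=\; \operatorname*{argmin}_{y} \sum_{y'} \ell(y, y')\, p(y' \mid x),
\qquad
\ell_{0\text{-}1}(y, y') = 1 - \delta(y, y')
\;\Rightarrow\;
\hat{y} = \operatorname*{argmax}_{y}\; p(y \mid x).
```

Losses other than 0-1 keep the same argmin form but change which output the rule prefers, which is exactly the flexibility the thesis studies.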
Models, Inference, and Implementation for Scalable Probabilistic Models of Text
Unsupervised probabilistic Bayesian models are powerful tools for statistical analysis, especially in the areas of information retrieval, document analysis, and text processing. Despite their success, unsupervised probabilistic Bayesian models are often slow in inference due to inter-entangled, mutually dependent latent variables. In addition, the parameter space of these models is usually very large. As data from various media sources (for example, the internet, electronic books, and digital films) become widely accessible, the lack of scalability of these unsupervised probabilistic Bayesian models becomes a critical bottleneck.
The primary focus of this dissertation is to speed up the inference process in unsupervised probabilistic Bayesian models. There are two common solutions for scaling an algorithm up to large data: parallelization or streaming. The former achieves scalability by distributing the data and the computation across multiple machines. The latter assumes data come in a stream and updates the model gradually after seeing each observation. It is able to scale to larger datasets because it usually takes only one pass over the entire dataset.
In this dissertation, we examine both approaches. We first demonstrate the effectiveness of the parallelization approach on a class of unsupervised Bayesian models--topic models, which are exemplified by latent Dirichlet allocation (LDA). We propose a fast parallel implementation using variational inference on the MapReduce framework, referred to as Mr. LDA. We show that parallelization enables topic models to handle significantly larger datasets. We further show that our implementation--unlike highly tuned and specialized implementations--is easily extensible. We demonstrate two extensions possible with this scalable framework: 1) informed priors to guide topic discovery and 2) extracting topics from a multilingual corpus.
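The division of labour in a MapReduce implementation of variational LDA can be sketched in a few lines: mappers run the per-document E-step against the current topic-word parameters and emit expected counts, and reducers sum those counts into new parameters. The sketch below is a single-machine illustration of the idea using the standard variational updates, not the Mr. LDA code itself; the hyperparameter values are placeholders.

```python
import numpy as np
from scipy.special import digamma

def e_step_mapper(doc, elog_beta, alpha=0.1, iters=20):
    """Mapper: variational E-step for one document.
    doc: list of word ids; elog_beta: K x V matrix of E[log beta].
    Emits (word_id, expected topic counts) pairs."""
    K = elog_beta.shape[0]
    gamma = np.ones(K)                       # variational Dirichlet params
    for _ in range(iters):
        elog_theta = digamma(gamma) - digamma(gamma.sum())
        phi = np.exp(elog_theta[:, None] + elog_beta[:, doc])  # K x N
        phi /= phi.sum(axis=0)               # normalise per token
        gamma = alpha + phi.sum(axis=1)
    for n, w in enumerate(doc):
        yield w, phi[:, n]

def reducer(emitted, K, V, eta=0.01):
    """Reducer: sum expected counts into new topic-word parameters."""
    lam = np.full((K, V), eta)
    for w, counts in emitted:
        lam[:, w] += counts
    return lam

# Driver for one iteration: map over documents, then reduce.
K, V = 10, 5000
lam = np.random.gamma(100.0, 0.01, (K, V))
elog_beta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
docs = [[1, 42, 7], [3, 42, 42]]
lam = reducer((p for d in docs for p in e_step_mapper(d, elog_beta)), K, V)
```

Because the mappers touch only their own documents and the reducer only adds counts, the computation shards cleanly across machines, which is the property the dissertation exploits.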
We propose polylingual tree-based topic models to infer topics in multilingual corpora. We then propose three different inference methods to infer the latent variables. We examine the effectiveness of different inference methods on the task of machine translation in which we use the proposed model to extract domain knowledge that considers both source and target languages. We apply it on a large collection of aligned Chinese-English sentences and show that our model yields significant improvement on BLEU score over strong baselines.
Other than parallelization, another approach to scalability is to learn parameters in an online streaming setting. Although many online algorithms have been proposed for LDA, they all overlook a fundamental but challenging problem: the vocabulary is constantly evolving over time. To address this problem, we propose an online LDA with infinite vocabulary, infvoc LDA. We derive online hybrid inference for our model and propose heuristics to dynamically order, expand, and contract the set of words in our vocabulary. We show that our algorithm is able to discover better topics by incorporating new words into the vocabulary and constantly refining the topics over time.
In addition to LDA, we also show the generality of the online hybrid inference framework by applying it to adaptor grammars, a broader class of models subsuming LDA. With proper grammar rules, it simplifies to the exact LDA model; however, it provides more flexibility to alter or extend LDA with different grammar rules. We develop online hybrid inference for adaptor grammars and show that our method discovers high-quality structure more quickly than both MCMC and variational inference methods.
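The vocabulary bookkeeping that an infinite-vocabulary topic model needs can be pictured as a per-topic weight table that grows as unseen words arrive and is pruned to stay bounded. The sketch below is a heavily simplified illustration of the expand/contract idea; the pruning rule and prior mass are invented placeholders, not the paper's actual ordering heuristics.

```python
from collections import defaultdict

class GrowingTopic:
    """One topic's word weights over an open, growing vocabulary."""

    def __init__(self, prior=0.01, max_size=50_000):
        self.weights = defaultdict(float)
        self.prior = prior          # mass reserved for not-yet-seen words
        self.max_size = max_size    # bound on the tracked vocabulary

    def observe(self, word, weight=1.0):
        # Expand: an unseen word silently enters the vocabulary.
        self.weights[word] += weight

    def prob(self, word):
        total = sum(self.weights.values()) + self.prior * (len(self.weights) + 1)
        return (self.weights.get(word, 0.0) + self.prior) / total

    def contract(self):
        # Keep only the heaviest words when the table grows too large.
        if len(self.weights) > self.max_size:
            keep = sorted(self.weights, key=self.weights.get,
                          reverse=True)[: self.max_size]
            self.weights = defaultdict(
                float, {w: self.weights[w] for w in keep})

topic = GrowingTopic()
topic.observe("translation"); topic.observe("alignment")
print(topic.prob("translation"), topic.prob("brand-new-word"))
```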
Syntax-based machine translation using dependency grammars and discriminative machine learning
Machine translation has undergone huge improvements since the groundbreaking
introduction of statistical methods in the early 2000s, going from very
domain-specific systems that still performed relatively poorly despite the
painstaking crafting of thousands of ad-hoc rules, to general-purpose
systems automatically trained on large collections of bilingual texts which
manage to deliver understandable translations that convey the general
meaning of the original input.
These approaches, however, still perform well below the level of human
translators, typically failing to convey detailed meaning and register, and
producing translations that, while readable, are often ungrammatical and
unidiomatic.
This quality gap, which is considerably large compared to most other
natural language processing tasks, has been the focus of the research in
recent years, with the development of increasingly sophisticated models that
attempt to exploit the syntactical structure of human languages, leveraging
the technology of statistical parsers, as well as advanced machine learning
methods such as margin-based structured prediction algorithms and neural
networks.
The translation software itself became more complex in order to accommodate
the sophistication of these advanced models: the main translation
engine (the decoder) is now often combined with a pre-processor which
reorders the words of the source sentences into the target language word
order, or with a post-processor that ranks and selects a translation
according to a fine-grained model from a list of candidate translations
generated by a coarse model.
In this thesis we investigate the statistical machine translation problem
from various angles, focusing on translation from non-analytic languages
whose syntax is best described by fluid non-projective dependency grammars
rather than the relatively strict phrase-structure grammars or projective
dependency grammars which are most commonly used in the literature.
We propose a framework for modeling word reordering phenomena
between language pairs as transitions on non-projective source dependency
parse graphs. We quantitatively characterize reordering phenomena for the
German-to-English language pair as captured by this framework, specifically
investigating the incidence and effects of the non-projectivity of source
syntax and the non-locality of word movement w.r.t. the graph structure.
We evaluate several variants of hand-coded pre-ordering rules in order to
assess the impact of these phenomena on translation quality.
We propose a class of dependency-based source pre-ordering approaches
that reorder sentences based on flexible models trained with SVMs and
several recurrent neural network architectures.
We also propose a class of translation reranking models, both syntax-free
and source dependency-based, which make use of a type of neural network
known as graph echo state networks, which is highly flexible and requires
very few training resources, overcoming one of the main limitations
of neural network models for natural language processing tasks.
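The pre-ordering approaches proposed above reduce, at their core, to a local decision repeated over the parse: for each head, choose the ordering of the head and its dependents that a trained model scores highest, then emit the words in that order. A toy sketch follows; the brute-force permutation search and the scoring function are illustrative stand-ins for the SVM/RNN models of the thesis.

```python
from dataclasses import dataclass, field
from itertools import permutations

@dataclass(eq=False)
class Node:
    word: str
    children: list = field(default_factory=list)

def preorder(node, score):
    """Recursively pick the head/dependent order a trained scorer
    prefers (a beam or greedy search would replace brute force)."""
    units = [node] + node.children
    best = max(permutations(units), key=lambda order: score(node, order))
    out = []
    for n in best:
        out.extend([n.word] if n is node else preorder(n, score))
    return out

def toy_score(head, order):
    # Hypothetical scorer: reward placing the head before its
    # dependents, nudging toward head-initial (English-like) order.
    return -order.index(head)

# Toy verb-final German fragment "Peter das Buch liest":
sent = Node("liest", [Node("Peter"), Node("Buch", [Node("das")])])
print(" ".join(preorder(sent, toy_score)))
```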
Probabilistic Inference for Phrase-based Machine Translation: A Sampling Approach
Recent advances in statistical machine translation (SMT) have used dynamic programming
(DP) based beam search methods for approximate inference within probabilistic
translation models. Despite their success, these methods compromise the probabilistic
interpretation of the underlying model, thus limiting the application of probabilistically
defined decision rules during training and decoding.
As an alternative, in this thesis, we propose a novel Monte Carlo sampling approach
for theoretically sound approximate probabilistic inference within these models. The
distribution we are interested in is the conditional distribution of a log-linear translation
model; however, often, there is no tractable way of computing the normalisation term
of the model. Instead, a Gibbs sampling approach for phrase-based machine translation
models is developed which obviates the need to compute this term yet produces
samples from the required distribution.
We establish that the sampler effectively explores the distribution defined by a
phrase-based model by showing that it converges in a reasonable amount of time to
the desired distribution, irrespective of initialisation. Empirical evidence is provided to
confirm that the sampler can provide accurate estimates of expectations of functions of
interest. The mix of high probability and low probability derivations obtained through
sampling is shown to provide a more accurate estimate of expectations than merely
using the n-most highly probable derivations.
Subsequently, we show that the sampler provides a tractable solution for finding the
maximum probability translation in the model. We also present a unified approach to
approximating two additional intractable problems: minimum risk training and minimum
Bayes risk decoding. Key to our approach is the use of the sampler which
allows us to explore the entire probability distribution and maintain a strict probabilistic
formulation through the translation pipeline. For these tasks, sampling allies
the simplicity of n-best list approaches with the extended view of the distribution that
lattice-based approaches benefit from, while avoiding the biases associated with beam
search. Our approach is theoretically well-motivated and can give better and more
stable results than current state-of-the-art methods.
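The minimum Bayes risk use of the sampler can be made concrete: draw derivations from the Gibbs sampler, then output the translation whose expected loss against the empirical sample distribution is lowest. A minimal sketch, assuming the sampler itself is elided and using a toy unigram loss as a stand-in for 1 - BLEU:

```python
from collections import Counter

def mbr_decode(samples, loss):
    """Pick the sample minimizing expected loss under the empirical
    distribution the sampler produced (duplicates encode probability)."""
    freq = Counter(samples)
    total = sum(freq.values())

    def risk(candidate):
        return sum(n * loss(candidate, other)
                   for other, n in freq.items()) / total

    return min(freq, key=risk)

def unigram_loss(hyp, ref):
    """Toy stand-in for 1 - BLEU: one minus unigram overlap."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    return 1.0 - overlap / max(len(hyp.split()), 1)

samples = ["the cat sat", "the cat sat", "a cat sat", "the dog sat"]
print(mbr_decode(samples, unigram_loss))   # -> "the cat sat"
```

Because repeated samples stand in for probability mass, this is the "extended view of the distribution" the abstract contrasts with plain n-best lists.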
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, and syntactic parsing, but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser had little in common, as systems were much more tailored towards the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements that tasks often have in practice. This often goes along with a sharp break between deep learning methods and previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk of common NMT pathologies such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models, which often do not play a role in vanilla end-to-end approaches, and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as a derivation in a formal grammar.
EPSRC grant EP/L027623/1
EPSRC Tier-2 capital grant EP/P020259/
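One concrete, standard way to apply a language model inside an end-to-end system, in the spirit of the language-model applications above, is shallow fusion: at each decoding step, the NMT score of a candidate token is interpolated with a weighted LM score. A schematic sketch; the token tables, the weight, and the example words are assumptions for illustration, not the thesis's exact method.

```python
import math

def fused_step(nmt_logprobs, lm_logprobs, lam=0.3):
    """Combine per-token scores: log p_nmt + lam * log p_lm.
    Both inputs map candidate tokens to log-probabilities."""
    return {tok: nmt_logprobs[tok] + lam * lm_logprobs.get(tok, -1e9)
            for tok in nmt_logprobs}

# Toy step: the NMT model slightly prefers a neologism, the LM vetoes it.
nmt = {"house": math.log(0.40), "hause": math.log(0.45), "home": math.log(0.15)}
lm  = {"house": math.log(0.50), "home": math.log(0.49), "hause": math.log(0.01)}
scores = fused_step(nmt, lm)
print(max(scores, key=scores.get))   # -> "house"
```

The toy example also illustrates the pathology argument: a symbolic or statistical LM with broad coverage penalizes made-up tokens that the neural model alone might emit.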
Adaptation in Machine Translation
Statistical machine translation (SMT) has emerged as the currently most promising approach to machine translation. One limitation to date, however, is that the quality of SMT systems strongly depends on the similarity between the training data and the deployment domain. This thesis is devoted to adapting MT systems in the scenario of mismatched training data. We develop different approaches to increase performance even when all or some of the training data does not match the system's application.
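One classic instance of adapting to mismatched training data is data selection in the style of Moore and Lewis (2010): score each general-domain sentence by the cross-entropy difference between an in-domain language model and a general-domain one, and keep only the best-scoring slice. A minimal sketch with a toy unigram stand-in for the language models; whether the thesis uses this particular method is not stated in the abstract.

```python
import math

def cross_entropy_difference(sentence, lm_in, lm_gen):
    """Moore-Lewis score H_in(s) - H_gen(s); lower means the sentence
    looks in-domain relative to its general-domain likelihood."""
    words = sentence.split()
    h_in = -sum(lm_in(w) for w in words) / len(words)
    h_gen = -sum(lm_gen(w) for w in words) / len(words)
    return h_in - h_gen

def select(corpus, lm_in, lm_gen, keep_fraction=0.2):
    """Keep the slice of the general corpus closest to the target domain."""
    scored = sorted(corpus,
                    key=lambda s: cross_entropy_difference(s, lm_in, lm_gen))
    return scored[: max(1, int(keep_fraction * len(scored)))]

# Toy unigram LMs (word -> log-probability); medical as the target domain.
lm_in = lambda w: math.log({"patient": 0.1, "dose": 0.1}.get(w, 0.001))
lm_gen = lambda w: math.log(0.01)
corpus = ["the patient received a dose", "the match ended in a draw"]
print(select(corpus, lm_in, lm_gen, keep_fraction=0.5))
```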