Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution
Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard word2vec when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embeddings.
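The contrast the abstract draws between standard word2vec and dependency-based embeddings comes down to how training pairs are extracted. The sketch below is hypothetical (not the paper's code, and the hand-written toy parse stands in for a real dependency parser): it shows linear-window contexts, as used by word2vec, next to labelled syntactic contexts in the style of dependency-based embeddings.

```python
# Hypothetical sketch: linear window contexts vs. dependency contexts.
# The arcs below are hand-written; a real pipeline would use a parser.

def window_contexts(tokens, window=2):
    """(word, context) pairs from a +/-window linear context, word2vec-style."""
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((w, tokens[j]))
    return pairs

def dependency_contexts(tokens, arcs):
    """(word, labelled-context) pairs from dependency arcs: each word's
    contexts are its syntactic neighbours, annotated with the arc label
    (and an inverse marker for the head-to-dependent direction)."""
    pairs = []
    for head, label, dep in arcs:
        pairs.append((tokens[head], f"{tokens[dep]}/{label}"))
        pairs.append((tokens[dep], f"{tokens[head]}/{label}-1"))
    return pairs

tokens = ["scientist", "discovers", "star"]
arcs = [(1, "nsubj", 0), (1, "dobj", 2)]  # "discovers" heads both arguments

print(window_contexts(tokens))
print(dependency_contexts(tokens, arcs))
```

Because dependency contexts link a word to its syntactic neighbours rather than whatever happens to be adjacent, the resulting embeddings capture complementary, more functional similarities, which is consistent with the boost the abstract reports when the two representations are combined.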
Statistical model of human lexical category disambiguation
Research in Sentence Processing is concerned with discovering the mechanism by
which linguistic utterances are mapped onto meaningful representations within the
human mind. Models of the Human Sentence Processing Mechanism (HSPM) can
be divided into those in which such mapping is performed by a number of limited
modular processes and those in which there is a single interactive process. A further,
and increasingly important, distinction is between models which rely on innate
preferences to guide decision processes and those which make use of
experience-based statistics.
In this context, the aims of the current thesis are two-fold:
• To argue that the correct architecture of the HSPM is both modular and
statistical - the Modular Statistical Hypothesis (MSH).
• To propose and provide empirical support for a position in which human
lexical category disambiguation occurs within a modular process, distinct
from syntactic parsing and guided by a statistical decision process.
Arguments are given for why a modular statistical architecture should be preferred
on both methodological and rational grounds. We then turn to the (often ignored)
problem of lexical category disambiguation and propose the existence of a presyntactic
Statistical Lexical Category Module (SLCM). A number of variants of the
SLCM are introduced. By empirically investigating this particular architecture we
also hope to provide support for the more general hypothesis - the MSH.
The SLCM has some interesting behavioural properties; the remainder of the thesis
empirically investigates whether these behaviours are observable in human sentence
processing. We first consider whether the results of existing studies might be
attributable to SLCM behaviour. Such evaluation provides support for an HSPM
architecture that includes this SLCM and allows us to determine which SLCM
variant is empirically most plausible. Predictions are made, using this variant, to
determine SLCM behaviour in the face of novel utterances; these predictions are then
tested using a self-paced reading paradigm. The results of this experimentation fully
support the inclusion of the SLCM in a model of the HSPM and are not compatible
with other existing models.
As the SLCM is a modular and statistical process, empirical evidence for the SLCM
also directly supports an HSPM architecture which is modular and statistical. We
therefore conclude that our results strongly support both the SLCM and the MSH.
However, more work is needed, both to produce further evidence and to define the
model further.
Tackling Sequence to Sequence Mapping Problems with Neural Networks
In Natural Language Processing (NLP), it is important to detect the
relationship between two sequences or to generate a sequence of tokens given
another observed sequence. We refer to such problems of modelling sequence
pairs as sequence-to-sequence (seq2seq) mapping problems. A great deal of
research has been devoted to tackling these problems, with traditional
approaches relying on a combination of hand-crafted features, alignment models,
segmentation heuristics, and external linguistic resources. Although great
progress has been made, these traditional approaches suffer from various
drawbacks, such as complicated pipelines, laborious feature engineering, and
difficulty with domain adaptation. Recently, neural networks have emerged as a
promising solution to many problems in NLP, speech recognition, and computer
vision. Neural models are powerful because they can be trained end to end,
generalise well to unseen examples, and the same framework can be easily
adapted to a new domain.
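The two seq2seq problem types named above can be pinned down as interfaces: scoring a relationship between two sequences, and generating an output sequence from an observed one. The sketch below is hypothetical and only fixes those interfaces; the trivial overlap score and rule lookup are stand-ins for the trained neural models the thesis develops.

```python
# Hypothetical sketch of the two seq2seq mapping problem types.
# Both functions are toy stand-ins for learned neural models.

def sequence_pair_score(a, b):
    """Relationship detection: score a pair of token sequences.
    Here: Jaccard overlap of their token sets (a stand-in for a model)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def transduce(tokens, rules):
    """Generation: map an observed sequence to an output sequence.
    Here: a token-level rule lookup (a stand-in for an encoder-decoder)."""
    return [rules.get(t, t) for t in tokens]

print(sequence_pair_score(["a", "cat", "sat"], ["the", "cat", "sat"]))  # 0.5
print(transduce(["hello", "world"], {"hello": "bonjour", "world": "monde"}))
```

An end-to-end neural model replaces both stand-ins with a single trainable mapping from raw sequences to outputs, which is what removes the pipeline and feature-engineering burden described above.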
The aim of this thesis is to advance the state-of-the-art in seq2seq mapping
problems with neural networks. We explore solutions from three major aspects:
investigating neural models for representing sequences, modelling interactions
between sequences, and using unpaired data to boost the performance of neural
models. For each aspect, we propose novel models and evaluate their efficacy on
various tasks of seq2seq mapping.

Comment: PhD thesis