
    Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embeddings.
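To make the sub-word idea concrete, here is a minimal sketch of a fastText-style composition of word vectors from character n-grams. All names and parameters (dimension, bucket count, n-gram range) are illustrative stand-ins, not the paper's actual training setup; in particular, the n-gram table below is random rather than trained.

```python
import hashlib
import numpy as np

DIM = 32        # embedding dimension (illustrative)
BUCKETS = 1000  # hash buckets for character n-grams (illustrative)

# stand-in for a trained n-gram embedding table
rng = np.random.default_rng(0)
ngram_table = rng.standard_normal((BUCKETS, DIM))

def char_ngrams(word, n_min=3, n_max=5):
    # pad with boundary markers and extract character n-grams,
    # as in fastText-style sub-word models
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    # a word's vector is the mean of its hashed n-gram vectors,
    # so morphological relatives share vector components
    idxs = [int(hashlib.md5(g.encode()).hexdigest(), 16) % BUCKETS
            for g in char_ngrams(word)]
    return ngram_table[idxs].mean(axis=0)
```

Because related forms such as "translate" and "translated" share many character n-grams, their composed vectors share components even for words unseen during training, which is the property that helps embedding quality when monolingual data is scarce.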

    Statistical model of human lexical category disambiguation

    Research in Sentence Processing is concerned with discovering the mechanism by which linguistic utterances are mapped onto meaningful representations within the human mind. Models of the Human Sentence Processing Mechanism (HSPM) can be divided into those in which such mapping is performed by a number of limited modular processes and those in which there is a single interactive process. A further, and increasingly important, distinction is between models which rely on innate preferences to guide decision processes and those which make use of experience-based statistics. In this context, the aims of the current thesis are two-fold:
    • To argue that the correct architecture of the HSPM is both modular and statistical - the Modular Statistical Hypothesis (MSH).
    • To propose and provide empirical support for a position in which human lexical category disambiguation occurs within a modular process, distinct from syntactic parsing and guided by a statistical decision process.
    Arguments are given for why a modular statistical architecture should be preferred on both methodological and rational grounds. We then turn to the (often ignored) problem of lexical category disambiguation and propose the existence of a presyntactic Statistical Lexical Category Module (SLCM). A number of variants of the SLCM are introduced. By empirically investigating this particular architecture we also hope to provide support for the more general hypothesis - the MSH. The SLCM has some interesting behavioural properties; the remainder of the thesis empirically investigates whether these behaviours are observable in human sentence processing. We first consider whether the results of existing studies might be attributable to SLCM behaviour. Such evaluation provides support for an HSPM architecture that includes this SLCM and allows us to determine which SLCM variant is empirically most plausible. Predictions are made, using this variant, to determine SLCM behaviour in the face of novel utterances; these predictions are then tested using a self-paced reading paradigm. The results of this experimentation fully support the inclusion of the SLCM in a model of the HSPM and are not compatible with other existing models. As the SLCM is a modular and statistical process, empirical evidence for the SLCM also directly supports an HSPM architecture which is modular and statistical. We therefore conclude that our results strongly support both the SLCM and the MSH. However, more work is needed, both to produce further evidence and to define the model more fully.
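The core idea of a presyntactic statistical disambiguator can be sketched as follows. This toy is not the thesis's actual SLCM: the probabilities are hand-specified stand-ins for corpus statistics, and it shows only the general shape of a decision made from lexical and category-bigram statistics, before any syntactic parsing.

```python
# Toy presyntactic lexical category disambiguation: pick the category c
# maximising P(c | previous category) * P(word | c), using hand-specified
# numbers standing in for corpus-derived statistics.

P_word_given_cat = {
    ("race", "NOUN"): 0.6, ("race", "VERB"): 0.4,
}
P_cat_given_prev = {
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("PRON", "VERB"): 0.8, ("PRON", "NOUN"): 0.2,
}

def disambiguate(word, prev_cat, candidates=("NOUN", "VERB")):
    # argmax over candidate categories of the bigram * lexical score
    def score(c):
        return (P_cat_given_prev.get((prev_cat, c), 0.0)
                * P_word_given_cat.get((word, c), 0.0))
    return max(candidates, key=score)
```

With these numbers, "race" after a determiner ("the race") resolves to NOUN, while after a pronoun ("they race") it resolves to VERB, illustrating how a purely statistical, presyntactic process can settle category ambiguity.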

    Tackling Sequence to Sequence Mapping Problems with Neural Networks

    In Natural Language Processing (NLP), it is important to detect the relationship between two sequences or to generate a sequence of tokens given another observed sequence. We refer to this class of problems, which involve modelling sequence pairs, as sequence-to-sequence (seq2seq) mapping problems. A lot of research has been devoted to finding ways of tackling these problems, with traditional approaches relying on a combination of hand-crafted features, alignment models, segmentation heuristics, and external linguistic resources. Although great progress has been made, these traditional approaches suffer from various drawbacks, such as complicated pipelines, laborious feature engineering, and the difficulty of domain adaptation. Recently, neural networks emerged as a promising solution to many problems in NLP, speech recognition, and computer vision. Neural models are powerful because they can be trained end to end, generalise well to unseen examples, and the same framework can be easily adapted to a new domain. The aim of this thesis is to advance the state-of-the-art in seq2seq mapping problems with neural networks. We explore solutions from three major aspects: investigating neural models for representing sequences, modelling interactions between sequences, and using unpaired data to boost the performance of neural models. For each aspect, we propose novel models and evaluate their efficacy on various tasks of seq2seq mapping.
    Comment: PhD thesis
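The encoder-decoder framing common to seq2seq models can be sketched as an interface: encode the source sequence into a context, then emit target tokens one at a time conditioned on that context. The "model" below is a learning-free stand-in (it simply emits the source in reverse) used only to show the interface shape, not any system from the thesis.

```python
# Toy seq2seq interface: encode once, then decode step by step.
# The decoder policy here is a hard-coded stand-in (reverse the source);
# a real model would replace encode/decode_step with learned networks.

def encode(src_tokens):
    # stand-in encoder: the "context" is just the token list itself
    return list(src_tokens)

def decode_step(context, generated):
    # stand-in decoder: emit source tokens in reverse order, then stop
    i = len(generated)
    if i < len(context):
        return context[len(context) - 1 - i]
    return "<eos>"

def translate(src_tokens, max_len=10):
    # greedy generation loop shared by most seq2seq decoders
    context = encode(src_tokens)
    out = []
    while len(out) < max_len:
        tok = decode_step(context, out)
        if tok == "<eos>":
            break
        out.append(tok)
    return out
```

The point of the sketch is that the same encode/decode loop covers very different sequence-pair tasks; only the learned components inside it change.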