
    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table.

    Towards Incremental Parsing of Natural Language using Recursive Neural Networks

    In this paper we develop novel algorithmic ideas for building a natural language parser grounded upon the hypothesis of incrementality. Although widely accepted and experimentally supported from a cognitive perspective as a model of the human parser, the incrementality assumption has never been exploited for building automatic parsers of unconstrained real texts. The essentials of the hypothesis are that words are processed in a left-to-right fashion, and the syntactic structure is kept totally connected at each step. Our proposal relies on a machine learning technique for predicting the correctness of partial syntactic structures that are built during the parsing process. A recursive neural network architecture is employed for computing predictions after a training phase on examples drawn from a corpus of parsed sentences, the Penn Treebank. Our results indicate the viability of the approach and lay out the premises for a novel generation of algorithms for natural language processing which more closely model human parsing. These algorithms may prove very useful in the development of efficient parsers.
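    The abstract does not spell out the architecture, but its central idea, scoring partial syntactic structures with a recursive neural network, can be sketched roughly as follows. This is a minimal illustration, not the authors' model; the binary composition function, dimensions, and sigmoid correctness score are assumptions.

```python
# A minimal sketch (not the paper's architecture) of scoring a partial parse
# with a recursive neural network: node vectors are composed bottom-up and a
# classifier predicts whether the partial structure is correct.
import torch
import torch.nn as nn

class Node:
    def __init__(self, word_id=None, children=None):
        self.word_id = word_id          # set for leaves
        self.children = children or []  # set for internal nodes

class RecursiveScorer(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.compose = nn.Linear(2 * dim, dim)  # binary composition function
        self.score = nn.Linear(dim, 1)          # correctness of the partial tree

    def encode(self, node):
        if node.word_id is not None:            # leaf: word embedding
            return self.embed(torch.tensor(node.word_id))
        left, right = (self.encode(c) for c in node.children)
        return torch.tanh(self.compose(torch.cat([left, right])))

    def forward(self, partial_tree):
        return torch.sigmoid(self.score(self.encode(partial_tree)))

# Toy usage: score a partial structure spanning the first two words.
model = RecursiveScorer(vocab_size=100)
tree = Node(children=[Node(word_id=3), Node(word_id=7)])
print(model(tree).item())  # probability that this partial parse is correct
```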

    Transformer-based NMT: modeling, training and implementation

    International trade and industrial collaborations enable countries and regions to concentrate their development on specific industries while making the most of other countries' specializations, which significantly accelerates global development. However, globalization also increases the demand for cross-region communication. Language barriers between the many languages spoken worldwide make deep collaboration between groups speaking different languages challenging, increasing the need for translation. Language technology, specifically Machine Translation (MT), holds the promise of enabling communication between languages efficiently, in real time, and at minimal cost. Even though computers can now perform computation in parallel very fast, providing machine translation users with translations at very low latency, and although the evolution from Statistical Machine Translation (SMT) to Neural Machine Translation (NMT) with advanced deep learning algorithms has significantly boosted translation quality, current machine translation algorithms are still far from translating all input accurately. Thus, how to further improve the performance of state-of-the-art NMT algorithms remains a valuable open research question that has received considerable attention. In the research presented in this thesis, we first investigate the ability of the state-of-the-art NMT model, the Transformer, to model long-distance relations. We propose to learn source phrase representations and incorporate them into the Transformer translation model, aiming to enhance its ability to capture long-distance dependencies. Second, although previous work (Bapna et al., 2018) suggests that deep Transformers have difficulty converging, we empirically find that the convergence of deep Transformers depends on the interaction between layer normalization and the residual connections employed to stabilize training. We conduct a theoretical study of how to ensure the convergence of Transformers, especially deep Transformers, and propose to ensure it by placing a Lipschitz constraint on parameter initialization. Finally, we investigate how to dynamically determine proper and efficient batch sizes during the training of the Transformer model. We find that the gradient direction stabilizes with increasing batch size during gradient accumulation. We therefore propose to dynamically adjust batch sizes during training by monitoring the change in gradient direction within gradient accumulation, and to obtain a proper and efficient batch size by stopping the accumulation when the gradient direction starts to fluctuate. For the research in this thesis, we also implement our own NMT toolkit, the Neutron implementation of the Transformer and its variants. In addition to providing the fundamental features that form the basis of the approaches presented in this thesis, the toolkit supports many advanced features from recent cutting-edge research. Implementations of all our approaches are included and open-sourced in the toolkit. To compare with previous approaches, we mainly conducted our experiments on data from the WMT 14 English to German (En-De) and English to French (En-Fr) news translation tasks, except when studying the convergence of deep Transformers, where we replaced the WMT 14 En-Fr task with the WMT 15 Czech to English (Cs-En) news translation task to compare with Bapna et al. (2018). The sizes of these datasets vary from medium (WMT 14 En-De, ~4.5M sentence pairs) to very large (WMT 14 En-Fr, ~36M sentence pairs), so we suggest that our approaches help improve translation quality between popular language pairs which are widely used and have sufficient data.
    China Scholarship Council
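    As a rough illustration of the batch-size contribution, the sketch below accumulates micro-batch gradients and takes an optimizer step once the accumulated gradient direction begins to fluctuate. The stopping rule (comparing consecutive cosine similarities), the `max_accum` cap, and the data interface are assumptions, not the thesis' exact procedure or its Neutron toolkit.

```python
# A hedged sketch of dynamic batch sizing via gradient accumulation: keep
# accumulating micro-batch gradients while the accumulated gradient direction
# still changes consistently, and step once it starts to fluctuate.
import torch
import torch.nn.functional as F

def flat_grad(model):
    """Concatenate all current (accumulated) gradients into one vector."""
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()
                      if p.grad is not None])

def accumulate_and_step(model, optimizer, micro_batches, loss_fn, max_accum=16):
    optimizer.zero_grad()
    prev_dir, prev_sim = None, None
    for i, (x, y) in enumerate(micro_batches):
        loss_fn(model(x), y).backward()          # gradients accumulate in .grad
        direction = flat_grad(model)
        if prev_dir is not None:
            sim = F.cosine_similarity(direction, prev_dir, dim=0).item()
            # Heuristic stopping rule (an assumption): once similarity to the
            # previous accumulated gradient stops increasing, the direction is
            # taken to be fluctuating and accumulation ends.
            if prev_sim is not None and sim < prev_sim:
                break
            prev_sim = sim
        prev_dir = direction
        if i + 1 >= max_accum:                   # hard cap on accumulation
            break
    optimizer.step()                             # one update at the chosen batch size
```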

    A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling

    We introduce a simple and accurate neural model for dependency-based semantic role labeling. Our model predicts predicate-argument dependencies relying on the states of a bidirectional LSTM encoder. The semantic role labeler achieves competitive performance on English, even without any kind of syntactic information and only using local inference. However, when automatically predicted part-of-speech tags are provided as input, it substantially outperforms all previous local models and approaches the best reported results on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish, where our approach also achieves competitive results. Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting. Our syntax-agnostic model appears more robust, resulting in the best reported results on standard out-of-domain test sets. Comment: To appear in CoNLL 2017
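    A minimal sketch of a syntax-agnostic, local SRL model in the spirit of this abstract: a bidirectional LSTM encodes words plus predicted POS tags, and each token is labeled conditioned on the predicate's hidden state. Dimensions, the concatenation scheme, and the classifier are assumptions rather than the paper's exact model.

```python
# A minimal sketch (not the paper's exact model) of syntax-agnostic SRL:
# a BiLSTM encodes words and predicted POS tags, and a local classifier
# labels each token conditioned on the predicate's hidden state.
import torch
import torch.nn as nn

class SimpleSRL(nn.Module):
    def __init__(self, n_words, n_pos, n_roles, dim=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.pos_emb = nn.Embedding(n_pos, dim // 4)
        self.encoder = nn.LSTM(dim + dim // 4, dim,
                               batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(4 * dim, n_roles)  # [token; predicate] states

    def forward(self, words, pos, predicate_index):
        x = torch.cat([self.word_emb(words), self.pos_emb(pos)], dim=-1)
        states, _ = self.encoder(x)                    # (batch, length, 2*dim)
        pred = states[:, predicate_index]              # predicate's hidden state
        pred = pred.unsqueeze(1).expand_as(states)
        return self.classifier(torch.cat([states, pred], dim=-1))

# Toy usage: one 5-token sentence with the predicate at position 2.
model = SimpleSRL(n_words=1000, n_pos=50, n_roles=20)
words = torch.randint(0, 1000, (1, 5))
pos = torch.randint(0, 50, (1, 5))
role_scores = model(words, pos, predicate_index=2)     # shape (1, 5, 20)
```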

    Machine learning to generate soil information

    This thesis is concerned with the novel use of machine learning (ML) methods in soil science research. ML adoption in soil science has increased considerably, especially in pedometrics (the use of quantitative methods to study the variation of soils). In parallel, the size of soil datasets has also increased thanks to projects of global impact that aim to rescue legacy data, and to new large-extent surveys that collect new information. While we have big datasets and global projects, modelling is currently based mostly on "traditional" ML approaches which do not take full advantage of these large data compilations. The compilation of these global datasets is severely limited by privacy concerns and, currently, no solution has been implemented to facilitate the process. If we consider the performance differences derived from the generality of global models versus the specificity of local models, there is still a debate on which approach is better. In either global or local digital soil mapping (DSM), most applications are static. Even with the large soil datasets available to date, there is not enough soil data to perform fully empirical space-time modelling. Considering these knowledge gaps, this thesis aims to introduce advanced ML algorithms and training techniques, specifically deep neural networks, for modelling large datasets at a global scale and providing new soil information. The research presented here has been successful at applying the latest advances in ML to improve upon some of the current approaches for soil modelling with large datasets. It has also created opportunities to utilise information, such as descriptive data, that has generally been disregarded. ML methods have been embraced by the soil community and their adoption is increasing. In the particular case of neural networks, their flexibility in terms of structure and training makes them a good candidate for improving on current soil modelling approaches.

    The impact of pretrained language models on negation and speculation detection in cross-lingual medical text: Comparative study

    Background: Negation and speculation are critical elements in natural language processing (NLP)-related tasks, such as information extraction, as these phenomena change the truth value of a proposition. In the clinical narrative, which is informal, these linguistic facts are used extensively with the objective of indicating hypotheses, impressions, or negative findings. Previous state-of-the-art approaches addressed negation and speculation detection tasks using rule-based methods, but in the last few years, models based on machine learning and deep learning exploiting morphological, syntactic, and semantic features represented as sparse and dense vectors have emerged. However, although such methods of named entity recognition (NER) employ a broad set of features, they are limited to existing pretrained models for a specific domain or language. Objective: As a fundamental subsystem of any information extraction pipeline, a system for cross-lingual and domain-independent negation and speculation detection was introduced, with a special focus on the biomedical scientific literature and clinical narrative. In this work, detection of negation and speculation was treated as a sequence-labeling task in which cues and scopes of both phenomena are recognized in a single step as a sequence of nested labels. Methods: We proposed the following two approaches for negation and speculation detection: (1) bidirectional long short-term memory (Bi-LSTM) and conditional random field using character, word, and sense embeddings to deal with the extraction of semantic, syntactic, and contextual patterns and (2) bidirectional encoder representations from transformers (BERT) with fine-tuning for NER. Results: The approach was evaluated for English and Spanish on biomedical and review text, particularly the BioScope corpus, the IULA corpus, and the SFU Spanish Review corpus, with F-measures of 86.6%, 85.0%, and 88.1%, respectively, for NeuroNER and 86.4%, 80.8%, and 91.7%, respectively, for BERT. Conclusions: These results show that these architectures perform considerably better than the previous rule-based and conventional machine learning-based systems. Moreover, our analysis shows that pretrained word embeddings, and particularly contextualized embeddings for biomedical corpora, help capture the complexities inherent to biomedical text. This work was supported by the Research Program of the Ministry of Economy and Competitiveness, Government of Spain (DeepEMR Project TIN2017-87548-C2-1-R).
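    A hedged sketch of the second approach (fine-tuning BERT-style models for sequence labeling of cues and scopes) is given below. The checkpoint name, the BIO-style nested label set, and the example sentence are assumptions, not the study's exact configuration; the classification head is untrained here, so the snippet only illustrates the interface.

```python
# A hedged sketch of token classification with BERT for negation/speculation
# cues and scopes. The checkpoint and label set are assumptions; without
# fine-tuning on an annotated corpus the predictions are not meaningful.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "B-NEG-CUE", "I-NEG-CUE", "B-NEG-SCOPE", "I-NEG-SCOPE",
          "B-SPEC-CUE", "I-SPEC-CUE", "B-SPEC-SCOPE", "I-SPEC-SCOPE"]
name = "bert-base-multilingual-cased"  # hypothetical choice for English/Spanish
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name,
                                                        num_labels=len(labels))

text = "No evidence of pneumonia was observed."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits                      # (1, n_subwords, n_labels)
pred = [labels[i] for i in logits.argmax(-1)[0].tolist()]
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
print(list(zip(tokens, pred)))                        # per-subword label guesses
```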

    Moving beyond parallel data for neural machine translation

    The goal of neural machine translation (NMT) is to build an end-to-end system that automatically translates sentences from the source language to the target language. Neural machine translation has become the dominant paradigm in machine translation in recent years, showing strong improvements over prior statistical methods in many scenarios. However, neural machine translation relies heavily on parallel corpora for training; even for two languages with abundant monolingual resources (or with a large number of speakers), such parallel corpora may be scarce. Thus, it is important to develop methods for leveraging additional types of data in NMT training. This thesis explores ways of augmenting the parallel training data of neural machine translation with non-parallel sources of data. We concentrate on two main types of additional data: monolingual corpora and structural annotations. First, we propose a method for adding target-language monolingual data into neural machine translation in which the monolingual data is converted to parallel data through copying. Thus, the NMT system is trained on two tasks: translation from source language to target language, and autoencoding the target language. We show that this model achieves improvements in BLEU score for low- and medium-resource setups. Second, we consider the task of zero-resource NMT, where no source ↔ target parallel training data is available, but parallel data with a pivot language is abundant. We improve these models by adding a monolingual corpus in the pivot language, translating this corpus into both the source and the target language to create a pseudo-parallel source-target corpus. In the second half of this thesis, we turn our attention to syntax, introducing methods for adding syntactic annotation of the source language into neural machine translation. In particular, our multi-source model, which leverages an additional encoder to inject syntax into the NMT model, results in strong improvements over non-syntactic NMT for a high-resource translation case, while remaining robust to unparsed inputs. We also introduce a multi-task model that augments the transformer architecture with syntax; this model improves translation across several language pairs. Finally, we consider the case where no syntactic annotations are available (such as when translating from very low-resource languages). We introduce an unsupervised hierarchical encoder that induces a tree structure over the source sentences based solely on the downstream task of translation. Although the resulting hierarchies do not resemble traditional syntax, the model shows large improvements in BLEU for low-resource NMT.
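    The copying idea from the first contribution can be sketched very simply: each target-language monolingual sentence becomes a pseudo-parallel pair whose source side is a copy of the target, so the NMT system also learns an autoencoding task alongside translation. The file handling and names below are illustrative assumptions, not the thesis' pipeline.

```python
# A minimal sketch of converting target-language monolingual data into
# pseudo-parallel pairs by copying, then mixing them with real parallel data.
def make_copy_pairs(monolingual_target_path):
    """Yield (source, target) pairs in which the source is a copy of the target."""
    with open(monolingual_target_path, encoding="utf-8") as f:
        for line in f:
            sentence = line.strip()
            if sentence:
                yield sentence, sentence   # copied source, original target

def write_training_data(parallel_pairs, copy_pairs, src_out, tgt_out):
    """Concatenate real parallel data with copied monolingual data for training."""
    with open(src_out, "w", encoding="utf-8") as fs, \
         open(tgt_out, "w", encoding="utf-8") as ft:
        for src, tgt in list(parallel_pairs) + list(copy_pairs):
            fs.write(src + "\n")
            ft.write(tgt + "\n")
```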