17,173 research outputs found
Graph Convolutional Encoders for Syntax-aware Neural Machine Translation
We present a simple and effective approach to incorporating syntactic
structure into neural attention-based encoder-decoder models for machine
translation. We rely on graph-convolutional networks (GCNs), a recent class of
neural networks developed for modeling graph-structured data. Our GCNs use
predicted syntactic dependency trees of source sentences to produce
representations of words (i.e. hidden states of the encoder) that are sensitive
to their syntactic neighborhoods. GCNs take word representations as input and
produce word representations as output, so they can easily be incorporated as
layers into standard encoders (e.g., on top of bidirectional RNNs or
convolutional neural networks). We evaluate their effectiveness with
English-German and English-Czech translation experiments for different types of
encoders and observe substantial improvements over their syntax-agnostic
versions in all the considered setups.
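For illustration, here is a minimal sketch of a syntactic GCN layer of the kind this abstract describes, in PyTorch. The adjacency matrix built from the predicted dependency parse is assumed as input, and the class and parameter names are illustrative; the paper's direction- and label-specific weights and edge gates are omitted.

```python
import torch
import torch.nn as nn

class SyntacticGCNLayer(nn.Module):
    """One GCN layer: each word's new representation aggregates its
    syntactic neighbours (rows of `adj` come from a dependency parse)."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)      # transform neighbour messages
        self.self_loop = nn.Linear(dim, dim)   # keep the word's own state

    def forward(self, h, adj):
        # h:   (batch, seq_len, dim) encoder states (e.g. BiRNN output)
        # adj: (batch, seq_len, seq_len) 0/1 dependency adjacency matrix
        neighbour_sum = adj @ self.linear(h)             # message passing
        degree = adj.sum(-1, keepdim=True).clamp(min=1)  # avoid div by zero
        return torch.relu(neighbour_sum / degree + self.self_loop(h))
```

Because the layer maps word representations to word representations of the same shape, it stacks directly on top of a bidirectional RNN or CNN encoder, exactly as the abstract notes.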
Convolutional Sequence to Sequence Learning
The prevalent approach to sequence to sequence learning maps an input
sequence to a variable-length output sequence via recurrent neural networks. We
introduce an architecture based entirely on convolutional neural networks.
Compared to recurrent models, computations over all elements can be fully
parallelized during training and optimization is easier since the number of
non-linearities is fixed and independent of the input length. Our use of gated
linear units eases gradient propagation and we equip each decoder layer with a
separate attention module. We outperform the accuracy of the deep LSTM setup of
Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French
translation at an order of magnitude faster speed, both on GPU and CPU.
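As a sketch only: one way the gated linear units described above can be wired into a convolutional block. Symmetric padding is used here, as in an encoder-style layer; the paper's decoder uses causal padding and a per-layer attention module, both omitted. Names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class GLUConvBlock(nn.Module):
    """Convolution + gated linear unit: the conv produces 2*dim channels,
    half of which gate the other half. All positions are computed in
    parallel, unlike a recurrent step."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):
        # x: (batch, dim, seq_len)
        a, b = self.conv(x).chunk(2, dim=1)   # split into value and gate
        return x + a * torch.sigmoid(b)       # GLU plus residual connection
```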
An Introductory Survey on Attention Mechanisms in NLP Problems
First derived from human intuition and later adapted to machine translation for
automatic token alignment, the attention mechanism, a simple method for
encoding sequence data based on the importance score assigned to each element,
has been widely applied to, and has attained significant improvements in,
various natural language processing tasks, including sentiment classification,
text summarization, question answering, and dependency parsing. In this paper,
we survey recent works and give an introductory summary of the attention
mechanism in different NLP problems, aiming to provide our readers with basic
knowledge of this widely used method, discuss its different variants for
different tasks, explore its association with other techniques in machine
learning, and examine methods for evaluating its performance. Comment: 9 pages
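A minimal sketch of the core idea this survey covers: score each element, normalize the scores into weights, and take the weighted sum. Dot-product scoring is assumed here for brevity; the many variants the survey discusses (additive, multi-head, etc.) differ mainly in how the scores are computed.

```python
import torch
import torch.nn.functional as F

def attention(query, keys, values):
    """Score each element, normalize to weights, return the weighted sum.
    query: (dim,), keys/values: (seq_len, dim)."""
    scores = keys @ query               # importance score per element
    weights = F.softmax(scores, dim=0)  # normalize to a distribution
    return weights @ values, weights    # context vector and the weights
```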
Deep Learning applied to NLP
Convolutional Neural Networks (CNNs) are typically associated with Computer
Vision. CNNs are responsible for major breakthroughs in Image Classification
and are the core of most Computer Vision systems today. More recently, CNNs
have been applied to problems in Natural Language Processing with some
interesting results. In this paper, we explain the basics of CNNs, their
different variations, and how they have been applied to NLP.
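A minimal sketch of the common CNN-for-text pattern the abstract alludes to: convolve filters of several widths over word embeddings, then max-pool over time, as in Kim-style sentence classifiers. The sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Filters of several widths slide over word embeddings; max-pooling
    over time keeps the strongest match per filter."""
    def __init__(self, vocab_size, emb_dim=100, n_filters=100,
                 widths=(3, 4, 5), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, w) for w in widths)
        self.out = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, tokens):                # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)  # (batch, emb_dim, seq_len)
        pooled = [c(x).relu().max(dim=2).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))  # class logits
```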
Neural-based machine translation for medical text domain. Based on European Medicines Agency leaflet texts
The quality of machine translation is rapidly evolving. Today one can find
several machine translation systems on the web that provide reasonable
translations, although the systems are not perfect. In some specific domains,
the quality may decrease. A recently proposed approach to this domain is neural
machine translation. It aims at building a jointly-tuned single neural network
that maximizes translation performance, a very different approach from
traditional statistical machine translation. Recently proposed neural machine
translation models often belong to the encoder-decoder family in which a source
sentence is encoded into a fixed-length vector that is, in turn, decoded to
generate a translation. The present research examines the effects of different
training methods on a Polish-English Machine Translation system used for
medical data. The European Medicines Agency parallel text corpus was used as
the basis for training neural network-based and statistical translation
systems. The main machine translation evaluation metrics were also used in the
analysis of the systems. A comparison and implementation of a real-time medical
translator is the main focus of our experiments. Comment: machine translation, statistical machine translation, neural machine
translation, nlp, text processing, medical communication
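A minimal sketch of the encoder-decoder family the abstract describes, where the whole source sentence is compressed into a single fixed-length vector before decoding. This is a generic illustration under assumed names and sizes, not the system evaluated in the paper.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder-decoder without attention: the source is compressed into
    one fixed-length vector (the encoder's final hidden state)."""
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))  # fixed-length vector
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                    # next-token logits
```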
What do you learn from context? Probing for sentence structure in contextualized word representations
Contextualized representation models such as ELMo (Peters et al., 2018a) and
BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a
diverse array of downstream NLP tasks. Building on recent token-level probing
work, we introduce a novel edge probing task design and construct a broad suite
of sub-sentence tasks derived from the traditional structured NLP pipeline. We
probe word-level contextual representations from four recent models and
investigate how they encode sentence structure across a range of syntactic,
semantic, local, and long-range phenomena. We find that existing models trained
on language modeling and translation produce strong representations for
syntactic phenomena, but only offer comparably small improvements on semantic
tasks over a non-contextual baseline. Comment: ICLR 2019 camera-ready version, 17 pages including appendices
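An illustrative sketch of the edge-probing setup: the contextual encoder is frozen elsewhere, and only a light classifier over pooled span representations is trained, so its accuracy reflects what the representations already encode. Mean pooling stands in for the paper's self-attentive span pooling, and the two-span case (e.g., dependency arcs) is shown; names are assumptions.

```python
import torch
import torch.nn as nn

class EdgeProbe(nn.Module):
    """Only this classifier is trained; the token representations are
    produced by a frozen contextual encoder."""
    def __init__(self, dim, n_labels):
        super().__init__()
        self.clf = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, n_labels))

    def forward(self, reps, span1, span2):
        # reps: (seq_len, dim) frozen token vectors; spans are (start, end)
        s1 = reps[span1[0]:span1[1]].mean(dim=0)  # mean-pool each span
        s2 = reps[span2[0]:span2[1]].mean(dim=0)
        return self.clf(torch.cat([s1, s2]))      # label logits
```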
Syntax-based Attention Model for Natural Language Inference
Introducing an attentional mechanism into neural networks is a powerful
concept that has achieved impressive results in many natural language
processing tasks. However, most existing models impose the attentional
distribution on a flat topology, namely the entire input representation
sequence. Clearly, any well-formed sentence has an accompanying syntactic tree
structure, which is a much richer topology. Applying attention to such a
topology not only exploits the underlying syntax, but also makes attention more
interpretable. In this paper, we explore this direction in the context of
natural language inference. The results demonstrate its efficacy. We also
perform extensive qualitative analysis, deriving insights and intuitions about
why and how our model works. Comment: Submitted to EMNLP 201
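One simple way to realize attention over a syntactic topology rather than a flat sequence is to mask out positions outside a node's tree neighbourhood before normalizing, sketched below. This masking formulation is an assumption for illustration, not necessarily the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def tree_masked_attention(query, keys, values, tree_mask):
    """Attention restricted to a syntactic neighbourhood.
    tree_mask: (seq_len,) boolean, True where attention is allowed."""
    scores = keys @ query
    scores = scores.masked_fill(~tree_mask, float('-inf'))  # hide non-neighbours
    weights = F.softmax(scores, dim=0)  # distribution over the neighbourhood
    return weights @ values
```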
Neural machine translation for low-resource languages
Neural machine translation (NMT) approaches have improved the state of the
art in many machine translation settings over the last couple of years, but
they require large amounts of training data to produce sensible output. We
demonstrate that NMT can be used for low-resource languages as well, by
introducing more local dependencies and using word alignments to learn sentence
reordering during translation. In addition to our novel model, we also present
an empirical evaluation of low-resource phrase-based statistical machine
translation (SMT) and NMT to investigate the lower limits of the respective
technologies. We find that while SMT remains the best option for low-resource
settings, our method can produce acceptable translations with only 70,000
tokens of training data, a level where the baseline NMT system fails completely. Comment: rejected from EMNLP 201
Entity Candidate Network for Whole-Aware Named Entity Recognition
Named Entity Recognition (NER) is a crucial upstream task in Natural Language
Processing (NLP). Traditional tag-scheme approaches offer only a single
recognition, which does not meet the needs of many downstream tasks such as
coreference resolution, and they ignore the continuity of entities. Inspired by
one-stage object detection models in computer vision (CV), this paper proposes
a new no-tag scheme, Whole-Aware Detection, which casts NER as an object
detection task. The paper also presents a novel model, the Entity Candidate
Network (ECNet), and a specific convolution network, the Adaptive Context
Convolution Network (ACCN), to fuse multi-scale contexts and encode entity
information at each position. ECNet identifies the full span of a named entity
and its type at each position based on an Entity Loss. Furthermore, ECNet can
be tuned between the highest precision and the highest recall, whereas
tag-scheme approaches cannot. Experimental results on the CoNLL 2003 English
dataset and the WNUT 2017 dataset show that ECNet outperforms previous
state-of-the-art methods. Comment: 10 pages, 4 figures
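A hypothetical sketch of the one-stage-detection view of NER the abstract describes: at every position, predict candidate span boundaries and a type, rather than per-token tags. The head below is a guess at the general shape of such a model, not ECNet itself.

```python
import torch
import torch.nn as nn

class SpanDetectionHead(nn.Module):
    """At each token position, predict a candidate entity span
    (left/right boundary offsets) and its type, detection-style."""
    def __init__(self, dim, n_types):
        super().__init__()
        self.boundaries = nn.Linear(dim, 2)             # left/right offsets
        self.type_logits = nn.Linear(dim, n_types + 1)  # +1 = "no entity"

    def forward(self, h):
        # h: (batch, seq_len, dim) contextual token representations
        return self.boundaries(h), self.type_logits(h)
```

Thresholding the type scores at inference time would give the precision/recall knob the abstract mentions: a higher threshold keeps only confident candidates (favoring precision), a lower one keeps more (favoring recall).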
Analysis Methods in Neural Language Processing: A Survey
The field of natural language processing has seen impressive progress in
recent years, with neural network models replacing many of the traditional
systems. A plethora of new models have been proposed, many of which are thought
to be opaque compared to their feature-rich counterparts. This has led
researchers to analyze, interpret, and evaluate neural networks in novel and
more fine-grained ways. In this survey paper, we review analysis methods in
neural language processing, categorize them according to prominent research
trends, highlight existing limitations, and point to potential directions for
future work. Comment: Version including the supplementary materials (3 tables), also
available at https://boknilev.github.io/nlp-analysis-method