The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Large transformer-based language models have been shown to be very effective
in many classification tasks. However, their computational complexity prevents
their use in applications requiring the classification of a large set of
candidates. While previous works have investigated approaches to reduce model
size, relatively little attention has been paid to techniques to improve batch
throughput during inference. In this paper, we introduce the Cascade
Transformer, a simple yet effective technique to adapt transformer-based models
into a cascade of rankers. Each ranker is used to prune a subset of candidates
in a batch, thus dramatically increasing throughput at inference time. Partial
encodings from the transformer model are shared among rerankers, providing
further speed-up. When compared to a state-of-the-art transformer model, our
approach reduces computation by 37% with almost no impact on accuracy, as
measured on two English Question Answering datasets.
Comment: Accepted to ACL 2020 (long paper)
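To make the cascade mechanism concrete, here is a minimal PyTorch-style sketch of the idea as the abstract describes it: intermediate scoring heads prune low-scoring candidates from the batch, and later layers reuse the partial encodings already computed. The names (CascadeRanker, keep_frac, the [CLS]-based scoring heads) are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of cascade-style candidate pruning, assuming a generic
# PyTorch transformer split into layer blocks; all names are illustrative.
import torch
import torch.nn as nn

class CascadeRanker(nn.Module):
    def __init__(self, layers, heads, keep_frac=0.7):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # transformer layer blocks
        self.heads = nn.ModuleDict(heads)    # {"layer_idx": scoring head}
        self.keep_frac = keep_frac           # fraction of candidates kept per stage

    def forward(self, hidden, candidate_ids):
        # hidden: (num_candidates, seq_len, dim); one row per candidate answer
        for i, layer in enumerate(self.layers):
            hidden = layer(hidden)            # partial encodings are shared:
            if str(i) in self.heads:          # each ranker scores the states so far
                scores = self.heads[str(i)](hidden[:, 0]).squeeze(-1)
                k = max(1, int(len(candidate_ids) * self.keep_frac))
                top = scores.topk(k).indices  # drop low-scoring candidates early,
                hidden = hidden[top]          # so later layers see a smaller batch
                candidate_ids = candidate_ids[top]
        return hidden, candidate_ids
```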
Neural Machine Translation via Binary Code Prediction
In this paper, we propose a new method for calculating the output layer in
neural machine translation systems. The method is based on predicting a binary
code for each word and can reduce computation time/memory requirements of the
output layer to be logarithmic in vocabulary size in the best case. In
addition, we also introduce two advanced approaches to improve the robustness
of the proposed model: using error-correcting codes and combining softmax and
binary codes. Experiments on two English-Japanese bidirectional translation
tasks show that the proposed models achieve BLEU scores that approach the
softmax, while reducing memory usage to less than 1/10 and improving decoding
speed on CPUs by 5x to 10x.
Comment: Accepted as a long paper at ACL 2017
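The core trick is easy to sketch: replace the V-way softmax with about log2(V) independent bit predictions. Below is a minimal, hedged PyTorch sketch of such an output layer; it omits the error-correcting codes and the softmax hybrid mentioned in the abstract, and the class name is an assumption.

```python
# A minimal sketch of a binary-code output layer: each word id is identified
# with its ~log2(V)-bit binary representation, so the layer needs only
# O(log V) outputs instead of O(V).
import math
import torch
import torch.nn as nn

class BinaryCodeOutput(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.n_bits = math.ceil(math.log2(vocab_size))
        self.proj = nn.Linear(hidden_size, self.n_bits)  # one logit per bit

    def codes(self, word_ids):
        # word id -> its bits, low bit first, e.g. 5 -> [1, 0, 1, 0, ...]
        bits = [(word_ids >> b) & 1 for b in range(self.n_bits)]
        return torch.stack(bits, dim=-1).float()

    def loss(self, hidden, target_ids):
        return nn.functional.binary_cross_entropy_with_logits(
            self.proj(hidden), self.codes(target_ids))

    def decode(self, hidden):
        bits = (self.proj(hidden) > 0).long()            # threshold each bit
        weights = 2 ** torch.arange(self.n_bits, device=bits.device)
        return (bits * weights).sum(-1)                  # bits -> word id
```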
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
Past work in relation extraction has focused on binary relations in single
sentences. Recent NLP inroads in high-value domains have sparked interest in
the more general setting of extracting n-ary relations that span multiple
sentences. In this paper, we explore a general relation extraction framework
based on graph long short-term memory networks (graph LSTMs) that can be easily
extended to cross-sentence n-ary relation extraction. The graph formulation
provides a unified way of exploring different LSTM approaches and incorporating
various intra-sentential and inter-sentential dependencies, such as sequential,
syntactic, and discourse relations. A robust contextual representation is
learned for the entities, which serves as input to the relation classifier.
This simplifies handling of relations with arbitrary arity, and enables
multi-task learning with related relations. We evaluate this framework in two
important precision medicine settings, demonstrating its effectiveness with
both conventional supervised learning and distant supervision. Cross-sentence
extraction produced larger knowledge bases, and multi-task learning
significantly improved extraction accuracy. A thorough analysis of various LSTM
approaches yielded useful insight into the impact of linguistic analysis on
extraction accuracy.
Comment: Conditionally accepted by TACL in December 2016; published in April
2017; presented at ACL in August 2017
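As a rough illustration of the graph formulation, the sketch below runs one direction of a graph LSTM over a document graph in which each word may have sequential, syntactic, or discourse edges to earlier words; the per-edge-type weights and the backward pass of the actual model are simplified away, and all names are assumptions.

```python
# A minimal sketch of one direction of a graph LSTM: each word's recurrent
# input averages the states of its incoming neighbors in the document graph.
import torch
import torch.nn as nn

class SimpleGraphLSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size

    def forward(self, embeddings, in_edges):
        # embeddings: (num_words, input_size); in_edges[i] lists the indices of
        # earlier words linked to word i (sequential/syntactic/discourse edges)
        h, c = [], []
        for i, x in enumerate(embeddings):
            preds = in_edges[i]
            if preds:
                h_prev = torch.stack([h[j] for j in preds]).mean(0)
                c_prev = torch.stack([c[j] for j in preds]).mean(0)
            else:
                h_prev = x.new_zeros(self.hidden_size)
                c_prev = x.new_zeros(self.hidden_size)
            hi, ci = self.cell(x.unsqueeze(0),
                               (h_prev.unsqueeze(0), c_prev.unsqueeze(0)))
            h.append(hi.squeeze(0)); c.append(ci.squeeze(0))
        return torch.stack(h)  # contextual word/entity representations
```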
Graph Convolutional Encoders for Syntax-aware Neural Machine Translation
We present a simple and effective approach to incorporating syntactic
structure into neural attention-based encoder-decoder models for machine
translation. We rely on graph-convolutional networks (GCNs), a recent class of
neural networks developed for modeling graph-structured data. Our GCNs use
predicted syntactic dependency trees of source sentences to produce
representations of words (i.e. hidden states of the encoder) that are sensitive
to their syntactic neighborhoods. GCNs take word representations as input and
produce word representations as output, so they can easily be incorporated as
layers into standard encoders (e.g., on top of bidirectional RNNs or
convolutional neural networks). We evaluate their effectiveness with
English-German and English-Czech translation experiments for different types of
encoders and observe substantial improvements over their syntax-agnostic
versions in all the considered setups.
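A hedged sketch of what such a layer can look like, reduced to its essentials: each word's new representation mixes its own state with transformed states of its syntactic head and dependents. The direction-specific weights are kept, but the per-label parameters and edge gating of the full model are omitted.

```python
# A minimal sketch of a syntactic GCN layer over predicted dependency arcs,
# meant to sit on top of encoder states (e.g., a bidirectional RNN).
import torch
import torch.nn as nn

class SyntacticGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_in = nn.Linear(dim, dim)    # head -> dependent messages
        self.w_out = nn.Linear(dim, dim)   # dependent -> head messages
        self.w_self = nn.Linear(dim, dim)  # self-loop

    def forward(self, states, heads):
        # states: (num_words, dim); heads[i] = index of word i's head (-1 = root)
        out = self.w_self(states)
        for dep, head in enumerate(heads):
            if head >= 0:
                out[dep] = out[dep] + self.w_in(states[head])
                out[head] = out[head] + self.w_out(states[dep])
        return torch.relu(out)  # word states sensitive to syntactic neighbors
```

Because the layer maps word representations to word representations, several such layers can be stacked, or a single one dropped on top of an existing encoder, which is what makes the approach easy to slot into standard models.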
Concurrent Lexicalized Dependency Parsing: The ParseTalk Model
A grammar model for concurrent, object-oriented natural language parsing is
introduced. Complete lexical distribution of grammatical knowledge is achieved
by building upon the head-oriented notions of valency and dependency, while
inheritance mechanisms are used to capture lexical generalizations. The
underlying concurrent computation model relies upon the actor paradigm. We
consider message passing protocols for establishing dependency relations and
ambiguity handling.
Comment: 90kB, 7 pages, PostScript
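Loosely illustrating the lexicalized, message-passing idea (with the concurrency left out), the toy below treats each word as an object that advertises the categories it can govern and answers head requests from its neighbors; all names and the protocol details are assumptions, not the ParseTalk specification.

```python
# A toy, sequential sketch of valency-driven message passing between word
# objects; the actual model runs these as concurrent actors.
class WordActor:
    def __init__(self, form, category, valencies):
        self.form = form
        self.category = category          # e.g. "det", "noun", "verb"
        self.valencies = list(valencies)  # categories this word can govern
        self.dependents = []

    def receive_head_request(self, other):
        # protocol message: 'other' asks this word to become its head
        if other.category in self.valencies:
            self.valencies.remove(other.category)
            self.dependents.append(other)
            return True
        return False

# "the cat sleeps": each word searches its neighbors for a governor
words = [WordActor("the", "det", []),
         WordActor("cat", "noun", ["det"]),
         WordActor("sleeps", "verb", ["noun"])]
for w in words:
    for other in words:
        if other is not w and other.receive_head_request(w):
            break  # head found; 'sleeps' ends up ungoverned, i.e. the root
```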
Memoization of Top Down Parsing
This paper discusses the relationship between memoized top-down recognizers
and chart parsers. It presents a version of memoization suitable for
continuation-passing style programs. When applied to a simple formalization of
a top-down recognizer, it yields a terminating parser.
Comment: uuencoded, compressed PostScript file
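The key construction is small enough to sketch: memoize each (nonterminal, position) pair with a table of end positions found so far plus the continuations waiting on it, so repeated and even left-recursive calls terminate. The following is a hedged Python rendering of that idea on a toy grammar; it follows the spirit of the paper rather than its exact formulation.

```python
# A minimal sketch of memoized continuation-passing recognition: each memo
# entry records results found so far and the continuations to notify.
def memo_cps(parse_fn):
    table = {}
    def memoized(pos, cont):
        entry = table.get(pos)
        if entry is None:
            entry = {"results": set(), "conts": [cont]}
            table[pos] = entry
            def notify(end):  # a new end position was recognized at 'pos'
                if end not in entry["results"]:
                    entry["results"].add(end)
                    for c in list(entry["conts"]):
                        c(end)
            parse_fn(pos, notify)
        else:
            entry["conts"].append(cont)         # reuse earlier work: replay
            for end in list(entry["results"]):  # known results immediately
                cont(end)
    return memoized

# Toy left-recursive grammar S -> S 'a' | 'a', recognizing "aaa"
tokens = "aaa"
def S(pos, cont):
    if pos < len(tokens) and tokens[pos] == "a":
        cont(pos + 1)                           # S -> 'a'
    S_memo(pos, lambda mid:                     # S -> S 'a'
           cont(mid + 1) if mid < len(tokens) and tokens[mid] == "a" else None)
S_memo = memo_cps(S)

found = []
S_memo(0, found.append)
print(sorted(found))  # [1, 2, 3]: "a", "aa", "aaa" are all recognized
```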
Question-Driven Summarization of Answers to Consumer Health Questions
Automatic summarization of natural language is a widely studied area in
computer science, one that is broadly applicable to anyone who routinely needs
to understand large quantities of information. For example, in the medical
domain, recent developments in deep learning approaches to automatic
summarization have the potential to make health information more easily
accessible to patients and consumers. However, to evaluate the quality of
automatically generated summaries of health information, gold-standard, human
generated summaries are required. Using answers provided by the National
Library of Medicine's consumer health question answering system, we present the
MEDIQA Answer Summarization dataset, the first summarization collection
containing question-driven summaries of answers to consumer health questions.
This dataset can be used to evaluate single or multi-document summaries
generated by algorithms using extractive or abstractive approaches. In order to
benchmark the dataset, we include results of baseline and state-of-the-art deep
learning summarization models, demonstrating that this dataset can be used to
effectively evaluate question-driven machine-generated summaries and promote
further machine learning research in medical question answering.
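As a hedged illustration of how the collection could be used for evaluation, the snippet below scores a trivial first-sentence baseline against reference summaries with the third-party rouge-score package; the file name and field names are hypothetical, not the dataset's actual schema.

```python
# A minimal sketch of evaluating generated summaries against the dataset's
# reference summaries; "mediqa_ans.json" and its fields are hypothetical.
import json
from rouge_score import rouge_scorer

def first_sentence(text):
    return text.split(". ")[0]  # stand-in "system": a trivial extractive baseline

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

with open("mediqa_ans.json") as f:
    examples = json.load(f)

for ex in examples:
    generated = first_sentence(ex["answer"])
    scores = scorer.score(ex["summary"], generated)  # reference vs. candidate
    print(ex["question"], round(scores["rougeL"].fmeasure, 3))
```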
Improving Recurrent Neural Networks For Sequence Labelling
In this paper we study different types of Recurrent Neural Networks (RNN) for
sequence labeling tasks. We propose two new variants of RNNs integrating
improvements for sequence labeling, and we compare them to the more traditional
Elman and Jordan RNNs. We compare all models, either traditional or new, on
four distinct tasks of sequence labeling: two on Spoken Language Understanding
(ATIS and MEDIA), and two on POS tagging with the French Treebank (FTB) and the
Penn Treebank (PTB) corpora. The results show that our new variants of RNNs are
always more effective than the others.
Comment: 21 pages, 4 figures
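For readers unfamiliar with the two traditional baselines, the sketch below contrasts them in PyTorch: an Elman network feeds back its hidden state, while a Jordan network feeds back its previous output distribution. The paper's improved variants are not reproduced here; this only fixes the terminology.

```python
# Minimal sketches of the two classical recurrences for sequence labeling.
import torch
import torch.nn as nn

class ElmanTagger(nn.Module):
    def __init__(self, emb_dim, hidden, n_labels):
        super().__init__()
        self.rnn = nn.RNN(emb_dim, hidden, batch_first=True)  # h_t = f(x_t, h_{t-1})
        self.out = nn.Linear(hidden, n_labels)

    def forward(self, x):          # x: (batch, seq, emb_dim)
        h, _ = self.rnn(x)
        return self.out(h)         # one label score vector per token

class JordanTagger(nn.Module):
    def __init__(self, emb_dim, hidden, n_labels):
        super().__init__()
        self.cell = nn.Linear(emb_dim + n_labels, hidden)     # h_t = f(x_t, y_{t-1})
        self.out = nn.Linear(hidden, n_labels)
        self.n_labels = n_labels

    def forward(self, x):
        y_prev = x.new_zeros(x.size(0), self.n_labels)
        outputs = []
        for t in range(x.size(1)):  # the previous *output* drives the recurrence
            h = torch.tanh(self.cell(torch.cat([x[:, t], y_prev], dim=-1)))
            logits = self.out(h)
            y_prev = torch.softmax(logits, dim=-1)
            outputs.append(logits)
        return torch.stack(outputs, dim=1)
```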
Machine Translation: From Statistical to modern Deep-learning practices
Machine translation (MT) is an area of study in Natural Language Processing
that deals with the automatic translation of human language from one language
to another by computer. With a rich research history spanning nearly three
decades, machine translation is one of the most sought-after areas of research
in the linguistics and computational communities. In this paper, we investigate
the deep-learning-based models that have achieved substantial progress in
recent years and have become the prominent method in MT. We discuss the two
main deep-learning-based machine translation approaches: component- or
domain-level methods, which leverage deep learning models to enhance the
efficacy of Statistical Machine Translation (SMT), and end-to-end deep learning
models, which use neural networks to find correspondences between the source
and target languages via the encoder-decoder architecture. We conclude by
providing a timeline of the major research problems solved by researchers,
along with a comprehensive overview of present areas of research in Neural
Machine Translation.
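The end-to-end family the survey refers to reduces, in its simplest form, to the encoder-decoder pattern sketched below: the encoder compresses the source sentence into hidden states and the decoder predicts target words conditioned on them. This is a bare illustration with assumed names; attention and everything else that makes modern NMT competitive is omitted.

```python
# A minimal encoder-decoder sketch of end-to-end neural machine translation.
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))  # source -> summary state
        dec_h, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_h)     # next-word scores at each target position
```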
Sequence to Sequence Learning for Query Expansion
Using sequence-to-sequence algorithms for query expansion has not yet been
explored in the Information Retrieval or Question Answering literature. We
tried to fill this gap with a custom Query Expansion engine trained and tested
on open datasets. Starting from open datasets, we built a Query Expansion
training set using sentence-embeddings-based Keyword Extraction. We then
assessed the ability of sequence-to-sequence neural networks to capture
expanding relations in the word embeddings' space.
Comment: 8 pages, 2 figures, AAAI-19 Student Abstract and Poster Program
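A hedged sketch of the training-set construction step described above: rank a passage's words by the similarity of their embeddings to the passage embedding, and use the top words as the expansion target for a sequence-to-sequence model. The embed() function is a random stand-in so the snippet runs; in practice it would be a real sentence-embedding model.

```python
# A minimal sketch of building (query, expansion) pairs via embedding-based
# keyword extraction; embed() is a hypothetical stand-in model.
import numpy as np

def embed(text):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)  # replace with a real sentence embedder

def extract_keywords(passage, k=5):
    doc_vec = embed(passage)
    def score(word):  # cosine similarity between word and passage embeddings
        wv = embed(word)
        return np.dot(wv, doc_vec) / (np.linalg.norm(wv) * np.linalg.norm(doc_vec))
    return sorted(set(passage.lower().split()), key=score, reverse=True)[:k]

# (query, keywords-of-a-relevant-passage) pairs become seq2seq training data
pairs = [(q, " ".join(extract_keywords(p))) for q, p in [
    ("treat a bee sting", "Apply ice to reduce swelling after a bee sting"),
]]
print(pairs)
```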