Learning Multi-Level Information for Dialogue Response Selection by Highway Recurrent Transformer
With the increasing research interest in dialogue response generation, there
is an emerging branch formulating this task as selecting next sentences, where
given the partial dialogue contexts, the goal is to determine the most probable
next sentence. Following the recent success of the Transformer model, this
paper proposes (1) a new variant of attention mechanism based on multi-head
attention, called highway attention, and (2) a recurrent model based on
the Transformer and the proposed highway attention, called the Highway Recurrent
Transformer. Experiments on the response selection task in the seventh Dialog
System Technology Challenge (DSTC7) show that the proposed model can capture
both utterance-level and dialogue-level information; the effectiveness of
each module is also analyzed.
When Are Tree Structures Necessary for Deep Learning of Representations?
Recursive neural models, which use syntactic parse trees to recursively
generate representations bottom-up, are a popular architecture. But there have
not been rigorous evaluations showing for exactly which tasks this syntax-based
method is appropriate. In this paper we benchmark recursive neural models
against sequential recurrent neural models (simple recurrent and LSTM
models), enforcing apples-to-apples comparison as much as possible. We
investigate 4 tasks: (1) sentiment classification at the sentence level and
phrase level; (2) matching questions to answer-phrases; (3) discourse parsing;
(4) semantic relation extraction (e.g., component-whole between nouns).
Our goal is to understand better when, and why, recursive models can
outperform simpler models. We find that recursive models help mainly on tasks
(like semantic relation extraction) that require associating headwords across a
long distance, particularly on very long sequences. We then introduce a method
for allowing recurrent models to achieve similar performance: breaking long
sentences into clause-like units at punctuation and processing them separately
before combining. Our results thus help understand the limitations of both
classes of models, and suggest directions for improving recurrent models.
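
The clause-splitting idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the punctuation set, the toy recurrence standing in for a recurrent encoder, and the averaging step used to combine clause states are all assumptions made for illustration.

```python
import re

# Assumed punctuation set for clause boundaries (the abstract does not list one).
# Splitting on "." will also break decimals and abbreviations in real text.
_CLAUSE_BOUNDARY = re.compile(r"[,;:.!?]+")


def split_into_clauses(sentence):
    """Break a sentence into clause-like units at punctuation."""
    return [u.strip() for u in _CLAUSE_BOUNDARY.split(sentence) if u.strip()]


def encode_clause(clause):
    """Toy stand-in for a recurrent encoder (hypothetical).

    Runs a scalar recurrence h_t = 0.5 * h_{t-1} + len(token_t) and
    returns the final state. A real model would use an RNN or LSTM.
    """
    h = 0.0
    for token in clause.split():
        h = 0.5 * h + len(token)
    return h


def encode_sentence(sentence):
    """Encode each clause separately, then combine by averaging (assumed)."""
    clauses = split_into_clauses(sentence)
    if not clauses:
        return 0.0
    states = [encode_clause(c) for c in clauses]
    return sum(states) / len(states)


if __name__ == "__main__":
    text = "The engine, which is part of the car, failed."
    print(split_into_clauses(text))
    print(encode_sentence(text))
```

Because each clause is encoded on its own, the recurrence never has to carry state across the whole sentence, which is the property the abstract attributes to this approach.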