Search CORE

24,397 research outputs found

An Improved Hierarchical Word Sequence Language Model Using Directional Information

Author: Matsumoto Yuji
Wu Xiaoyi
Publication venue
Publication date: 01/01/2015
Field of study

Learning Multi-Level Information for Dialogue Response Selection by Highway Recurrent Transformer

Author: Chiang Ting-Rui
Huang Chao-Wei
Su Shang-Yu
Chen Yun-Nung
Publication venue
Publication date: 15/01/1929
Field of study

With the increasing research interest in dialogue response generation, there is an emerging branch formulating this task as selecting next sentences, where given the partial dialogue contexts, the goal is to determine the most probable next sentence. Following the recent success of the Transformer model, this paper proposes (1) a new variant of attention mechanism based on multi-head attention, called highway attention, and (2) a recurrent model based on transformer and the proposed highway attention, so-called Highway Recurrent Transformer. Experiments on the response selection task in the seventh Dialog System Technology Challenge (DSTC7) show the capability of the proposed model of modeling both utterance-level and dialogue-level information; the effectiveness of each module is further analyzed as well

arXiv.org e-Print Archive

Trinity College

When Are Tree Structures Necessary for Deep Learning of Representations?

Author: Hovy Eudard
Jurafsky Dan
Li Jiwei
Luong Minh-Thang
Publication venue
Publication date: 01/01/2015
Field of study

Recursive neural models, which use syntactic parse trees to recursively generate representations bottom-up, are a popular architecture. But there have not been rigorous evaluations showing for exactly which tasks this syntax-based method is appropriate. In this paper we benchmark {\bf recursive} neural models against sequential {\bf recurrent} neural models (simple recurrent and LSTM models), enforcing apples-to-apples comparison as much as possible. We investigate 4 tasks: (1) sentiment classification at the sentence level and phrase level; (2) matching questions to answer-phrases; (3) discourse parsing; (4) semantic relation extraction (e.g., {\em component-whole} between nouns). Our goal is to understand better when, and why, recursive models can outperform simpler models. We find that recursive models help mainly on tasks (like semantic relation extraction) that require associating headwords across a long distance, particularly on very long sequences. We then introduce a method for allowing recurrent models to achieve similar performance: breaking long sentences into clause-like units at punctuation and processing them separately before combining. Our results thus help understand the limitations of both classes of models, and suggest directions for improving recurrent models

arXiv.org e-Print Archive

CiteSeerX

Crossref