
16 research outputs found

Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards

Author: Choi Sungja
Cuayahuitl Heriberto
Hwang Inchul
Kim Jihie
Lee Donghyeon
Ryu Seonghan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/07/2019

Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces and the difficulty in specifying the reward function. We address such problems using clustered actions instead of infinite actions, and a simple but promising reward function based on human-likeness scores derived from human-human dialogue data. We train Deep Reinforcement Learning (DRL) agents using chitchat data in raw text, without any manual annotations. Experimental results using different splits of training data report the following. First, that our agents learn reasonable policies in the environments they get familiarised with, but their performance drops substantially when they are exposed to a test set of unseen dialogues. Second, that the choice of sentence embedding size between 100 and 300 dimensions is not significantly different on test data. Third, that our proposed human-likeness rewards are reasonable for training chatbots as long as they use lengthy dialogue histories of ≥10 sentences.

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

Crossref

A Study on Dialogue Reward Prediction for Open-Ended Conversational Agents

Author: Cuayahuitl Heriberto
Kim Jihie
Lee Donghyeon
Ryu Seonghan
Publication venue: 'Center for Open Science'
Publication date: 02/12/2018

The amount of dialogue history to include in a conversational agent is often underestimated and/or set in an empirical and thus possibly naive way. This suggests that principled investigations into optimal context windows are urgently needed given that the amount of dialogue history and corresponding representations can play an important role in the overall performance of a conversational system. This paper studies the amount of history required by conversational agents for reliably predicting dialogue rewards. The task of dialogue reward prediction is chosen for investigating the effects of varying amounts of dialogue history and their impact on system performance. Experimental results using a dataset of 18K human-human dialogues report that lengthy dialogue histories of at least 10 sentences are preferred (25 sentences being the best in our experiments) over short ones, and that lengthy histories are useful for training dialogue reward predictors with strong positive correlations between target dialogue rewards and predicted ones.

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

Ensemble-Based Deep Reinforcement Learning for Chatbots

Author: Cho Yongjin
Choi Hyungtak
Choi Sungja
Cuayahuitl Heriberto
Hwang Inchul
Indurthi Satish
Kim Jihie
Lee Donghyeon
Ryu Seonghan
Yu Seunghak
Publication venue: 'Elsevier BV'
Publication date: 27/08/2019

Trainable chatbots that exhibit fluent and human-like conversations remain a big challenge in artificial intelligence. Deep Reinforcement Learning (DRL) is promising for addressing this challenge, but its successful application remains an open question. This article describes a novel ensemble-based approach applied to value-based DRL chatbots, which use finite action sets as a form of meaning representation. In our approach, while dialogue actions are derived from sentence clustering, the training datasets in our ensemble are derived from dialogue clustering. The latter aim to induce specialised agents that learn to interact in a particular style. In order to facilitate neural chatbot training using our proposed approach, we assume dialogue data in raw text only – without any manually-labelled data. Experimental results using chitchat data reveal that (1) near human-like dialogue policies can be induced, (2) generalisation to unseen data is a difficult problem, and (3) training an ensemble of chatbot agents is essential for improved performance over using a single agent. In addition to evaluations using held-out data, our results are further supported by a human evaluation that rated dialogues in terms of fluency, engagingness and consistency – which revealed that our proposed dialogue rewards strongly correlate with human judgements.

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

Spoken Language Understanding Technology for Dialogue Modeling

Author: Seonghan Ryu
Gary Geunbae Lee (이근배)
Publication venue: Institute of Electronics and Information Engineers (IEIE, 대한전자공학회)
Publication date: 01/03/2014


Pohang University of Science and Technology (POSTECH)

Neural sentence embedding using only in-domain sentences for out-of-domain sentence detection in dialog systems

Author: Hwanjo Yu
Junhwi Choi
Lee G.G.
Seokhwan Kim
Seonghan Ryu
Publication venue: 'Elsevier BV'
Publication date: 27/07/2018

arXiv.org e-Print Archive

Pohang University of Science and Technology (POSTECH)

Vowel reduction feedback system for non-native learners of English

Author: Jeesoo Bang
Kyusong Lee
Lee G.G.
Seonghan Ryu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date


Pohang University of Science and Technology (POSTECH)

One-Step Error Detection and Correction Approach for Voice Word Processor

Author: Gary Geunbae LEE
Junhwi CHOI
Kyusong LEE
Seonghan RYU
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/01/2015

Crossref

Out-of-domain Detection based on Generative Adversarial Network

Author: HWANJO YU
KOO SANG JUN
LEE GARY GEUNBAE
Seonghan Ryu
Publication venue: EMNLP
Publication date: 04/11/2018

The main goal of this paper is to develop out-of-domain (OOD) detection for dialog systems. We propose to use only in-domain (IND) sentences to build a generative adversarial network (GAN) of which the discriminator generates low scores for OOD sentences. To improve basic GANs, we apply feature matching loss in the discriminator, use domain-category analysis as an additional task in the discriminator, and remove the biases in the generator. Thereby, we reduce the huge effort of collecting OOD sentences for training OOD detection. For evaluation, we experimented with OOD detection on a multi-domain dialog system. The experimental results showed the proposed method was most accurate compared to the existing methods. © 2018 Association for Computational Linguistics

Pohang University of Science and Technology (POSTECH)

One-step error detection and correction approach for voice word processor.

Author: Junhwi CHOI
Seonghan RYU
Kyusong LEE
Gary Geunbae LEE (이근배)
Publication venue: 'Institute of Electronics, Information and Communications Engineers (IEICE)'
Publication date: 01/08/2015


Pohang University of Science and Technology (POSTECH)

DESIGN AND ANALYSIS OF LINEAR CHANNEL SELECTION FILTER FOR DIRECT CONVERSION RECEIVER.

Author: Bumman Kim
Huijung Kim
Jong-Ryul Lee
Sangsu Jin
Seonghan Ryu
Publication venue: 'Baishideng Publishing Group Inc.'
Publication date: 01/01/2004

Pohang University of Science and Technology (POSTECH)