36 research outputs found

Deep reinforcement learning for conversational robots playing games

Author: Cuayahuitl Heriberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2017

Deep reinforcement learning for interactive multimodal robots is attractive for endowing machines with trainable skill acquisition. But this form of learning still presents several challenges. The challenge that we focus on in this paper is effective policy learning. To address it, we compare the Deep Q-Networks (DQN) method against a variant that aims for stronger decisions than the original method by avoiding decisions with the lowest negative rewards. We evaluated our baseline and proposed algorithms in agents playing the game of Noughts and Crosses with two grid sizes (3x3 and 5x5). Experimental results show evidence that our proposed method can lead to more effective policies than the baseline DQN method, which can be used for training interactive social robots.

University of Lincoln Institutional Repository

A Data-Efficient Deep Learning Approach for Deployable Multimodal Social Robots

Author: Cuayahuitl Heriberto
Publication venue: 'Elsevier BV'
Publication date: 05/07/2019

The deep supervised and reinforcement learning paradigms (among others) have the potential to endow interactive multimodal social robots with the ability to acquire skills autonomously. But it is still not clear how they can best be deployed in real-world applications. As a step in this direction, we propose a deep learning-based approach for efficiently training a humanoid robot to play multimodal games, and use the game of 'Noughts & Crosses' with two variants as a case study. Its minimum requirements for learning to perceive and interact are based on a few hundred example images, a few example multimodal dialogues and physical demonstrations of robot manipulation, and automatic simulations. In addition, we propose novel algorithms for robust visual game tracking and for competitive policy learning with high winning rates, which substantially outperform DQN-based baselines. While an automatic evaluation shows evidence that the proposed approach can be easily extended to new games with competitive robot behaviours, a human evaluation with 130 humans playing with the Pepper robot confirms that highly accurate visual perception is required for successful game play.

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

Deep reinforcement learning of dialogue policies with less weight updates

Authors: Cuayahuitl Heriberto; Yu Seunghak
Publication date: 20/08/2017

Deep reinforcement learning dialogue systems are attractive because they can jointly learn their feature representations and policies without manual feature engineering. But their application is challenging due to slow learning. We propose a two-stage method for accelerating the induction of single or multi-domain dialogue policies. While the first stage reduces the number of weight updates over time, the second stage uses very limited minibatches (of up to two learning experiences) sampled from experience replay memories. The former frequently updates the weights of the neural nets at early stages of training, and decreases the number of updates as training progresses by performing updates during exploration and by skipping updates during exploitation. The learning process is thus accelerated through fewer weight updates in both stages. An empirical evaluation in three domains (restaurants, hotels and TV guide) confirms that the proposed method trains policies 5 times faster than a baseline without the proposed method. Our findings are useful for training larger-scale neural-based spoken dialogue systems.
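
The update schedule described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the epsilon-greedy exploration with linear decay, the function name `run_schedule` and all parameter values are assumptions made for illustration, and the gradient step is left as a comment. It only counts how many weight updates would occur when updates are performed on exploratory steps and skipped on exploitation steps, using minibatches of two experiences.

```python
import random


def run_schedule(num_steps, eps_start=1.0, eps_end=0.05, batch_size=2, seed=0):
    """Count weight updates per quarter of training.

    Assumed setup (not from the abstract): epsilon-greedy exploration with
    linearly decaying epsilon. Updates happen only on exploratory steps.
    """
    rng = random.Random(seed)
    replay = []  # stand-in for an experience replay memory
    updates_per_quarter = [0, 0, 0, 0]
    for step in range(num_steps):
        # Linearly decay epsilon from eps_start to eps_end.
        frac = step / max(1, num_steps - 1)
        epsilon = eps_start + frac * (eps_end - eps_start)
        exploring = rng.random() < epsilon
        replay.append(step)  # placeholder for a (s, a, r, s') experience
        if exploring and len(replay) >= batch_size:
            minibatch = rng.sample(replay, batch_size)
            # A real agent would take one gradient step on `minibatch` here.
            quarter = min(3, 4 * step // num_steps)
            updates_per_quarter[quarter] += 1
        # On exploitation steps the weight update is skipped entirely.
    return updates_per_quarter


counts = run_schedule(4000)
print(counts)
```

Under these assumptions the first quarter of training receives many more updates than the last, because exploratory steps become rarer as epsilon decays.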

University of Lincoln Institutional Repository

Crossref

Training an interactive humanoid robot using multimodal deep reinforcement learning

Authors: Couly Guillaume; Cuayahuitl Heriberto; Olalainty Clement
Publication venue: 'Center for Open Science'
Publication date: 26/11/2016

Training robots to perceive, act and communicate using multiple modalities still represents a challenging problem, particularly if robots are expected to learn efficiently from small sets of example interactions. We describe a learning approach as a step in this direction, where we teach a humanoid robot how to play the game of noughts and crosses. Given that multiple multimodal skills can be trained to play this game, we focus on training the robot to perceive the game and to interact in it. Our multimodal deep reinforcement learning agent perceives multimodal features and exhibits verbal and non-verbal actions while playing. Experimental results using simulations show that the robot can learn to win or draw up to 98% of the games. A pilot test of the proposed multimodal system for the targeted game (integrating speech, vision and gestures) reports that reasonable and fluent interactions can be achieved using the proposed approach.

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

Strategic dialogue management via deep reinforcement learning

Authors: Cuayahuitl Heriberto; Keizer Simon; Lemon Oliver
Publication venue: 'Center for Open Science'
Publication date: 01/01/2015

Artificially intelligent agents equipped with strategic skills that can negotiate during their interactions with other natural or artificial agents are still underdeveloped. This paper describes a successful application of Deep Reinforcement Learning (DRL) for training intelligent agents with strategic conversational skills, in a situated dialogue setting. Previous studies have modelled the behaviour of strategic agents using supervised learning and traditional reinforcement learning techniques, the latter using tabular representations or learning with linear function approximation. In this study, we apply DRL with a high-dimensional state space to the strategic board game of Settlers of Catan, where players can offer resources in exchange for others and they can also reply to offers made by other players. Our experimental results report that the DRL-based learnt policies significantly outperformed several baselines including random, rule-based, and supervised-based behaviours. The DRL-based policy has a 53% win rate versus 3 automated players ('bots'), whereas a supervised player trained on a dialogue corpus in this setting achieved only 27%, versus the same 3 bots. This result supports the claim that DRL is a promising framework for training dialogue systems, and strategic agents with negotiation abilities.

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

Heriot Watt Pure

Scaling up deep reinforcement learning for multi-domain dialogue systems

Authors: Carse Jacob; Cuayahuitl Heriberto; Williamson Ashley; Yu Seunghak
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2017

Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems due to large search spaces. This paper proposes a three-stage method for multi-domain dialogue policy learning, termed NDQN, and applies it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. In this method, the first stage does multi-policy learning via a network of DQN agents; the second makes use of compact state representations by compressing raw inputs; and the third stage applies a pre-training phase for bootstrapping the behaviour of agents in the network. Experimental results comparing DQN (baseline) versus NDQN (proposed) using simulations report that the proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems. An additional evaluation reports that the NDQN agents outperformed a K-Nearest Neighbour baseline in task success and dialogue length, yielding more efficient and successful dialogues.

University of Lincoln Institutional Repository

Deep reinforcement learning for multi-domain dialogue systems

Authors: Carse Jacob; Cuayahuitl Heriberto; Williamson Ashley; Yu Seunghak
Publication venue: 'Center for Open Science'
Publication date: 26/11/2016

Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems. We propose a method for multi-domain dialogue policy learning, termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed) using simulations report that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems.

University of Lincoln Institutional Repository

arXiv.org e-Print Archive

A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning

Authors: Cuayahuitl Heriberto; Mohtasib Abdalkarim; Neumann Gerhard
Publication venue: 'University of Lincoln, School of Film and Media and Changer Agency'
Publication date: 08/09/2021

Deep Reinforcement Learning (DRL) is a promising approach for teaching robots new behaviour. However, one of its main limitations is the need for carefully hand-coded reward signals by an expert. We argue that it is crucial to automate the reward learning process so that new skills can be taught to robots by their users. To address such automation, we consider task success classifiers using visual observations to estimate the rewards in terms of task success. In this work, we study the performance of multiple state-of-the-art deep reinforcement learning algorithms under different types of reward: Dense, Sparse, Visual Dense, and Visual Sparse rewards. Our experiments in various simulation tasks (Pendulum, Reacher, Pusher, and Fetch Reach) show that while DRL agents can learn successful behaviours using visual rewards when the goal targets are distinguishable, their performance may decrease if the task goal is not clearly visible. Our results also show that visual dense rewards are more successful than visual sparse rewards and that there is no single best algorithm for all tasks
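
The four reward types named in this abstract can be contrasted with a minimal sketch. It is an illustration under stated assumptions, not the paper's definitions: the distance-based formulas, the tolerance and threshold values, and the function names `dense_reward`, `sparse_reward` and `visual_rewards` are hypothetical, and the classifier output is passed in as a plain probability rather than computed from an image.

```python
import math


def dense_reward(position, goal):
    """Dense reward (assumed form): negative distance to the goal at every step."""
    return -math.dist(position, goal)


def sparse_reward(position, goal, tolerance=0.05):
    """Sparse reward (assumed form): 1 only when the goal is reached, else 0."""
    return 1.0 if math.dist(position, goal) <= tolerance else 0.0


def visual_rewards(success_prob, threshold=0.5):
    """Rewards from a task success classifier's output probability.

    Assumption: the visual dense reward is the probability itself and the
    visual sparse reward is its thresholded (binary) version.
    """
    visual_dense = success_prob
    visual_sparse = 1.0 if success_prob >= threshold else 0.0
    return visual_dense, visual_sparse


print(dense_reward((0.0, 0.0), (3.0, 4.0)))
print(sparse_reward((0.0, 0.0), (3.0, 4.0)))
print(visual_rewards(0.8))
```

The dense variants give the agent a graded signal on every step, while the sparse variants only signal success or failure, which is one way to read the abstract's comparison between them.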

University of Lincoln Institutional Repository

Reward-Based Environment States for Robot Manipulation Policy Learning

Authors: Cuayahuitl Heriberto; Cédérick Mouliets; Ferrané Isabelle
Publication date: 08/12/2021

Training robot manipulation policies is a challenging and open problem in robotics and artificial intelligence. In this paper we propose a novel and compact state representation based on the rewards predicted from an image-based task success classifier. Our experiments, using the Pepper robot in simulation with two deep reinforcement learning algorithms on a grab-and-lift task, reveal that our proposed state representation can achieve up to 97% task success using our best policies.

University of Lincoln Institutional Repository

Towards augmenting dialogue strategy management with multimodal sub-symbolic context

Authors: Baxter Paul; Belpaeme Tony; Cuayahuitl Heriberto; Kruijff-Korbayova Ivana; Wood Rachel
Publication date: 01/01/2012

A synthetic agent requires the coordinated use of multiple sensory and effector modalities in order to achieve social human-robot interaction (HRI). While systems that combine multiple modalities in this way exist, the issue of information coordination across modalities to identify relevant context information remains problematic. A system-wide information formalism is typically used to address the issue, which requires a re-encoding of all information into the system ontology. We propose a general approach to this information coordination issue, focussing particularly on a potential application to a dialogue strategy learning and selection system embedded within a wider architecture for social HRI. Rather than making use of a common system ontology, we instead emphasise a sub-symbolic association-driven architecture which has the capacity to influence the ‘internal’ processing of all individual system modalities, without requiring the explicit processing or interpretation of modality-specific information.

CiteSeerX

Ghent University Academic Bibliography