32 research outputs found

    Dialogue management using reinforcement learning

    Dialogue is widely used for verbal communication between humans and robots, for example with assistant robots in hospitals. However, such robots are usually limited to predetermined dialogue, which makes it difficult for them to understand new words for a new desired goal. In this paper, we discuss conversation in Indonesian on entertainment, motivation, emergency, and helping, using a knowledge-growing method. We provide mp3 audio for music, fairy tale, comedy, and motivation requests. The average execution time for these requests was 3.74 ms. In an emergency, the patient can ask the robot to call the nurse; the robot records the pain complaint and informs the nurse. All 7 emergency reports were successfully saved to the database. In the helping conversation, the robot walks to pick up the patient’s belongings. When the robot does not understand the patient’s conversation, it keeps asking until it understands. Through this asking conversation, knowledge expanded from 2 to 10, with learning execution time rising from 1405 ms to 3490 ms. SARSA reached steady state faster because of its higher cumulative rewards. Both Q-learning and SARSA reached the desired object within 200 episodes. We conclude that the RL method overcame the robot’s knowledge limitation in achieving new dialogue goals for a patient assistant.
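
    The abstract compares SARSA and Q-learning without showing how they differ. The sketch below gives the standard textbook tabular update rules, not the paper's implementation. The table layout (`q[state][action]`), the function names, and the parameters `alpha` and `gamma` are assumptions made for illustration.

```python
# Standard tabular TD updates (textbook form; not taken from the paper).
# q is a list of lists indexed as q[state][action] -- a hypothetical layout.

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the greedy action in the next state."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


if __name__ == "__main__":
    # Two states, two actions; state 1 already has some value estimates.
    q_sarsa = [[0.0, 0.0], [1.0, 2.0]]
    q_ql = [[0.0, 0.0], [1.0, 2.0]]
    sarsa_update(q_sarsa, s=0, a=0, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.9)
    q_learning_update(q_ql, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
    print(q_sarsa[0][0], q_ql[0][0])
```

    The only difference is the bootstrap term: SARSA uses the value of the next action the behaviour policy actually takes, while Q-learning uses the maximum over next actions.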

    Difference of Convex Functions Programming Applied to Control with Expert Data

    This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDP), called Garnets
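
    A short sketch of why an OBR norm can be DC, under assumptions the abstract does not state: a linear parameterization $Q_\theta(s,a) = \theta^\top \phi(s,a)$ with hypothetical features $\phi$, a nonnegative weighting $\mu$, and the $\ell_1$ case only. The notation is illustrative and may differ from the paper's.

```latex
\begin{align*}
T^*Q_\theta(s,a) &= R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} \theta^\top \phi(s',a')
  && \text{(convex in } \theta\text{: max of linear functions)} \\
Q_\theta(s,a) &= \theta^\top \phi(s,a)
  && \text{(linear, hence convex in } \theta\text{)} \\
\lvert f - g \rvert &= 2\max(f, g) - (f + g)
  && \text{(both terms convex when } f, g \text{ are convex)} \\
\sum_{s,a} \mu(s,a)\, \bigl\lvert T^*Q_\theta(s,a) - Q_\theta(s,a) \bigr\rvert
  &= \underbrace{2 \sum_{s,a} \mu(s,a) \max\bigl(T^*Q_\theta, Q_\theta\bigr)}_{\text{convex}}
   - \underbrace{\sum_{s,a} \mu(s,a) \bigl(T^*Q_\theta + Q_\theta\bigr)}_{\text{convex}}
\end{align*}
```

    Under these assumptions the weighted $\ell_1$ residual is an explicit difference of two convex functions of $\theta$, which is the form DC programming methods operate on.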

    Deep Reinforcement Learning with Feedback-based Exploration

    Deep Reinforcement Learning has enabled the control of increasingly complex and high-dimensional problems. However, the need for vast amounts of data before reasonable performance is attained prevents its widespread application. We employ binary corrective feedback as a general and intuitive way to incorporate human intuition and domain knowledge in model-free machine learning. The uncertainty in the policy and in the corrective feedback is combined directly in the action space as probabilistic conditional exploration. As a result, most of the otherwise uninformed learning process can be avoided. We demonstrate the proposed method, Predictive Probabilistic Merging of Policies (PPMP), in combination with DDPG. In experiments on continuous control problems of the OpenAI Gym, we achieve drastic improvements in sample efficiency, final performance, and robustness to erroneous feedback, for both human and synthetic feedback. Additionally, we show solutions beyond the demonstrated knowledge. Comment: 6 page