
    VIRTUAL ROBOT LOCOMOTION ON VARIABLE TERRAIN WITH ADVERSARIAL REINFORCEMENT LEARNING

    Reinforcement Learning (RL) is a machine learning technique in which an agent learns to perform a complex task through a repeated process of trial and error, seeking to maximize a well-defined reward function. This form of learning has found applications in robot locomotion, where it has been used to teach robots to traverse complex terrain. While RL algorithms may work well for training robot locomotion, they tend not to generalize well when the agent is placed in an environment it has never encountered before. One solution from the literature is to train a destabilizing adversary alongside the locomotive learning agent. The adversary applies external forces to the agent, which may help the locomotive agent learn to deal with unexpected scenarios. For this project, we train a robust, simulated quadruped robot to traverse variable terrain. We compare and analyze Proximal Policy Optimization (PPO) with and without an adversarial agent, and determine which configuration produces the best results.
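    To make the setup concrete, a minimal sketch of this zero-sum training scheme is given below. The environment, policies, and reward terms are hypothetical placeholders, not the project's actual implementation; in the full method, PPO would alternately update the locomotor on its reward and the adversary on the negated reward.

```python
import numpy as np

class DummyQuadrupedEnv:
    """Stand-in environment: the adversary's output is an external force
    injected into the dynamics alongside the locomotor's joint action."""
    def __init__(self, obs_dim: int = 32):
        self.obs_dim = obs_dim
    def reset(self):
        return np.zeros(self.obs_dim)
    def step(self, action, adversary_force):
        next_obs = np.random.randn(self.obs_dim)  # placeholder dynamics
        # Placeholder reward: forward progress minus the disturbance cost.
        reward = float(action.sum() - 0.1 * np.linalg.norm(adversary_force))
        return next_obs, reward, False

def rollout(env, protagonist, adversary, horizon=200):
    """Collect one trajectory; the adversary's reward is the negation of
    the locomotor's, so improving one policy worsens the other."""
    obs, traj = env.reset(), []
    for _ in range(horizon):
        a, f = protagonist(obs), adversary(obs)
        obs_next, r, done = env.step(a, f)
        traj.append((obs, a, f, r, -r))  # (state, action, force, r_pro, r_adv)
        obs = obs_next
        if done:
            break
    return traj

rng = np.random.default_rng(0)
W_pro = 0.1 * rng.standard_normal((8, 32))   # locomotor: 8 joint torques
W_adv = 0.1 * rng.standard_normal((3, 32))   # adversary: 3D external force
protagonist = lambda obs: np.tanh(W_pro @ obs)
adversary = lambda obs: 0.5 * np.tanh(W_adv @ obs)  # bounded perturbation

traj = rollout(DummyQuadrupedEnv(), protagonist, adversary)
# PPO updates would alternate here: locomotor on stored rewards,
# adversary on their negations.
print(len(traj), traj[0][3], traj[0][4])
```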

    Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning

    The use of human demonstrations in reinforcement learning has been shown to significantly improve agent performance. However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while retaining the performance advantages by using a single human example, collected through a simple-to-use virtual reality simulation, to assist with RL training. Our method augments this single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly improve training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this training advantage from a single human example requiring less than a minute of human input.
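    A minimal sketch of the augmentation idea follows. The function name, noise model, and scale are assumptions for illustration; the paper's actual augmentation procedure may differ.

```python
import numpy as np

def augment_demo(demo, n_copies=100, noise_scale=0.01, seed=0):
    """Generate `n_copies` perturbed variants of a single demonstration.

    `demo` is a list of (obs, action) pairs from one VR teleoperation
    episode. Small Gaussian noise on the actions yields human-like
    variations of the same behavior (the noise model is an assumption).
    """
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(n_copies):
        episode = [(obs, act + rng.normal(0.0, noise_scale, act.shape))
                   for obs, act in demo]
        augmented.append(episode)
    return augmented

# Usage: seed the DDPG + HER replay buffer with the augmented episodes
# before (or alongside) the agent's own exploration data.
demo = [(np.zeros(10), np.zeros(4)) for _ in range(50)]  # placeholder episode
buffer_init = augment_demo(demo)
print(len(buffer_init), len(buffer_init[0]))
```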

    A REAL-TIME PERSONALISED RECOMMENDER SYSTEM FRAMEWORK FOR ONLINE LEARNING PLATFORMS

    Traditional recommender systems suffer from the long-tail problem: they often recommend the same popular items, limiting the options available to users. Conventional recommender systems also lack real-time responsiveness. In this work, a recommender system framework for online learning platforms is proposed using a deep reinforcement learning algorithm. The agent acts by recommending learning materials to learners based on its interactions with them. Positive reinforcement (rewards such as likes, longer dwell time, and clicks) and negative reinforcement (punishments such as dislikes, shorter dwell time, and skips) teach the recommender agent what to recommend. This enables the agent to iteratively refine its policy through trial-and-error interaction with the environment until it converges on a policy that produces suggestions best suited to users' dynamic preferences. The deep reinforcement learning agent was benchmarked against a random agent using evaluation metrics such as average episode reward, click-through rate, average recommendation quality, and standard deviation of episode reward. The study shows that the average episode reward, click-through rate, and average recommendation quality for the DRL agent increased by 2.72, 1.5, and 16.20 percent respectively, while the standard deviation of episode reward for the DRL agent decreased by 20.61 percent. All of these indicate the better performance of the DRL agent.
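    As a concrete illustration of this reward scheme, the sketch below maps learner feedback signals to a scalar reward. Every weight and signal name here is an assumption for illustration; the paper's actual reward design is not reproduced.

```python
def interaction_reward(clicked: bool, liked: bool, disliked: bool,
                       skipped: bool, dwell_seconds: float) -> float:
    """Map learner feedback on one recommendation to a scalar reward.

    Positive signals (clicks, likes, long dwell time) reinforce the
    recommendation; negative signals (dislikes, skips) punish it.
    All weights are illustrative assumptions.
    """
    reward = 0.0
    reward += 1.0 if clicked else 0.0
    reward += 2.0 if liked else 0.0
    reward -= 2.0 if disliked else 0.0
    reward -= 1.0 if skipped else 0.0
    reward += min(dwell_seconds / 60.0, 1.0)  # cap the dwell-time bonus at 1.0
    return reward

print(interaction_reward(clicked=True, liked=True, disliked=False,
                         skipped=False, dwell_seconds=90))  # 4.0
```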

    Deep imitation learning for 3D navigation tasks

    Deep learning techniques have shown success in learning from raw high-dimensional data in various applications. While deep reinforcement learning has recently gained popularity as a method to train intelligent agents, the use of deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient way to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: Deep Q-Networks (DQN) and Asynchronous Advantage Actor-Critic (A3C). The proposed method, as well as the reinforcement learning methods, employs deep convolutional neural networks and learns directly from raw visual input. Methods for combining learning from demonstrations with learning from experience are also investigated; this combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on four navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem relevant to many real applications: they require demonstrations of long trajectories to reach the target and provide only delayed (usually terminal) rewards to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, while learning-from-experience methods fail to learn an effective policy. Moreover, active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
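    A minimal sketch of the supervised (behavior cloning) step and an uncertainty-based active-learning query rule follows. The network shape, input resolution, and confidence threshold are assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class VisualPolicy(nn.Module):
    """Small CNN mapping raw 84x84 RGB frames to discrete navigation actions."""
    def __init__(self, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),   # -> 16x20x20
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # -> 32x9x9
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, n_actions),
        )
    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

policy = VisualPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

frames = torch.randn(8, 3, 84, 84)   # placeholder demonstration frames
actions = torch.randint(0, 4, (8,))  # demonstrator's actions for those frames

# Supervised (behavior-cloning) step on the demonstration data.
loss = loss_fn(policy(frames), actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Active-learning rule: flag states where the policy is unsure so the
# demonstrator can be queried for the correct action there.
with torch.no_grad():
    probs = torch.softmax(policy(frames), dim=1)
uncertain = probs.max(dim=1).values < 0.5  # confidence threshold is an assumption
print("states to query:", int(uncertain.sum()))
```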

    Reinforcement learning strategies using Monte-Carlo to solve the blackjack problem

    Blackjack is a classic casino game in which the player attempts to outsmart the dealer by drawing cards whose face values add up to at most 21 while exceeding the total of the dealer's hand. This study considers a simplified variation of blackjack in which the dealer plays no active role after the first two draws. A separate game regime is modeled for each of one to ten multiples of the conventional 52-card deck. Irrespective of the number of standard decks used, the game is played as a randomized discrete-time process. To determine the optimal policy, we train an agent (a decision maker) to optimize over the decision space of the game, treating the procedure as a finite Markov decision process. To choose the most effective course of action, we mainly study Monte Carlo-based reinforcement learning approaches and compare them with Q-learning, dynamic programming, and temporal-difference methods. The performance of the distinct model-free policy iteration techniques is presented in this study, which frames the game as a reinforcement learning problem.
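    The sketch below shows Monte Carlo control with epsilon-greedy exploration on a simplified blackjack (infinite deck, no usable-ace bookkeeping); it illustrates the general approach rather than the paper's exact rules or deck regimes.

```python
import random
from collections import defaultdict

def draw():
    return min(random.randint(1, 13), 10)          # face cards count as 10

def play_episode(Q, eps=0.1):
    """Play one hand; return [(state, action, reward)]. 0 = stick, 1 = hit."""
    player, dealer_up = draw() + draw(), draw()
    episode = []
    while True:
        state = (player, dealer_up)
        action = (random.randint(0, 1) if random.random() < eps
                  else int(Q[(state, 1)] > Q[(state, 0)]))
        if action == 1:                            # hit
            player += draw()
            if player > 21:                        # bust: immediate loss
                episode.append((state, 1, -1.0))
                return episode
            episode.append((state, 1, 0.0))
        else:                                      # stick: dealer plays to 17
            dealer = dealer_up + draw()
            while dealer < 17:
                dealer += draw()
            win = dealer > 21 or player > dealer
            reward = 1.0 if win else (0.0 if player == dealer else -1.0)
            episode.append((state, 0, reward))
            return episode

# The player's sum strictly increases within a hand, so states never
# repeat and first-visit and every-visit MC updates coincide here.
Q, n = defaultdict(float), defaultdict(int)
for _ in range(200_000):
    G = 0.0
    for state, action, reward in reversed(play_episode(Q)):
        G += reward                                # undiscounted return
        n[(state, action)] += 1
        Q[(state, action)] += (G - Q[(state, action)]) / n[(state, action)]

print("Q(stick | 20 vs dealer 10) =", round(Q[((20, 10), 0)], 3))
```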

    Intelligens ágens alapú humanoid robot vezérlés: Intelligent agent based control of humanoid robot

    The subject of this research is to investigate how an intelligent agent can be taught to control a custom-designed and built humanoid robot with 26 revolute joints. In terms of control, the agent's goal is to keep the robot balanced and, through sequences of trials, to develop an efficient walking gait that moves the robot in a straight line. The learning method is based on reinforcement learning; for the actual training, the robot's three-dimensional mechanical model was used, paired with a simulation environment designed specifically for this application.
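    As an illustration of how such a balance-and-gait objective might be expressed, the sketch below shows a hypothetical reward function; every term and weight is an assumption, since the paper's reward design is not given here.

```python
import numpy as np

def gait_reward(torso_height: float, torso_tilt: float,
                forward_velocity: float, lateral_drift: float,
                joint_torques: np.ndarray, fallen: bool) -> float:
    """Reward staying upright and walking forward in a straight line."""
    if fallen:
        return -10.0                               # terminal fall penalty
    reward = 1.0                                   # alive bonus: balance kept
    reward += 2.0 * forward_velocity               # encourage forward progress
    reward -= 1.0 * abs(lateral_drift)             # penalize leaving the line
    reward -= 0.5 * abs(torso_tilt)                # penalize leaning over
    reward -= 0.5 * abs(0.9 - torso_height)        # penalize crouching/sagging
    reward -= 1e-3 * float(np.square(joint_torques).sum())  # energy cost
    return reward

# One torque per revolute joint (the robot has 26).
print(gait_reward(0.9, 0.05, 0.6, 0.02, np.ones(26), fallen=False))
```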