7 research outputs found

    The Soft Multi-Legged Robot TAOYAKA-S II: Abstraction of the State-Action Space of Reinforcement Learning Using the Physical Properties of a Soft Robot

    This paper considers the abstraction of the state-action space of reinforcement learning using the physical properties of a soft body. In general, soft robots can adapt to complex environments owing to their flexibility, and this adaptability is exploited here to abstract the state-action space. The policy acquired through the abstraction was found to generalize and to greatly reduce the size of the state-action space. The proposed framework was applied to the soft multi-legged robot TAOYAKA-S II, which was shown to easily acquire an effective policy for moving within a given environment. Experiments demonstrated climbing motion on a pipe and walking motion on a flat surface. The proposed framework made the policy applicable to other columnar objects without requiring additional learning.
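
    To make the abstraction idea concrete, here is a minimal tabular Q-learning sketch over a hypothetical abstracted state-action space, where rich soft-body sensor readings are collapsed into a leg-contact pattern and the actions are a few motion primitives. The abstraction function and the primitive names are illustrative assumptions, not the paper's actual design.

```python
import random
from collections import defaultdict

# Illustrative primitives; the paper's actual action set may differ.
ACTIONS = ["extend", "contract", "twist_left", "twist_right"]

def abstract_state(leg_contacts):
    """Collapse rich soft-body sensor readings into a tiny discrete state:
    here, just the pattern of which legs touch the surface (an assumption)."""
    return tuple(int(c) for c in leg_contacts)

Q = defaultdict(float)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def choose_action(state):
    # Epsilon-greedy over the small abstracted action set.
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, reward, s_next):
    # Standard tabular Q-learning; tractable only because the
    # abstraction keeps the state-action table small.
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
```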

    Reinforcement learning strategies using Monte-Carlo to solve the blackjack problem

    Blackjack is a classic casino game in which the player attempts to outsmart the dealer by drawing a combination of cards whose face values add up to at most 21 but exceed the value of the dealer's hand. This study considers a simplified variation of blackjack in which the dealer plays no active role after the first two draws. A different game regime is modeled for every one to ten multiples of the conventional 52-card deck. Irrespective of the number of standard decks used, the game is played as a randomized discrete-time process. To determine the optimal course of action in terms of policy, we train an agent, a decision maker, to optimize across the decision space of the game, treating the procedure as a finite Markov decision process. To choose the most effective course of action, we mainly study Monte Carlo-based reinforcement learning approaches and compare them with Q-learning, dynamic programming, and temporal-difference methods. The performance of the distinct model-free policy iteration techniques is presented in this study, framing the game as a reinforcement learning problem.
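
    The Monte Carlo approach the study investigates can be sketched as follows: simulate hands of a simplified blackjack to completion and update action values from first-visit returns. The rules below (infinite deck, aces count 1, dealer stands on 17 or more) are simplifying assumptions for illustration, not the authors' exact game regime.

```python
import random
from collections import defaultdict

def draw():
    # Infinite-deck approximation; J/Q/K count as 10, aces as 1.
    return min(random.randint(1, 13), 10)

def episode(Q, eps=0.1):
    """Play one hand with an epsilon-greedy policy; return (trajectory, reward)."""
    player, dealer_up = draw() + draw(), draw()
    trajectory = []
    while player < 21:
        s = (player, dealer_up)
        if random.random() < eps:
            a = random.choice(["hit", "stick"])
        else:
            a = max(["hit", "stick"], key=lambda x: Q[(s, x)])
        trajectory.append((s, a))
        if a == "stick":
            break
        player += draw()
    if player > 21:
        return trajectory, -1.0            # player busts
    dealer = dealer_up + draw()
    while dealer < 17:                     # dealer stands on 17 or more
        dealer += draw()
    if dealer > 21 or player > dealer:
        return trajectory, 1.0
    return trajectory, -1.0 if dealer > player else 0.0

Q, N = defaultdict(float), defaultdict(int)
for _ in range(200_000):
    traj, G = episode(Q)                   # single terminal reward, so G is the return
    seen = set()
    for s, a in traj:                      # first-visit incremental-mean update
        if (s, a) not in seen:
            seen.add((s, a))
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]
```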

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have been successfully combined with the reinforcement learning framework.
    Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016
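
    As one concrete instance of the deep-plus-RL combination the chapter surveys, the sketch below shows a DQN-style setup in PyTorch: a small fully connected Q-network trained with a one-step temporal-difference target. The dimensions and hyperparameters are illustrative assumptions, not tied to any task from the chapter.

```python
import torch
import torch.nn as nn

# Illustrative dimensions and hyperparameters.
state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(s, a, r, s_next, done):
    """One gradient step toward the one-step temporal-difference target."""
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a random batch of transitions:
s = torch.randn(32, state_dim)
a = torch.randint(0, n_actions, (32,))
r = torch.randn(32)
td_update(s, a, r, torch.randn(32, state_dim), torch.zeros(32))
```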

    Bayesian RL in factored POMDPs

    Robust decision-making agents in any non-trivial system must reason over uncertainty of various types, such as action outcomes, the agent's current state, and the dynamics of the environment. Outcome and state uncertainty are elegantly captured by the Partially Observable Markov Decision Process (POMDP) framework [1], which enables reasoning in stochastic, partially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which unfortunately are often not available. When such a model is not available, model-based Bayesian Reinforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment and use this knowledge to select actions that, theoretically, trade off exploration and exploitation optimally. However, few BRL methods are applicable to partially observable settings, and those that are have limited scaling properties. The Bayes-Adaptive POMDP (BA-POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that proposes a method to overcome this bottleneck by representing the dynamics with a Bayes network, an approach that exploits structure in the form of independence between state and observation features.
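
    The posterior-over-models bookkeeping described above can be sketched in a deliberately simplified, fully observable tabular setting: Dirichlet counts over transitions form the posterior, and acting by Thompson sampling a model from it is one way to trade off exploration and exploitation. The full BA-POMDP additionally maintains a belief over hidden states, which this sketch omits.

```python
import numpy as np

n_states, n_actions = 5, 2
# Dirichlet(1, ..., 1) prior over the next-state distribution for each (s, a).
counts = np.ones((n_states, n_actions, n_states))

def observe(s, a, s_next):
    # The Bayesian update for a Dirichlet posterior is a count increment.
    counts[s, a, s_next] += 1

def sample_model(rng):
    """Thompson sampling: draw one plausible transition model from the posterior."""
    return np.array([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                     for s in range(n_states)])

rng = np.random.default_rng(0)
T = sample_model(rng)  # plan greedily in T, act, then call observe(...) and repeat
```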

    Recommending messages to users in participatory media environments: a Bayesian credibility approach

    In this thesis, we address the challenge of information overload in online participatory messaging environments using an artificial intelligence approach drawn from research in multiagent systems trust modeling. In particular, we reason about which messages to show to users based on modeling both credibility and similarity, motivated by a need to discriminate between popular (but false) and truly beneficial messages. Our work focuses on environments wherein users' ratings on messages reveal their preferences and where the trustworthiness of those ratings then needs to be modeled in order to make effective recommendations. We first present one solution, CredTrust, and demonstrate its efficacy in comparison with LOAR, an established trust-based recommender system applicable to participatory media networks that fails to incorporate the modeling of credibility. Validation for our framework is provided through the simulation of an environment where the ground-truth benefit of a message to a user is known. We are able to show that our approach performs well in terms of successfully recommending those messages with high predicted benefit and avoiding those with low predicted benefit. We continue by developing a new model for making recommendations that is grounded in Bayesian statistics and uses Partially Observable Markov Decision Processes (POMDPs). This model is an important next step: both CredTrust and LOAR encode particular functions of user features (viz., similarity and credibility) when making recommendations, whereas our new model, denoted POMDPTrust, learns the appropriate evaluation functions in order to make "correct" belief updates about the usefulness of messages. We validate our new approach in simulation, showing that it outperforms both LOAR and CredTrust in a variety of agent scenarios. Furthermore, we demonstrate that POMDPTrust performs well on real-world data sets from Reddit.com and Epinions.com. In all, we offer a novel trust model which is shown, through simulation and real-world experimentation, to be an effective agent-based solution to the problem of managing the messages posted by users in participatory media networks.
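
    A toy version of the Bayesian belief update at the heart of such a recommender might look like the following: a message's unobserved usefulness is tracked as a Beta posterior, and each incoming rating is weighted by the rater's credibility. The credibility-weighting scheme is a hypothetical simplification, not the thesis's exact observation model.

```python
# Uniform Beta(1, 1) prior on the unobserved usefulness of a message.
def update_belief(alpha, beta, rating, credibility):
    """rating: 1 (helpful) or 0 (unhelpful); credibility in [0, 1] discounts
    how strongly the rating moves the posterior (an assumed scheme)."""
    if rating == 1:
        alpha += credibility
    else:
        beta += credibility
    return alpha, beta

alpha, beta = 1.0, 1.0
for rating, cred in [(1, 0.9), (1, 0.4), (0, 0.8)]:
    alpha, beta = update_belief(alpha, beta, rating, cred)

expected_usefulness = alpha / (alpha + beta)  # posterior mean; rank messages by this
print(round(expected_usefulness, 3))
```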

    Monte Carlo Tree Search for Bayesian Reinforcement Learning

    No full text