7 research outputs found

    The Soft Multi-Legged Robot TAOYAKA-S II: Abstraction of the State-Action Space of Reinforcement Learning Using the Physical Properties of a Soft Robot

    This paper considers the abstraction of the state-action space of reinforcement learning using the physical properties of a soft body. In general, soft robots can adapt to complex environments owing to their flexibility, and this adaptability is exploited here to abstract the state-action space. The policy acquired through the abstraction was found to generalize and to greatly reduce the size of the state-action space. The proposed framework was applied to the soft multi-legged robot TAOYAKA-S II, which was shown to easily acquire an effective policy for moving within a given environment. Experiments demonstrated climbing motion on a pipe and walking motion on a flat surface. The proposed framework made the policy applicable to other columnar objects without requiring additional learning.
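
    To make the abstraction idea concrete, here is a minimal tabular Q-learning sketch over a hypothetical abstracted state-action space, where rich soft-body sensor readings are collapsed into a leg-contact pattern and the actions are a few motion primitives. The abstraction function and the primitive names are illustrative assumptions, not the paper's actual design.

```python
import random
from collections import defaultdict

# Illustrative primitives; the paper's actual action set may differ.
ACTIONS = ["extend", "contract", "twist_left", "twist_right"]

def abstract_state(leg_contacts):
    """Collapse rich soft-body sensor readings into a tiny discrete state:
    here, just the pattern of which legs touch the surface (an assumption)."""
    return tuple(int(c) for c in leg_contacts)

Q = defaultdict(float)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def choose_action(state):
    # Epsilon-greedy over the small abstracted action set.
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, reward, s_next):
    # Standard tabular Q-learning; tractable only because the
    # abstraction keeps the state-action table small.
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
```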

    Reinforcement learning strategies using Monte-Carlo to solve the blackjack problem

    Blackjack is a classic casino game in which the player attempts to outsmart the dealer by drawing a combination of cards whose face values add up to at most 21 but exceed the value of the dealer's hand. This study considers a simplified variation of blackjack in which the dealer plays no active role after the first two draws. A different game regime is modeled for every one to ten multiples of the conventional 52-card deck. Irrespective of the number of standard decks used, the game is played as a randomized discrete-time process. To determine the optimal course of action in terms of policy, we train an agent, a decision maker, to optimize across the decision space of the game, treating the procedure as a finite Markov decision process. To choose the most effective course of action, we mainly study Monte Carlo-based reinforcement learning approaches and compare them with Q-learning, dynamic programming, and temporal-difference methods. The performance of the distinct model-free policy iteration techniques is presented in this study, framing the game as a reinforcement learning problem.
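
    The Monte Carlo approach the study investigates can be sketched as follows: simulate hands of a simplified blackjack to completion and update action values from first-visit returns. The rules below (infinite deck, aces count 1, dealer stands on 17 or more) are simplifying assumptions for illustration, not the authors' exact game regime.

```python
import random
from collections import defaultdict

def draw():
    # Infinite-deck approximation; J/Q/K count as 10, aces as 1.
    return min(random.randint(1, 13), 10)

def episode(Q, eps=0.1):
    """Play one hand with an epsilon-greedy policy; return (trajectory, reward)."""
    player, dealer_up = draw() + draw(), draw()
    trajectory = []
    while player < 21:
        s = (player, dealer_up)
        if random.random() < eps:
            a = random.choice(["hit", "stick"])
        else:
            a = max(["hit", "stick"], key=lambda x: Q[(s, x)])
        trajectory.append((s, a))
        if a == "stick":
            break
        player += draw()
    if player > 21:
        return trajectory, -1.0            # player busts
    dealer = dealer_up + draw()
    while dealer < 17:                     # dealer stands on 17 or more
        dealer += draw()
    if dealer > 21 or player > dealer:
        return trajectory, 1.0
    return trajectory, -1.0 if dealer > player else 0.0

Q, N = defaultdict(float), defaultdict(int)
for _ in range(200_000):
    traj, G = episode(Q)                   # single terminal reward, so G is the return
    seen = set()
    for s, a in traj:                      # first-visit incremental-mean update
        if (s, a) not in seen:
            seen.add((s, a))
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]
```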

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have been successfully combined with the reinforcement learning framework.
    Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016
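
    As one concrete instance of the deep-plus-RL combination the chapter surveys, the sketch below shows a DQN-style setup in PyTorch: a small fully connected Q-network trained with a one-step temporal-difference target. The dimensions and hyperparameters are illustrative assumptions, not tied to any task from the chapter.

```python
import torch
import torch.nn as nn

# Illustrative dimensions and hyperparameters.
state_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(s, a, r, s_next, done):
    """One gradient step toward the one-step temporal-difference target."""
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a random batch of transitions:
s = torch.randn(32, state_dim)
a = torch.randint(0, n_actions, (32,))
r = torch.randn(32)
td_update(s, a, r, torch.randn(32, state_dim), torch.zeros(32))
```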

    Bayesian RL in factored POMDPs

    Robust decision-making agents in any non-trivial system must reason over uncertainty of various types, such as action outcomes, the agent's current state, and the dynamics of the environment. Outcome and state uncertainty are elegantly captured by the Partially Observable Markov Decision Process (POMDP) framework [1], which enables reasoning in stochastic, partially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which unfortunately are often not available. When such a model is not available, model-based Bayesian Reinforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment and use this knowledge to select actions that, theoretically, trade off exploration and exploitation optimally. However, few BRL methods are applicable to partially observable settings, and those that are have limited scaling properties. The Bayes-Adaptive POMDP (BA-POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that proposes a method to overcome this bottleneck by representing the dynamics with a Bayes network, an approach that exploits structure in the form of independence between state and observation features.
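
    The posterior-over-models bookkeeping described above can be sketched in a deliberately simplified, fully observable tabular setting: Dirichlet counts over transitions form the posterior, and acting by Thompson sampling a model from it is one way to trade off exploration and exploitation. The full BA-POMDP additionally maintains a belief over hidden states, which this sketch omits.

```python
import numpy as np

n_states, n_actions = 5, 2
# Dirichlet(1, ..., 1) prior over the next-state distribution for each (s, a).
counts = np.ones((n_states, n_actions, n_states))

def observe(s, a, s_next):
    # The Bayesian update for a Dirichlet posterior is a count increment.
    counts[s, a, s_next] += 1

def sample_model(rng):
    """Thompson sampling: draw one plausible transition model from the posterior."""
    return np.array([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                     for s in range(n_states)])

rng = np.random.default_rng(0)
T = sample_model(rng)  # plan greedily in T, act, then call observe(...) and repeat
```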

    Recommending messages to users in participatory media environments: a Bayesian credibility approach

    In this thesis, we address the challenge of information overload in online participatory messaging environments using an artificial intelligence approach drawn from research in multiagent systems trust modeling. In particular, we reason about which messages to show to users based on modeling both credibility and similarity, motivated by a need to discriminate between popular (but false) and truly beneficial messages. Our work focuses on environments wherein users' ratings on messages reveal their preferences and where the trustworthiness of those ratings then needs to be modeled in order to make effective recommendations. We first present one solution, CredTrust, and demonstrate its efficacy in comparison with LOAR, an established trust-based recommender system applicable to participatory media networks that fails to incorporate the modeling of credibility. Validation for our framework is provided through the simulation of an environment where the ground-truth benefit of a message to a user is known. We are able to show that our approach performs well in terms of successfully recommending those messages with high predicted benefit and avoiding those with low predicted benefit. We continue by developing a new model for making recommendations that is grounded in Bayesian statistics and uses Partially Observable Markov Decision Processes (POMDPs). This model is an important next step: both CredTrust and LOAR encode particular functions of user features (viz., similarity and credibility) when making recommendations, whereas our new model, denoted POMDPTrust, learns the appropriate evaluation functions in order to make "correct" belief updates about the usefulness of messages. We validate our new approach in simulation, showing that it outperforms both LOAR and CredTrust in a variety of agent scenarios. Furthermore, we demonstrate that POMDPTrust performs well on real-world data sets from Reddit.com and Epinions.com. In all, we offer a novel trust model which is shown, through simulation and real-world experimentation, to be an effective agent-based solution to the problem of managing the messages posted by users in participatory media networks.
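
    A toy version of the Bayesian belief update at the heart of such a recommender might look like the following: a message's unobserved usefulness is tracked as a Beta posterior, and each incoming rating is weighted by the rater's credibility. The credibility-weighting scheme is a hypothetical simplification, not the thesis's exact observation model.

```python
# Uniform Beta(1, 1) prior on the unobserved usefulness of a message.
def update_belief(alpha, beta, rating, credibility):
    """rating: 1 (helpful) or 0 (unhelpful); credibility in [0, 1] discounts
    how strongly the rating moves the posterior (an assumed scheme)."""
    if rating == 1:
        alpha += credibility
    else:
        beta += credibility
    return alpha, beta

alpha, beta = 1.0, 1.0
for rating, cred in [(1, 0.9), (1, 0.4), (0, 0.8)]:
    alpha, beta = update_belief(alpha, beta, rating, cred)

expected_usefulness = alpha / (alpha + beta)  # posterior mean; rank messages by this
print(round(expected_usefulness, 3))
```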

    Monte Carlo Tree Search for Bayesian Reinforcement Learning

    No full text