36,463 research outputs found

Q Learning Behavior on Autonomous Navigation of Physical Robot

Author: Wicaksono Handy
Publication date: 01/11/2011

A behavior-based architecture gives a robot fast and reliable action. If a robot has many behaviors, behavior coordination is needed. The subsumption architecture is a behavior coordination method that gives quick and robust responses. A learning mechanism improves the robot's performance in handling uncertainty. Q-learning is a popular reinforcement learning method that has been used in robot learning because it is simple, convergent, and off-policy. In this paper, Q-learning is used as the learning mechanism for the obstacle avoidance behavior in autonomous robot navigation. The learning rate of Q-learning affects the robot's performance in the learning phase. As a result, the Q-learning algorithm is successfully implemented in a physical robot with its imperfect environment.
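As a rough illustration of why the learning rate matters in the learning phase, the standard tabular Q-learning update can be sketched as follows. This is a generic sketch, not the paper's implementation: the state names, action names, and reward value are hypothetical, chosen only to resemble an obstacle avoidance setting.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one off-policy Q-learning update in place and return the new value."""
    # Off-policy: bootstrap from the best next action, whatever the robot does next.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical action set and transition: driving forward into an obstacle costs -1.
actions = ["forward", "turn_left", "turn_right"]
q_slow = defaultdict(float)
q_fast = defaultdict(float)
slow = q_update(q_slow, "obstacle_ahead", "forward", -1.0, "collided", actions, alpha=0.1)
fast = q_update(q_fast, "obstacle_ahead", "forward", -1.0, "collided", actions, alpha=0.9)
print(slow, fast)
```

After this single collision, the table with the small learning rate has moved only a tenth of the way toward the penalty, while the table with the large learning rate has absorbed most of it.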

Crossref

Scientific Repository

Q-learning with censored data

Author: Goldberg Yair
Kosorok Michael R.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

We develop methodology for a multistage decision problem with flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications in the design of personalized medicine trials in cancer and in other life-threatening diseases. Comment: Published in at http://dx.doi.org/10.1214/12-AOS968 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

PubMed Central

Carolina Digital Repository

Switch-based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning

Author: Gao Jianfeng
Li Xiujun
Liu Jingjing
Wu Yuexin
Yang Yiming
Publication date: 19/11/2018

Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model - or implicitly, the pre-specified ratio of real vs. simulated experiences used for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning for improving sample efficiency, by encouraging the world model to generate simulated experiences in the state-action space where the agent has not (fully) explored. Our results show that by combining switcher and active learning, the new framework named as Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvement over DDQ and Q-learning baselines in both simulation and human evaluations. Comment: 8 pages, 9 figures, AAAI 201
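The basic Dyna-Q idea mentioned in this abstract, learning from real experience and then replaying simulated experience from a learned world model, can be sketched in tabular form. This is only an illustrative sketch of classic tabular Dyna-Q with a fixed planning ratio; it is not the deep, switcher-based, or active-learning variant the paper proposes, and the corridor environment `chain_step` and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

def dyna_q(step, start, actions, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q for a deterministic environment given by step(s, a)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    model = {}  # world model: (state, action) -> (reward, next_state, done)

    def update(s, a, r, s2, done):
        best_next = 0.0 if done else max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learn from the real experience
            model[(s, a)] = (r, s2, done)    # record it in the world model
            for _ in range(planning_steps):  # learn from simulated experiences
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q

def chain_step(state, action):
    """Hypothetical 5-state corridor: reaching state 4 ends the episode with reward 1."""
    nxt = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    done = nxt == 4
    return (1.0 if done else 0.0), nxt, done

q = dyna_q(chain_step, start=0, actions=["left", "right"])
print(q[(0, "right")], q[(0, "left")])
```

Here `planning_steps` plays the role of the pre-specified ratio of simulated to real experiences; the abstract's point is that a fixed ratio like this is only as good as the world model behind it.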

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Theoretical Analysis of Cooperative Behavior in Multi-Agent Q-learning

Author: Kaymak U.
Waltman L.R.

A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge, in others it did not. This report provides a theoretical analysis of this issue. The analysis focuses on multi-agent Q-learning in iterated prisoner's dilemmas. It is shown that under certain assumptions cooperative behavior may emerge when multi-agent Q-learning is applied in an iterated prisoner's dilemma. An important consequence of the analysis is that multi-agent Q-learning may result in non-Nash behavior. It is found experimentally that the theoretical results derived in this report are quite robust to violations of the underlying assumptions. Keywords: Cooperation; Multi-Agent Q-Learning; Multi-Agent Reinforcement Learning; Nash Equilibrium; Prisoner's Dilemma

Research Papers in Economics

BEHAVIOR BASED CONTROL AND FUZZY Q-LEARNING FOR AUTONOMOUS FIVE LEGS ROBOT NAVIGATION

Author: Adil Ratna
Publication date: 01/01/2009

This paper presents a combination of behavior-based control and fuzzy Q-learning for five-legged robot navigation systems. Many fuzzy Q-learning algorithms have been proposed to yield individual behaviors such as obstacle avoidance, target finding, and so on. However, complicated tasks require combining all behaviors in one control schema using behavior-based control. Based on this fact, this paper proposes a control schema that incorporates fuzzy Q-learning into a behavior-based schema to handle complicated tasks in the navigation system of an autonomous five-legged robot. In the proposed schema, two behaviors are learned by fuzzy Q-learning. The other behaviors are constructed in the design step. All behaviors are coordinated by a hierarchical hybrid coordination node. Simulation results demonstrate that the robot with the proposed schema is able to learn the right policy, to avoid obstacles, and to find the target. However, fuzzy Q-learning failed to give the right policy for the robot to avoid collisions in corner locations. Keywords: behavior based control, fuzzy q-learnin

PENS Repository

Q-learning with Nearest Neighbors

Author: Shah Devavrat
Xie Qiaomin
Publication date: 22/10/2018

We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a $d$-dimensional state space and the discounted factor $\gamma \in (0,1)$, given an arbitrary sample path with "covering time" $L$, we establish that the algorithm is guaranteed to output an $\varepsilon$-accurate estimate of the optimal Q-function using $\tilde{O}\big(L/(\varepsilon^3(1-\gamma)^7)\big)$ samples. For instance, for a well-behaved MDP, the covering time of the sample path under the purely random policy scales as $\tilde{O}\big(1/\varepsilon^d\big)$, so the sample complexity scales as $\tilde{O}\big(1/\varepsilon^{d+3}\big)$. Indeed, we establish a lower bound that argues that the dependence of $\tilde{\Omega}\big(1/\varepsilon^{d+2}\big)$ is necessary. Comment: Accepted to NIPS 201
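The step from the general bound to the random-policy example in this abstract is a direct substitution of the covering time into the sample bound. Written out, with $N$ as a hypothetical symbol for the number of samples (it does not appear in the abstract) and the discount factor $\gamma$ treated as a fixed constant in the last step:

```latex
\[
N = \tilde{O}\!\left(\frac{L}{\varepsilon^{3}(1-\gamma)^{7}}\right),
\qquad
L = \tilde{O}\!\left(\frac{1}{\varepsilon^{d}}\right)
\;\Longrightarrow\;
N = \tilde{O}\!\left(\frac{1}{\varepsilon^{d+3}(1-\gamma)^{7}}\right)
  = \tilde{O}\!\left(\frac{1}{\varepsilon^{d+3}}\right).
\]
```

Compared with the stated lower bound $\tilde{\Omega}\big(1/\varepsilon^{d+2}\big)$, the upper bound in this example is larger by a factor of $1/\varepsilon$, up to logarithmic terms.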

arXiv.org e-Print Archive

DSpace@MIT