
POLICY ADVICE, NON-CONVEX AND DISTRIBUTED OPTIMIZATION IN REINFORCEMENT LEARNING

By Yusen Zhan

Abstract

Thesis (Ph.D.), Computer Science, Washington State University

Transfer learning is a machine learning method that reuses knowledge from previous training to speed up the learning process. Policy advice is a type of transfer learning in which a student agent learns faster via advice from a teacher agent: the agent that provides advice (actions) is called the teacher agent, and the agent that receives the advice is the student agent. However, this and other current reinforcement learning transfer methods have seen little theoretical analysis. This dissertation formally defines a setting in which multiple teacher agents can provide advice to a student and introduces an algorithm that leverages both autonomous exploration and the teachers' advice. Regret bounds are provided, and negative transfer is formally defined and studied.

Policy search, on the other hand, is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been applied successfully to high-dimensional problems such as robotics control. Though successful, current methods can produce unsafe policy parameters that damage hardware units. Motivated by such constraints, Bhatnagar et al. and others proposed projection-based methods for safe policies [8]. These methods, however, can only handle convex policy constraints. This dissertation contributes the first safe policy search reinforcement learner capable of operating under non-convex policy constraints. This is achieved by observing a connection between non-convex variational inequalities and policy search problems. We provide two algorithms, Mann iteration and two-step iteration, to solve the resulting problem, and prove convergence in the non-convex stochastic setting.

Lastly, lifelong reinforcement learning is a framework, similar to transfer learning, that allows agents to learn multiple consecutive tasks sequentially online.
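A Mann-style iteration of the kind named above can be sketched in a few lines; this is a minimal deterministic scalar illustration of the averaging scheme, not the dissertation's algorithm, and the map `T`, the step size, and the toy fixed-point example are all assumptions for the sketch.

```python
def mann_iteration(T, x0, step=0.5, iters=200):
    """Mann iteration: x_{k+1} = (1 - a) * x_k + a * T(x_k).

    Averaging the current iterate with T(x_k) can converge in cases
    where the direct fixed-point iteration x_{k+1} = T(x_k) would not.
    """
    x = x0
    for _ in range(iters):
        x = (1 - step) * x + step * T(x)
    return x

# Toy example: T(x) = 0.5 * x + 1.0 has the unique fixed point x = 2.0,
# and the averaged iterates approach it from the starting point x0 = 10.0.
result = mann_iteration(lambda x: 0.5 * x + 1.0, x0=10.0)
```

In the stochastic, non-convex setting the dissertation targets, `T` would be an operator built from noisy gradient or projection steps and the step size would typically diminish over iterations; the contraction mapping here only illustrates the update rule itself.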
Current methods, however, suffer from scalability issues when the agent must solve a large number of tasks. This dissertation remedies these drawbacks with a novel, scalable technique for lifelong reinforcement learning: we derive an algorithm that assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange.
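The "only local information exchange" idea can be illustrated with a standard consensus-averaging step, in which each processing unit communicates only with its neighbors yet the network agrees on a global quantity. This is a generic sketch of that primitive, not the dissertation's algorithm; the ring topology and uniform averaging weights are assumptions for the example.

```python
def consensus_step(values, neighbors):
    # Each node replaces its value with the average of its own value and
    # its neighbors' values -- no central coordinator is consulted.
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]

# Four nodes arranged in a ring: 0 - 1 - 2 - 3 - 0.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
values = [1.0, 2.0, 3.0, 4.0]

# Repeated local averaging drives every node toward the global mean (2.5),
# the kind of agreement a shared repository update would rely on.
for _ in range(100):
    values = consensus_step(values, neighbors)
```

Because every node here has the same degree, the averaging matrix is doubly stochastic, so the global mean is preserved at each step and all nodes converge to it.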

Topics: Computer science, Artificial intelligence, Machine Learning, Non-convex Optimization, Reinforcement Learning, Transfer Learning
Year: 2016
OAI identifier: oai:research.libraries.wsu.edu:2376/12014
Provided by: Research Exchange

