Thesis (Ph.D.), Computer Science, Washington State University

Transfer learning is a method in machine learning that tries to use previous training
knowledge to speed up the learning process. Policy advice is a type of transfer learning
method where a student agent is able to learn faster via advice from a teacher
agent. Here, the agent that provides advice (actions) is called the teacher agent, and the
agent that receives advice (actions) is called the student agent. However, both this and
other current reinforcement learning transfer methods have little theoretical analysis.
This dissertation formally defines a setting where multiple teacher agents can provide
advice to a student and introduces an algorithm to leverage both autonomous exploration
and the teacher's advice. Regret bounds are provided and negative transfer is
formally defined and studied.
On the other hand, policy search is a class of reinforcement learning algorithms
for finding optimal policies to control problems with limited feedback. These methods
have shown successful applications in high-dimensional problems, such as robotics
control. Though successful, current methods can lead to unsafe policy parameters
that damage hardware units. Motivated by such constraints, Bhatnagar et al. and others
proposed projection based methods for safe policies [8]. These methods, however,
can only handle convex policy constraints. In this dissertation, we contribute the
first safe policy search reinforcement learner capable of operating under non-convex
policy constraints. This is achieved by observing a connection between non-convex
variational inequalities and policy search problems. We provide two algorithms, namely
Mann iteration and two-step iteration, to solve the above and prove convergence in the non-convex
stochastic setting.
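
As a rough illustration of the Mann iteration mentioned above, the sketch below applies it to a toy variational inequality. It is not the dissertation's algorithm: it assumes a deterministic operator and a convex box constraint, whereas the dissertation treats the non-convex stochastic setting. The names `project_box`, `mann_iteration`, `rho`, and `alpha` are hypothetical and chosen only for this example.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (a convex set, assumed here)."""
    return [min(max(xi, lo), hi) for xi in x]


def mann_iteration(F, x0, lo, hi, rho=0.1, alpha=0.5, steps=200):
    """Mann iteration x <- (1 - alpha) * x + alpha * T(x),
    where T(x) = P_C(x - rho * F(x)) is the projected step for VI(F, C)."""
    x = list(x0)
    for _ in range(steps):
        fx = F(x)
        # Projected step: move against the operator, then project back onto C.
        t = project_box([xi - rho * fi for xi, fi in zip(x, fx)], lo, hi)
        # Mann averaging between the current iterate and the projected step.
        x = [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, t)]
    return x


# Toy operator: gradient of 0.5 * ||x - target||^2, with the target outside the box.
target = [2.0, -0.5]


def F(x):
    return [xi - ti for xi, ti in zip(x, target)]


theta = mann_iteration(F, [0.0, 0.0], lo=-1.0, hi=1.0)
```

In this toy problem the iterates stay inside the box at every step and approach the projection of `target` onto the box, which is the kind of constraint satisfaction that safe policy search requires of policy parameters.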
Lastly, lifelong reinforcement learning is a framework similar to transfer learning
that allows agents to learn multiple consecutive tasks sequentially online. Current
methods, however, suffer from scalability issues when the agent has to solve a large
number of tasks. In this dissertation, we remedy the above drawbacks and propose
a novel scalable technique for lifelong reinforcement learning. We derive an algorithm
which assumes the availability of multiple processing units and computes shared
repositories and local policies using only local information exchange.