6,658 research outputs found
Efficient Reinforcement Learning Using Recursive Least-Squares Methods
The recursive least-squares (RLS) algorithm is one of the most well-known
algorithms used in adaptive filtering, system identification and adaptive
control. Its popularity is mainly due to its fast convergence speed, which is
considered to be optimal in practice. In this paper, RLS methods are used to
solve reinforcement learning problems, where two new reinforcement learning
algorithms using linear value function approximators are proposed and analyzed.
The two algorithms are called RLS-TD(lambda) and Fast-AHC (Fast Adaptive
Heuristic Critic), respectively. RLS-TD(lambda) can be viewed as the extension
of RLS-TD(0) from lambda=0 to general lambda within interval [0,1], so it is a
multi-step temporal-difference (TD) learning algorithm using RLS methods. The
convergence with probability one and the limit of convergence of RLS-TD(lambda)
are proved for ergodic Markov chains. Compared to the existing LS-TD(lambda)
algorithm, RLS-TD(lambda) has advantages in computation and is more suitable
for online learning. The effectiveness of RLS-TD(lambda) is analyzed and
verified by learning prediction experiments of Markov chains with a wide range
of parameter settings. The Fast-AHC algorithm is derived by applying the
proposed RLS-TD(lambda) algorithm in the critic network of the adaptive
heuristic critic method. Unlike conventional AHC algorithm, Fast-AHC makes use
of RLS methods to improve the learning-prediction efficiency in the critic.
Learning control experiments of the cart-pole balancing and the acrobot
swing-up problems are conducted to compare the data efficiency of Fast-AHC with
conventional AHC. From the experimental results, it is shown that the data
efficiency of learning control can also be improved by using RLS methods in the
learning-prediction process of the critic. The performance of Fast-AHC is also
compared with that of the AHC method using LS-TD(lambda). Furthermore, it is
demonstrated in the experiments that different initial values of the variance
matrix in RLS-TD(lambda) are required to get better performance not only in
learning prediction but also in learning control. The experimental results are
analyzed based on the existing theoretical work on the transient phase of
forgetting factor RLS methods
Kernelizing LSPE Ī»
We propose the use of kernel-based methods as underlying function approximator in the least-squares based policy evaluation framework of LSPE(Ī») and LSTD(Ī»). In particular we present the ākernelizationā of model-free LSPE(Ī»). The ākernelizationā is computationally made possible by using the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well-suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this
A Survey of Prediction and Classification Techniques in Multicore Processor Systems
In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application\u27s behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but, it will help the reader interested in employing prediction in optimization of multicore processor systems
- ā¦