Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration
In this paper we propose a new framework, Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty, by combining non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality-reduction technique in which high-dimensional data is projected onto a random lower-dimensional subspace via spherically random rotation and coordinate sampling. KLSPI introduces the kernel trick into the LSPI framework for Reinforcement Learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In our approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random bases. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of the Singular Value Decomposition (SVD). Finally, simulation results on benchmark MDP domains confirm gains in both computation time and performance in large feature spaces.
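The random-projection step described in this abstract can be sketched as follows (a minimal illustration with made-up dimensions, not the paper's implementation): features are multiplied by a data-independent Gaussian matrix, and by the Johnson-Lindenstrauss lemma the geometry of the feature vectors is approximately preserved in the lower-dimensional subspace.

```python
import numpy as np

# Illustrative sketch of the random-projection step (not the authors'
# code): project D-dimensional features onto a d-dimensional random
# subspace spanned by data-independent Gaussian directions.
rng = np.random.default_rng(0)
D, d, n = 1000, 50, 20                    # original dim, reduced dim, samples

Phi = rng.normal(size=(n, D))             # high-dimensional feature matrix
R = rng.normal(size=(D, d)) / np.sqrt(d)  # random basis, scaled for isometry

Phi_low = Phi @ R                         # features in the random subspace

# Norms are preserved in expectation: ||R^T x|| is close to ||x||.
ratios = np.linalg.norm(Phi_low, axis=1) / np.linalg.norm(Phi, axis=1)
print(ratios.mean())                      # close to 1
```

Because R is fixed in advance and independent of the data, the projection costs a single matrix multiply, which is what makes it attractive as a preprocessing step for LSPI-style solvers.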
Kernelizing LSPE(λ)
We propose the use of kernel-based methods as the underlying function approximator in the least-squares policy-evaluation framework of LSPE(λ) and LSTD(λ). In particular, we present the 'kernelization' of model-free LSPE(λ). The 'kernelization' is made computationally feasible by the subset-of-regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited to optimistic policy iteration and can thus be used in the context of online reinforcement learning. We demonstrate this on the high-dimensional Octopus benchmark.
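The subset-of-regressors idea mentioned above can be sketched numerically (sizes, bandwidth, and data here are illustrative assumptions, not taken from the paper): an n x n kernel matrix is approximated using only m << n basis functions centred on a subset of the data.

```python
import numpy as np

# Minimal sketch of a subset-of-regressors (Nystrom-type) kernel
# approximation; all sizes and the Gaussian bandwidth are illustrative.
rng = np.random.default_rng(1)
n, m = 200, 20
X = rng.uniform(-1, 1, size=(n, 1))
Xm = X[:m]                                # the chosen subset of regressors

def gauss_kernel(A, B, ell=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

K_nm = gauss_kernel(X, Xm)                # n x m cross-kernel
K_mm = gauss_kernel(Xm, Xm)               # m x m subset kernel
K_full = gauss_kernel(X, X)               # full n x n kernel (for comparison)

# Approximate the full kernel from the reduced basis: K ~ K_nm K_mm^-1 K_mn
K_approx = K_nm @ np.linalg.solve(K_mm + 1e-6 * np.eye(m), K_nm.T)

err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(err)                                # small despite using m << n bases
```

Working with the m columns instead of the full kernel is what makes a recursive, online implementation tractable, since per-step updates scale with m rather than with the number of observed samples.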
The Equi-Correlation Network: a New Kernelized-LARS with Automatic Kernel Parameters Tuning
Machine learning heavily relies on the ability to learn or approximate real-valued functions. The state variables, perceptions, internal states, etc., of an agent are often represented as real numbers; grounded on them, the agent has to predict something, or act in some way. In this view, the outcome is a nonlinear function of the inputs. Fitting a nonlinear function to observations, i.e. solving a regression problem, is thus a very common task. Among other approaches, LARS is very appealing for its nice theoretical properties and its efficiency in computing the whole regularization path of a supervised learning problem, along with sparse solutions. In this paper, we consider the kernelized version of LARS. In this setting, kernel functions generally have parameters that must be tuned. We propose a new algorithm, the Equi-Correlation Network (ECON), whose originality is that, while computing the regularization path, it automatically tunes the kernel hyper-parameters; this opens the way to working with infinitely many kernel functions, from which the most interesting are selected. Interestingly, our algorithm remains computationally efficient and provides state-of-the-art results on standard benchmarks, while lessening the hand-tuning burden.
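The spirit of selecting kernel hyper-parameters during sparse fitting can be conveyed with a deliberately simplified toy (this is a greedy matching-pursuit caricature, not ECON's equi-correlation path): build a dictionary of Gaussian kernel features over several candidate bandwidths and repeatedly pick the feature most correlated with the current residual, so the bandwidth is effectively chosen along the way.

```python
import numpy as np

# Toy sketch (hypothetical simplification, not the ECON algorithm):
# a dictionary mixing several kernel bandwidths, fitted greedily so
# that the "best" hyper-parameters emerge from the selection itself.
rng = np.random.default_rng(2)
X = np.linspace(-1, 1, 80)[:, None]
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=80)

bandwidths = [0.05, 0.2, 0.8]             # candidate kernel parameters
feats, tags = [], []
for ell in bandwidths:
    Phi_ell = np.exp(-(X - X.T) ** 2 / (2 * ell**2))  # one feature per centre
    feats.append(Phi_ell)
    tags += [ell] * X.shape[0]
Phi = np.hstack(feats)
Phi /= np.linalg.norm(Phi, axis=0)        # unit-norm dictionary columns

residual, chosen = y.copy(), []
for _ in range(10):                       # ten greedy selection steps
    corr = Phi.T @ residual
    j = int(np.argmax(np.abs(corr)))      # most-correlated feature wins
    chosen.append(tags[j])
    residual -= corr[j] * Phi[:, j]       # deflate the residual

print(sorted(set(chosen)))                # bandwidths actually selected
```

ECON's contribution is doing this kind of selection exactly along the LARS regularization path, with continuous tuning of the kernel parameters rather than a fixed candidate grid; the toy above only mimics the outcome, not the mechanism.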
Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding
Optimal control is notoriously difficult for stochastic nonlinear systems.
Ren et al. introduced Spectral Dynamics Embedding for developing reinforcement
learning methods for controlling an unknown system. It uses an
infinite-dimensional feature to linearly represent the state-value function and
exploits finite-dimensional truncation approximation for practical
implementation. However, the finite-dimensional approximation properties in
control have not been investigated even when the model is known. In this paper,
we provide a tractable stochastic nonlinear control algorithm that exploits the
nonlinear dynamics upon the finite-dimensional feature approximation, Spectral
Dynamics Embedding Control (SDEC), with an in-depth theoretical analysis to
characterize the approximation error induced by the finite-dimensional truncation
and statistical error induced by finite-sample approximation in both policy
evaluation and policy optimization. We also empirically test the algorithm and
compare the performance with Koopman-based methods and iLQR methods on the
pendulum swing-up problem.
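One standard way to realize a finite-dimensional truncation of an infinite-dimensional spectral feature is via random Fourier features; the sketch below is in that spirit (an illustrative assumption, not the SDEC implementation): d random features approximate a Gaussian kernel, so a function linear in the infinite-dimensional feature becomes approximately linear in d coordinates.

```python
import numpy as np

# Sketch of finite-dimensional feature truncation via random Fourier
# features (illustrative; state dimension, bandwidth, and d are made up).
rng = np.random.default_rng(3)
d, sigma = 256, 0.5
W = rng.normal(scale=1.0 / sigma, size=(d, 2))  # 2-D state, e.g. (angle, velocity)
b = rng.uniform(0, 2 * np.pi, size=d)

def phi(s):
    """Finite-dimensional spectral feature of a state s."""
    return np.sqrt(2.0 / d) * np.cos(W @ s + b)

s1, s2 = np.array([0.1, -0.3]), np.array([0.2, 0.1])
approx = phi(s1) @ phi(s2)                      # feature inner product
exact = np.exp(-np.sum((s1 - s2) ** 2) / (2 * sigma**2))  # Gaussian kernel
print(approx, exact)                            # close for moderate d
```

The gap between `approx` and `exact` shrinks at roughly a 1/sqrt(d) rate, which is the kind of truncation error a theoretical analysis like SDEC's has to account for alongside the finite-sample statistical error.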
Kernelized Reinforcement Learning with Order Optimal Regret Bounds
Reinforcement learning (RL) has shown empirical success in various real world
settings with complex models and large state-action spaces. The existing
analytical results, however, typically focus on settings with a small number of
state-actions or simple models such as linearly modeled state-action value
functions. To derive RL policies that efficiently handle large state-action
spaces with more general value functions, some recent works have considered
nonlinear function approximation using kernel ridge regression. We propose
π-KRVI, an optimistic modification of least-squares value iteration, when the
state-action value function is represented in a reproducing kernel Hilbert
space (RKHS). We prove the first
order-optimal regret guarantees under a general setting. Our results show a
significant polynomial improvement, in the number of episodes, over the state
of the art. In particular, with highly non-smooth kernels (such as the Neural
Tangent kernel or some Matérn kernels), the existing results lead to trivial
(superlinear in the number of episodes) regret bounds. We show a sublinear
regret bound that is order optimal in the case of Matérn kernels, where a
lower bound on regret is known.
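The optimism ingredient in kernel-based least-squares value iteration can be sketched as follows (a minimal illustration under assumed data and hyper-parameters, not the paper's algorithm): fit a kernel ridge regressor to observed values and add an exploration bonus proportional to the predictive standard deviation, so rarely visited state-actions look optimistically valuable.

```python
import numpy as np

# Minimal sketch of an optimistic (UCB-style) kernel ridge value
# estimate; lam, ell, beta, and the data are illustrative choices.
rng = np.random.default_rng(4)
lam, ell, beta = 0.1, 0.3, 1.0

X = rng.uniform(-1, 1, size=(30, 1))                  # visited state-actions
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=30)  # observed value targets

def k(A, B):
    return np.exp(-(A - B.T) ** 2 / (2 * ell**2))     # Gaussian kernel

K = k(X, X)
alpha = np.linalg.solve(K + lam * np.eye(30), y)      # ridge coefficients

def optimistic_value(x):
    kx = k(np.atleast_2d(x), X).ravel()
    mean = kx @ alpha
    var = 1.0 - kx @ np.linalg.solve(K + lam * np.eye(30), kx)
    return mean + beta * np.sqrt(max(var, 0.0))       # mean + exploration bonus

print(optimistic_value(np.array([0.0])))              # near-data: small bonus
print(optimistic_value(np.array([5.0])))              # far from data: bonus ~ beta
```

Far from the data the kernel vector `kx` vanishes, so the estimate is dominated by the bonus; this is the mechanism that drives systematic exploration in optimistic value iteration, and the regret analysis hinges on how fast that bonus shrinks as data accumulates.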
- …