    Intelligent Control of a Sensor-Actuator System via Kernelized Least-Squares Policy Iteration

    In this paper we propose a new framework, Compressive Kernelized Reinforcement Learning (CKRL), for computing near-optimal policies in sequential decision making under uncertainty, combining non-adaptive, data-independent Random Projections with nonparametric Kernelized Least-Squares Policy Iteration (KLSPI). Random Projections are a fast, non-adaptive dimensionality reduction technique in which high-dimensional data is projected onto a random lower-dimensional subspace via spherically random rotation and coordinate sampling. KLSPI introduces the kernel trick into the LSPI framework for reinforcement learning, often achieving faster convergence and providing automatic feature selection via various kernel sparsification approaches. In our approach, policies are computed in a low-dimensional subspace generated by projecting the high-dimensional features onto a set of random bases. We first show how Random Projections constitute an efficient sparsification technique and how our method often converges faster than regular LSPI, at lower computational cost. The theoretical foundation underlying this approach is a fast approximation of the Singular Value Decomposition (SVD). Finally, simulation results on benchmark MDP domains confirm gains in both computation time and performance in large feature spaces.
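
    A minimal sketch of the random-projection step described above, assuming a Gaussian projection matrix and placeholder dimensions (variable names and sizes are illustrative, not the authors' implementation):

```python
# Hypothetical sketch: compress high-dimensional features with a
# data-independent Gaussian random projection before running (K)LSPI.
import numpy as np

rng = np.random.default_rng(0)

D, d = 10_000, 128           # original and projected feature dimensions
n_samples = 500              # number of (state, action) feature vectors

# High-dimensional features phi(s, a); random placeholders here.
Phi = rng.standard_normal((n_samples, D))

# Non-adaptive, data-independent projection; scaling by 1/sqrt(d)
# approximately preserves inner products (Johnson-Lindenstrauss).
R = rng.standard_normal((D, d)) / np.sqrt(d)

Phi_low = Phi @ R            # low-dimensional features fed to policy iteration
print(Phi_low.shape)         # (500, 128)
```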

    Kernelizing LSPE(λ)

    We propose the use of kernel-based methods as the underlying function approximator in the least-squares based policy evaluation frameworks of LSPE(λ) and LSTD(λ). In particular, we present the 'kernelization' of model-free LSPE(λ). The 'kernelization' is made computationally feasible by the subset of regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited for optimistic policy iteration and can thus be used in the context of online reinforcement learning. We demonstrate this on the high-dimensional Octopus benchmark.
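
    A minimal sketch of the subset-of-regressors idea mentioned above: the function is represented with kernel basis functions centred on a small dictionary of points, so the solution involves far fewer basis functions than data points. The RBF kernel, dictionary size, and regression targets are illustrative assumptions, not the paper's exact setup:

```python
# Hypothetical sketch: subset-of-regressors approximation for kernel regression.
import numpy as np

def rbf(X, Y, gamma=0.5):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))                 # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)  # noisy targets

# Reduced dictionary: a small subset of the data acts as basis-function centres.
dictionary = X[rng.choice(len(X), size=15, replace=False)]

K_nm = rbf(X, dictionary)                 # n x m design matrix
K_mm = rbf(dictionary, dictionary)        # m x m kernel on the dictionary
noise = 1e-2

# Subset-of-regressors weights on the reduced basis.
w = np.linalg.solve(K_nm.T @ K_nm + noise * K_mm, K_nm.T @ y)

def predict(X_new):
    return rbf(X_new, dictionary) @ w     # prediction from m basis functions
```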

    The Equi-Correlation Network: a New Kernelized-LARS with Automatic Kernel Parameters Tuning

    Machine learning heavily relies on the ability to learn/approximate real-valued functions. The state variables of an agent (perceptions, internal states, etc.) are often represented as real numbers; based on them, the agent has to predict something or act in some way, and this outcome is a nonlinear function of the inputs. It is thus a very common task to fit a nonlinear function to observations, namely to solve a regression problem. Among other approaches, LARS is very appealing for its nice theoretical properties and its efficiency in computing the whole ℓ1 regularization path of a supervised learning problem, along with sparse solutions. In this paper, we consider the kernelized version of LARS. In this setting, kernel functions generally have parameters that have to be tuned. We propose a new algorithm, the Equi-Correlation Network (ECON), whose originality is that it automatically tunes the kernel hyper-parameters while computing the regularization path; this opens the way to working with infinitely many kernel functions, from which the most interesting are selected. Interestingly, our algorithm remains computationally efficient and provides state-of-the-art results on standard benchmarks, while lessening the hand-tuning burden.
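
    A minimal sketch of running LARS on a kernelized design matrix with a single fixed RBF bandwidth, using scikit-learn's lars_path on synthetic data; ECON's automatic tuning of the kernel hyper-parameters along the path is not reproduced here:

```python
# Hypothetical sketch: a sparse l1 path over kernel basis functions via LARS.
import numpy as np
from sklearn.linear_model import lars_path
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(150)

# Kernelized design: each column is a kernel function centred on a data point,
# with a single hand-fixed bandwidth (ECON would tune this along the path).
Phi = rbf_kernel(X, X, gamma=1.0)

# Full l1 regularization path; coefs has one column per path breakpoint.
alphas, active, coefs = lars_path(Phi, y, method='lasso')
print(len(active), "kernel basis functions selected along the path")
```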

    Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

    Optimal control is notoriously difficult for stochastic nonlinear systems. Ren et al. introduced Spectral Dynamics Embedding for developing reinforcement learning methods to control an unknown system. It uses an infinite-dimensional feature to represent the state-value function linearly and exploits a finite-dimensional truncation for practical implementation. However, the properties of this finite-dimensional approximation in control have not been investigated, even when the model is known. In this paper, we provide a tractable stochastic nonlinear control algorithm, Spectral Dynamics Embedding Control (SDEC), that exploits the nonlinear dynamics on top of the finite-dimensional feature approximation, together with an in-depth theoretical analysis characterizing the approximation error induced by the finite-dimensional truncation and the statistical error induced by finite-sample approximation in both policy evaluation and policy optimization. We also empirically test the algorithm and compare its performance with Koopman-based and iLQR methods on the pendulum swing-up problem.
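
    A minimal sketch of the finite-dimensional truncation idea: random Fourier features approximating a Gaussian-kernel feature map, with the value function represented linearly in those features and fitted by ridge regression. The dimensions, lengthscale, and placeholder targets are assumptions, not SDEC itself:

```python
# Hypothetical sketch: random Fourier features as a finite-dimensional
# truncation of an (infinite-dimensional) Gaussian-kernel feature map.
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_features = 2, 256     # e.g. pendulum state (angle, velocity)

# Spectral samples: omega ~ N(0, I / lengthscale^2), b ~ U[0, 2*pi)
lengthscale = 1.0
omega = rng.standard_normal((state_dim, n_features)) / lengthscale
b = rng.uniform(0.0, 2.0 * np.pi, n_features)

def features(states):
    # phi(s) in R^{n_features}; <phi(s), w> is the linear value representation
    return np.sqrt(2.0 / n_features) * np.cos(states @ omega + b)

# Ridge regression of (placeholder) value targets onto the truncated features.
S = rng.standard_normal((1000, state_dim))    # visited states
targets = np.cos(S[:, 0])                     # stand-in for sampled returns
Phi = features(S)
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(n_features), Phi.T @ targets)
```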

    Kernelized Reinforcement Learning with Order Optimal Regret Bounds

    Reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose π-KRVI, an optimistic modification of least-squares value iteration, when the state-action value function lies in an RKHS. We prove the first order-optimal regret guarantees under a general setting. Our results show a significant improvement, polynomial in the number of episodes, over the state of the art. In particular, with highly non-smooth kernels (such as the Neural Tangent kernel or some Matérn kernels), the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order-optimal in the case of Matérn kernels, where a lower bound on regret is known.
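
    A minimal sketch of the two generic ingredients behind optimistic kernelized value iteration: a kernel ridge regression estimate of the state-action value and an exploration bonus derived from the ridge posterior variance. Variable names and the bonus scaling are assumptions, not the π-KRVI specification:

```python
# Hypothetical sketch: kernel ridge regression of Q-values plus a GP-UCB-style
# exploration bonus, the generic ingredients of optimistic kernelized
# least-squares value iteration.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
Z = rng.standard_normal((300, 3))      # observed (state, action) pairs
targets = rng.standard_normal(300)     # regression targets, e.g. r + V(s')

lam, beta = 1.0, 2.0                   # ridge parameter, optimism scale
K_reg_inv = np.linalg.inv(rbf_kernel(Z, Z) + lam * np.eye(len(Z)))
alpha = K_reg_inv @ targets

def optimistic_q(z_new):
    k = rbf_kernel(z_new, Z)                        # cross-kernel, (m, n)
    mean = k @ alpha                                # kernel ridge estimate
    var = rbf_kernel(z_new, z_new).diagonal() - np.einsum(
        "ij,jk,ik->i", k, K_reg_inv, k)             # ridge posterior variance
    return mean + beta * np.sqrt(np.maximum(var, 0.0))  # upper confidence bound
```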