6 research outputs found
Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space
This paper delves into the problem of safe reinforcement learning (RL) in a
partially observable environment with the aim of achieving safe-reachability
objectives. In traditional partially observable Markov decision processes
(POMDP), ensuring safety typically involves estimating the belief in latent
states. However, accurately estimating an optimal Bayesian filter in POMDP to
infer latent states from observations in a continuous state space poses a
significant challenge, largely due to the intractable likelihood. To tackle
this issue, we propose a stochastic model-based approach that guarantees RL
safety almost surely in the face of unknown system dynamics and partial
observation environments. We leveraged the Predictive State Representation
(PSR) and Reproducing Kernel Hilbert Space (RKHS) to represent future
multi-step observations analytically, and the results in this context are
provable. Furthermore, we derived essential operators from the kernel Bayes'
rule, enabling the recursive estimation of future observations using various
operators. Under the assumption of \textit{undercompleness}, a polynomial
sample complexity is established for the RL algorithm for the infinite size of
observation and action spaces, ensuring an suboptimal safe policy
guarantee
Kernel Bayesian Inference with Posterior Regularization
Abstract We propose a vector-valued regression problem whose solution is equivalent to the reproducing kernel Hilbert space (RKHS) embedding of the Bayesian posterior distribution. This equivalence provides a new understanding of kernel Bayesian inference. Moreover, the optimization problem induces a new regularization for the posterior embedding estimator, which is faster and has comparable performance to the squared regularization in kernel Bayes' rule. This regularization coincides with a former thresholding approach used in kernel POMDPs whose consistency remains to be established. Our theoretical work solves this open problem and provides consistency analysis in regression settings. Based on our optimizational formulation, we propose a flexible Bayesian posterior regularization framework which for the first time enables us to put regularization at the distribution level. We apply this method to nonparametric state-space filtering tasks with extremely nonlinear dynamics and show performance gains over all other baselines