VIME: Variational Information Maximizing Exploration
Scalable and effective exploration remains a key challenge in reinforcement
learning (RL). While there are methods with optimality guarantees in the
setting of discrete state and action spaces, these methods cannot be applied in
high-dimensional deep RL scenarios. As such, most contemporary RL relies on
simple heuristics such as epsilon-greedy exploration or adding Gaussian noise
to the controls. This paper introduces Variational Information Maximizing
Exploration (VIME), an exploration strategy based on maximization of
information gain about the agent's belief of environment dynamics. We propose a
practical implementation, using variational inference in Bayesian neural
networks, that efficiently handles continuous state and action spaces. VIME
modifies the MDP reward function and can be applied with several different
underlying RL algorithms. We demonstrate that VIME achieves significantly
better performance compared to heuristic exploration methods across a variety
of continuous control tasks and algorithms, including tasks with very sparse
rewards.
Comment: Published in Advances in Neural Information Processing Systems 29
(NIPS), pages 1109-111
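The abstract says VIME modifies the reward using the information gain about the agent's belief over environment dynamics. Below is a minimal sketch of that reward augmentation. It assumes, as a simplification, that the belief over dynamics-model parameters is a fully factorized Gaussian and that the information gain is measured as the KL divergence from the pre-update posterior to the post-update posterior. The helper names and the weight `eta` are illustrative. The Bayesian neural network update that produces the two posteriors is not shown.

```python
import math
from typing import Sequence


def kl_diag_gaussians(
    mu_new: Sequence[float],
    sigma_new: Sequence[float],
    mu_old: Sequence[float],
    sigma_old: Sequence[float],
) -> float:
    """KL(q_new || q_old) for fully factorized Gaussians, summed over parameters."""
    if not (len(mu_new) == len(sigma_new) == len(mu_old) == len(sigma_old)):
        raise ValueError("all parameter vectors must have the same length")
    total = 0.0
    for m1, s1, m0, s0 in zip(mu_new, sigma_new, mu_old, sigma_old):
        if s1 <= 0 or s0 <= 0:
            raise ValueError("standard deviations must be positive")
        # Closed-form KL between univariate Gaussians N(m1, s1^2) || N(m0, s0^2).
        total += math.log(s0 / s1) + (s1 ** 2 + (m1 - m0) ** 2) / (2 * s0 ** 2) - 0.5
    return total


def augmented_reward(
    extrinsic: float,
    mu_new: Sequence[float],
    sigma_new: Sequence[float],
    mu_old: Sequence[float],
    sigma_old: Sequence[float],
    eta: float = 0.1,
) -> float:
    """Hypothetical helper: extrinsic reward plus eta times the information-gain bonus."""
    return extrinsic + eta * kl_diag_gaussians(mu_new, sigma_new, mu_old, sigma_old)
```

For example, `augmented_reward(0.0, [1.0], [1.0], [0.0], [1.0])` gives 0.05. Shifting the mean of a unit-variance factor by 1 yields a KL of 0.5, and `eta = 0.1` scales that bonus. Transitions that leave the belief unchanged earn no bonus.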
Adaptive stochastic Galerkin FEM for lognormal coefficients in hierarchical tensor representations
Stochastic Galerkin methods for non-affine coefficient representations are
known to cause major difficulties from theoretical and numerical points of
view. In this work, an adaptive Galerkin FE method for linear parametric PDEs
with lognormal coefficients discretized in Hermite chaos polynomials is
derived. It employs problem-adapted function spaces to ensure solvability of
the variational formulation. The inherently high computational complexity of
the parametric operator is made tractable by using hierarchical tensor
representations. For this, a new tensor train format of the lognormal
coefficient is derived and verified numerically. The central novelty is the
derivation of a reliable residual-based a posteriori error estimator. This can
be regarded as a unique feature of stochastic Galerkin methods. It allows for
an adaptive algorithm to steer the refinements of the physical mesh and the
anisotropic Wiener chaos polynomial degrees. For the evaluation of the error
estimator to become feasible, a numerically efficient tensor format
discretization is developed. Benchmark examples with unbounded lognormal
coefficient fields illustrate the performance of the proposed Galerkin
discretization and the fully adaptive algorithm.
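The following minimal sketch shows why lognormal coefficients pair naturally with Hermite chaos. Assume, for illustration only, that at a fixed spatial point the log-coefficient is a linear combination γ = Σ_k c_k y_k of independent standard normal variables y_k. The generating function of the probabilists' Hermite polynomials then gives exp(c y) = exp(c²/2) Σ_n (c^n / n!) He_n(y) exactly. As a result, the coefficient of a multi-index is a product of one-dimensional coefficients, which makes it a rank-one tensor. This is not the paper's tensor train construction, which treats spatially varying fields. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def hermite_he(n: int, y: float) -> float:
    """Probabilists' Hermite polynomial He_n(y) via the three-term recurrence."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, y
    for k in range(1, n):
        # He_{k+1}(y) = y * He_k(y) - k * He_{k-1}(y)
        h_prev, h = h, y * h - k * h_prev
    return h


def lognormal_coeffs_1d(c: float, degree: int) -> List[float]:
    """Chaos coefficients of exp(c * y), y ~ N(0, 1): exp(c^2 / 2) * c^n / n!."""
    return [math.exp(c * c / 2) * c ** n / math.factorial(n) for n in range(degree + 1)]


def evaluate_chaos(cs: Sequence[float], ys: Sequence[float], degree: int) -> float:
    """Evaluate the truncated Hermite chaos expansion of exp(sum_k c_k * y_k).

    The coefficient of a multi-index (a_1, ..., a_K) is the product of the
    one-dimensional coefficients (a rank-one tensor), so the full multivariate
    sum factorizes into a product of one-dimensional sums.
    """
    result = 1.0
    for c, y in zip(cs, ys):
        coeffs = lognormal_coeffs_1d(c, degree)
        result *= sum(a * hermite_he(n, y) for n, a in enumerate(coeffs))
    return result
```

With degree 25, `evaluate_chaos([0.5, -0.3], [0.3, 1.2], 25)` agrees with `math.exp(0.5 * 0.3 - 0.3 * 1.2)` to near machine precision. Once the field varies in space, the coefficients no longer factorize this simply. One way to read the abstract's tensor train format is as a low-rank generalization of this product structure.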