8,814 research outputs found

    VIME: Variational Information Maximizing Exploration

    Full text link
    Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.Comment: Published in Advances in Neural Information Processing Systems 29 (NIPS), pages 1109-111

    Adaptive stochastic Galerkin FEM for lognormal coefficients in hierarchical tensor representations

    Get PDF
    Stochastic Galerkin methods for non-affine coefficient representations are known to cause major difficulties from theoretical and numerical points of view. In this work, an adaptive Galerkin FE method for linear parametric PDEs with lognormal coefficients discretized in Hermite chaos polynomials is derived. It employs problem-adapted function spaces to ensure solvability of the variational formulation. The inherently high computational complexity of the parametric operator is made tractable by using hierarchical tensor representations. For this, a new tensor train format of the lognormal coefficient is derived and verified numerically. The central novelty is the derivation of a reliable residual-based a posteriori error estimator. This can be regarded as a unique feature of stochastic Galerkin methods. It allows for an adaptive algorithm to steer the refinements of the physical mesh and the anisotropic Wiener chaos polynomial degrees. For the evaluation of the error estimator to become feasible, a numerically efficient tensor format discretization is developed. Benchmark examples with unbounded lognormal coefficient fields illustrate the performance of the proposed Galerkin discretization and the fully adaptive algorithm
    corecore