
    ES Is More Than Just a Traditional Finite-Difference Approximator

    An evolution strategy (ES) variant based on a simplification of a natural evolution strategy recently attracted attention because it performs surprisingly well in challenging deep reinforcement learning domains. It searches for neural network parameters by generating perturbations to the current set of parameters, checking their performance, and moving in the aggregate direction of higher reward. Because it resembles a traditional finite-difference approximation of the reward gradient, it can easily be mistaken for one. However, this ES optimizes for a different gradient than just reward: it optimizes for the average reward of the entire population, thereby seeking parameters that are robust to perturbation. This difference can channel ES into distinct areas of the search space relative to gradient descent, and consequently to networks with distinct properties. This unique robustness-seeking property, and its consequences for optimization, are demonstrated in several domains. They include humanoid locomotion, where networks from policy-gradient-based reinforcement learning are significantly less robust to parameter perturbation than ES-based policies solving the same task. While the implications of such robustness and robustness-seeking remain open to further study, this work's main contribution is to highlight such differences and their potential importance.
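
    The update loop the abstract describes can be sketched in a few lines. This is a minimal sketch, not the paper's implementation; the reward function `f`, step sizes, and population size are placeholder assumptions.

```python
import numpy as np

def evolution_strategy(f, theta, sigma=0.1, alpha=0.01, pop_size=50, iterations=200):
    """Simplified natural-evolution-strategy update: perturb the current
    parameters, score each perturbation, and move in the reward-weighted
    average direction, i.e. ascend the population-average reward."""
    for _ in range(iterations):
        eps = np.random.randn(pop_size, theta.size)                 # Gaussian perturbations
        rewards = np.array([f(theta + sigma * e) for e in eps])     # score each candidate
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # scale-invariant weights
        theta = theta + (alpha / (pop_size * sigma)) * eps.T @ adv  # aggregate update
    return theta

# Toy usage: maximize a smooth reward (here, the negative squared norm).
theta_star = evolution_strategy(lambda x: -np.sum(x ** 2), np.ones(10))
```

    Because each update averages rewards over the whole perturbed population, parameter settings whose reward collapses under small perturbations are implicitly penalized, which is the robustness-seeking behaviour the abstract emphasizes.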

    Reinforcement learning in continuous state and action spaces

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good policy from a continuous model can be infeasible, we mainly focus on methods that explicitly update a representation of a value function, a policy, or both. We discuss considerations in choosing an appropriate representation for these functions and discuss gradient-based and gradient-free ways to update the parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms. Amongst others, we cover gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms, and actor-critic methods. We discuss the advantages of different approaches and empirically compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy.
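
    As one concrete instance of the gradient-based policy-search methods the chapter surveys, a REINFORCE-style update with a linear-Gaussian policy handles continuous actions directly. The sketch below is illustrative only; the environment interface (`reset`, `step`) and all hyperparameters are assumptions, not taken from the chapter.

```python
import numpy as np

def reinforce_gaussian(env, theta, sigma=0.5, alpha=1e-3, gamma=0.99, episodes=500):
    """REINFORCE with a linear-Gaussian policy a ~ N(theta . s, sigma^2) for
    continuous actions. `env` is a hypothetical environment whose reset()
    returns a state vector and whose step(a) returns (next_state, reward, done)."""
    for _ in range(episodes):
        s, done, trajectory = env.reset(), False, []
        while not done:
            a = theta @ s + sigma * np.random.randn()   # sample a continuous action
            s_next, r, done = env.step(a)
            trajectory.append((s, a, r))
            s = s_next
        G, grad = 0.0, np.zeros_like(theta)
        for s_t, a_t, r_t in reversed(trajectory):      # discounted return-to-go
            G = r_t + gamma * G
            grad += G * (a_t - theta @ s_t) * s_t / sigma ** 2  # Gaussian score function
        theta = theta + alpha * grad                    # one policy-gradient step per episode
    return theta
```

    A gradient-free alternative from the same chapter would replace the score-function gradient with an evolution-strategy update like the one sketched for the previous abstract.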

    Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy

    Probabilistic (Bayesian) modeling has experienced a surge of applications in almost all quantitative sciences and industrial areas. This development is driven by a combination of several factors, including better probabilistic estimation algorithms, flexible software, increased computing power, and a growing awareness of the benefits of probabilistic learning. However, a principled Bayesian model-building workflow is far from complete and many challenges remain. To aid future research and applications of a principled Bayesian workflow, we ask and answer what we perceive as two fundamental questions of Bayesian modeling, namely (a) "What actually is a Bayesian model?" and (b) "What makes a good Bayesian model?". As an answer to the first question, we propose the PAD model taxonomy, which defines four basic kinds of Bayesian models, each representing some combination of the assumed joint distribution of all (known or unknown) variables (P), a posterior approximator (A), and training data (D). As an answer to the second question, we propose ten utility dimensions according to which we can evaluate Bayesian models holistically, namely (1) causal consistency, (2) parameter recoverability, (3) predictive performance, (4) fairness, (5) structural faithfulness, (6) parsimony, (7) interpretability, (8) convergence, (9) estimation speed, and (10) robustness. Further, we propose two example utility decision trees that describe hierarchies of, and trade-offs between, utilities depending on the inferential goals that drive model building and testing.
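
    The P, A, and D components can be made concrete with a toy conjugate example. The Beta-Binomial model and the numbers below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical illustration of the PAD components for a Beta-Binomial model:
#   P: the joint distribution p(theta, y) = Beta(theta | a, b) * Binomial(y | n, theta)
#   A: a posterior approximator; here the exact conjugate update, though in general
#      it could be MCMC or variational inference
#   D: the observed training data

a, b = 2.0, 2.0                     # P: prior hyperparameters
n, y = 20, 14                       # D: 14 successes in 20 trials
a_post, b_post = a + y, b + n - y   # A: exact conjugate posterior Beta(a_post, b_post)

posterior_mean = a_post / (a_post + b_post)
print(f"Posterior: Beta({a_post:.0f}, {b_post:.0f}), mean = {posterior_mean:.3f}")
```

    Swapping out any one component (a different prior or likelihood for P, an approximate sampler for A, new data for D) yields a different model in the taxonomy's sense, which is what makes evaluating models along the ten utility dimensions meaningful.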

    Fully Unintegrated Parton Correlation Functions and Factorization in Lowest Order Hard Scattering

    Motivated by the need to correct the potentially large kinematic errors in approximations used in the standard formulation of perturbative QCD, we reformulate deeply inelastic lepton-proton scattering in terms of gauge-invariant, universal parton correlation functions that depend on all components of the parton four-momentum. Currently, different hard QCD processes are described by very different perturbative formalisms, each relying on its own set of kinematical approximations. In this paper we show how to set up a formalism that avoids approximations on final-state momenta and thus has a very general domain of applicability. The use of exact kinematics introduces a number of significant conceptual shifts already at leading order and tightly constrains the formalism. We show how to define parton correlation functions that generalize the concepts of parton density, fragmentation function, and soft factor. After setting up a general subtraction formalism, we obtain a factorization theorem. To avoid complications with Ward identities, the full derivation is restricted to abelian gauge theories; even so, the resulting structure is highly suggestive of a similar treatment for non-abelian gauge theories. (Comment: 44 pages, 69 figures; typos fixed, clarifications and second appendix added.)
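
    For orientation only, and not a result of this paper: the familiar lowest-order collinear factorization of the deep-inelastic structure function has the schematic form

```latex
F_2(x, Q^2) \;\simeq\; \sum_{a} \int_x^1 \frac{d\xi}{\xi}\,
  C_a\!\left(\frac{x}{\xi}, \frac{Q^2}{\mu^2}, \alpha_s(\mu)\right) f_{a/p}(\xi, \mu),
```

    where the parton densities f_{a/p}(ξ, μ) depend only on the longitudinal momentum fraction ξ. The fully unintegrated parton correlation functions described in the abstract generalize such objects to retain all four components of the parton momentum, which is what allows the formalism to avoid approximations on final-state kinematics.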