
    Learning a Unified Control Policy for Safe Falling

    Being able to fall safely is a necessary motor skill for humanoids performing highly dynamic tasks, such as running and jumping. We propose a new method to learn a policy that minimizes the maximal impulse during the fall. The optimization jointly solves a discrete contact planning problem and a continuous optimal control problem. Once trained, the policy can compute the optimal next contacting body part (e.g. left foot, right foot, or hands), the contact location and timing, and the required joint actuation. We represent the policy as a mixture of actor-critic neural networks, consisting of n control policies and their corresponding value functions. Each actor-critic pair is associated with one of the n possible contacting body parts. During execution, the policy corresponding to the highest value function is executed, and its associated body part becomes the next contact with the ground. With this mixture-of-actor-critic architecture, the discrete contact sequence planning is solved by selecting the best critic, while the continuous control problem is solved by optimizing the actors. We show that our policy can achieve comparable, and sometimes higher, rewards than a recursive search of the action space using dynamic programming, while running 50 to 400 times faster during online execution.
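
    The selection step described above can be made concrete with a short sketch. The Python fragment below is an illustration under assumed interfaces, not the authors' implementation; ActorCriticPair, actor_fn, and critic_fn are hypothetical names. It evaluates all n critics for the current state, picks the body part whose critic predicts the highest value, and queries that pair's actor for the joint actuation.

        # Minimal sketch (not the authors' code) of selecting among n actor-critic
        # pairs at execution time: the critic with the highest predicted value
        # decides which body part makes the next contact, and its paired actor
        # outputs the joint actuation.
        import numpy as np

        class ActorCriticPair:
            """Hypothetical container: one actor and one critic per contacting body part."""
            def __init__(self, body_part, actor_fn, critic_fn):
                self.body_part = body_part   # e.g. "left_foot", "right_foot", "hands"
                self.actor = actor_fn        # maps state -> joint actuation
                self.critic = critic_fn      # maps state -> estimated value

        def select_and_act(state, pairs):
            # Discrete planning: choose the pair whose critic predicts the highest value.
            values = [p.critic(state) for p in pairs]
            best = pairs[int(np.argmax(values))]
            # Continuous control: the chosen actor produces the joint actuation.
            action = best.actor(state)
            return best.body_part, action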

    Addressing Function Approximation Error in Actor-Critic Methods

    In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI Gym tasks, outperforming the state of the art in every environment tested. Comment: Accepted at ICML 2018
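
    The core mechanism described above can be sketched in a few lines. The Python fragment below is an illustration of the clipped double-Q target, not the paper's reference implementation; the target_actor, target_q1, and target_q2 callables are assumed interfaces. The Bellman target takes the minimum of two target critics to limit overestimation, and actor updates are performed less often than critic updates (delayed policy updates).

        # Minimal sketch (not the paper's reference code) of the clipped
        # double-Q target: the Bellman backup uses the minimum of two target
        # critics, which counteracts overestimation bias.
        import numpy as np

        def clipped_double_q_target(reward, next_state, done,
                                    target_actor, target_q1, target_q2, gamma=0.99):
            next_action = target_actor(next_state)
            q_min = np.minimum(target_q1(next_state, next_action),
                               target_q2(next_state, next_action))
            return reward + gamma * (1.0 - done) * q_min

        # Delayed policy updates: the actor and target networks are updated only
        # every d-th critic update (the exact schedule here is an assumption).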

    Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

    Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete actions, with an actor and several off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we compare to Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable and unusually robust to its hyper-parameters. BDPI is significantly more sample-efficient than Bootstrapped DQN, PPO, and ACKTR on discrete, continuous and pixel-based tasks. Source code: https://github.com/vub-ai-lab/bdpi. Comment: Accepted at the European Conference on Machine Learning 2019 (ECML).
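
    The actor update described above can be illustrated with a short sketch. The Python fragment below is not the released BDPI code; the critic callables, the number of actions, and the imitation rate lr are assumptions. Each off-policy critic votes with its greedy action, and the actor's action distribution is moved slowly toward the critics' average greedy policy.

        # Minimal sketch (an illustration, not the released BDPI implementation)
        # of an actor slowly imitating the average greedy policy of several
        # off-policy critics over a discrete action space.
        import numpy as np

        def average_greedy_policy(state, critics, n_actions):
            # Each critic maps a state to Q-values over the discrete actions.
            policy = np.zeros(n_actions)
            for q in critics:
                policy[int(np.argmax(q(state)))] += 1.0 / len(critics)
            return policy  # distribution over actions

        def update_actor(actor_probs, state, critics, n_actions, lr=0.05):
            # Slowly pull the actor's distribution toward the critics' average
            # greedy policy (lr is an illustrative imitation rate).
            target = average_greedy_policy(state, critics, n_actions)
            return (1.0 - lr) * actor_probs + lr * target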

    Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions used similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms. Comment: 27 pages, 17 figures
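
    One of the heuristics listed above, frame skipping, is simple enough to show as a sketch. The wrapper below assumes a Gym-style environment with reset()/step() and is an illustration, not any team's actual code: the same action is repeated for k simulator steps and the rewards are summed, which shortens the effective horizon and speeds up simulation.

        # Minimal sketch of frame skipping around a Gym-style environment
        # (assumed interface; not taken from any challenge submission).
        class FrameSkip:
            def __init__(self, env, skip=4):
                self.env = env
                self.skip = skip

            def reset(self):
                return self.env.reset()

            def step(self, action):
                total_reward, done, info, obs = 0.0, False, {}, None
                for _ in range(self.skip):
                    obs, reward, done, info = self.env.step(action)
                    total_reward += reward
                    if done:
                        break
                return obs, total_reward, done, info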

    The evolution and policy implications of Phillips curve analysis

    The policy implications of the Phillips curve relationship between inflation and unemployment have changed dramatically in the twenty-seven years since A.W. Phillips first identified a negative correlation between money wage changes and joblessness in Great Britain. Originally, Phillips’ own findings suggested that policymakers could move the economy along his curve, trading off higher inflation for lower unemployment until the best (or least undesirable) attainable combination of both had been reached. Today, such a view is widely discredited. The statistical relation between inflation and unemployment has broken down, and the Phillips curve is now generally viewed as offering no trade-off at all. This radical change in the policy implications of the Phillips curve did not occur all at once; rather, it was the cumulative result of a series of theoretical innovations, which Thomas M. Humphrey chronicles in “The Evolution and Policy Implications of Phillips Curve Analysis.” The two most important innovations were the natural rate hypothesis, which implies that unemployment can be reduced below its normal rate only by fooling the public with surprise inflation, and the rational expectations hypothesis, which implies that the public cannot be systematically fooled. Together, these two hypotheses imply that no systematic macroeconomic policy can affect unemployment. Even though no inflation-unemployment trade-off exists for policymakers to exploit, as Humphrey points out, policymakers can still contribute to reducing the variability and average level of unemployment by avoiding erratic policy changes and by enacting measures to improve the efficiency and performance of labor and product markets.
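
    For reference, the natural rate and rational expectations hypotheses discussed above are often summarized with a textbook expectations-augmented Phillips curve. The notation below is a standard illustration, not necessarily the formulation Humphrey uses.

        % Textbook expectations-augmented Phillips curve: inflation deviates from
        % expected inflation only when unemployment deviates from its natural rate u*.
        \[
          \pi_t = \pi_t^{e} - \beta\,(u_t - u^{*}) + \varepsilon_t , \qquad \beta > 0 ,
        \]
        % or, equivalently,
        \[
          u_t = u^{*} - \tfrac{1}{\beta}\bigl(\pi_t - \pi_t^{e}\bigr) + \tfrac{1}{\beta}\,\varepsilon_t .
        \]
        % Under rational expectations, \pi_t^{e} = E_{t-1}[\pi_t], so any anticipated
        % (systematic) policy leaves \pi_t - \pi_t^{e} equal to an unforecastable error
        % and cannot systematically push u_t below u*.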

    Neo-Aristotelian Naturalism and the Evolutionary Objection: Rethinking the Relevance of Empirical Science

    Neo-Aristotelian metaethical naturalism is a modern attempt at naturalizing ethics using ideas from Aristotle’s teleological metaphysics. Proponents of this view argue that moral virtue in human beings is an instance of natural goodness, a kind of goodness supposedly also found in the realm of non-human living things. Many critics question whether neo-Aristotelian naturalism is tenable in light of modern evolutionary biology. Two influential lines of objection have appealed to an evolutionary understanding of human nature and natural teleology to argue against this view. In this paper, I offer a reconstruction of these two seemingly different lines of objection as raising instances of the same dilemma, giving neo-Aristotelians a choice between contradicting our considered moral judgment and abandoning metaethical naturalism. I argue that resolving the dilemma requires showing a particular kind of continuity between the norms of moral virtue and norms that are necessary for understanding non-human living things. I also argue that in order to show such a continuity, neo-Aristotelians need to revise the relationship they adopt with empirical science and acknowledge that the latter is relevant to assessing their central commitments regarding living things. Finally, I argue that to move this debate forward, both neo-Aristotelians and their critics should pay attention to recent work on the concept of organism in evolutionary and developmental biology.