Learning a Unified Control Policy for Safe Falling
Being able to fall safely is a necessary motor skill for humanoids performing
highly dynamic tasks, such as running and jumping. We propose a new method to
learn a policy that minimizes the maximal impulse during the fall. The
optimization solves for both a discrete contact planning problem and a
continuous optimal control problem. Once trained, the policy can compute the
optimal next contacting body part (e.g. left foot, right foot, or hands),
contact location and timing, and the required joint actuation. We represent the
policy as a mixture of actor-critic neural networks, consisting of n control
policies and their corresponding value functions. Each actor-critic pair is
associated with one of the n possible contacting body parts. During execution,
the policy corresponding to the highest value function is executed, and the
associated body part becomes the next point of contact with the ground. With this
mixture of actor-critic architecture, the discrete contact sequence planning is
solved through the selection of the best critics while the continuous control
problem is solved by the optimization of actors. We show that our policy can
achieve rewards comparable to, and sometimes higher than, a recursive search of
the action space using dynamic programming, while running 50 to 400 times
faster during online execution.
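The selection mechanism the abstract describes can be sketched as follows. This is an illustrative toy, not the authors' code: the linear actors and critics, dimensions, and random weights are all assumptions standing in for trained networks.

```python
# Sketch: n actor-critic pairs, one per candidate contacting body part.
# The argmax over critics solves the discrete contact choice; the chosen
# actor solves the continuous control problem for that contact.
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, n_parts = 6, 4, 3  # e.g. left foot, right foot, hands

# Toy linear stand-ins for the trained actor and critic networks.
actors = [rng.standard_normal((state_dim, action_dim)) for _ in range(n_parts)]
critics = [rng.standard_normal(state_dim) for _ in range(n_parts)]

def select_and_act(state):
    """Evaluate every critic on the current state, then run the actor
    whose critic predicts the highest value."""
    values = [w @ state for w in critics]
    best = int(np.argmax(values))        # next contacting body part
    action = actors[best].T @ state      # joint actuation from its actor
    return best, action

part, action = select_and_act(rng.standard_normal(state_dim))
```

The design choice worth noting is that the discrete and continuous subproblems never interact during inference: each critic only scores its own actor, so planning reduces to one argmax per control step.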
Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning,
function approximation errors are known to lead to overestimated value
estimates and suboptimal policies. We show that this problem persists in an
actor-critic setting and propose novel mechanisms to minimize its effects on
both the actor and the critic. Our algorithm builds on Double Q-learning, by
taking the minimum value between a pair of critics to limit overestimation. We
draw the connection between target networks and overestimation bias, and
suggest delaying policy updates to reduce per-update error and further improve
performance. We evaluate our method on the suite of OpenAI Gym tasks,
outperforming the state of the art in every environment tested.
Comment: Accepted at ICML 201
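The two mechanisms the abstract names can be sketched in a few lines. This is a minimal illustration of the ideas (clipped double-Q target, delayed policy updates), not the paper's full algorithm; the scalar inputs and the delay value of 2 are assumptions for the sketch.

```python
# Clipped double-Q target: bootstrap from the minimum of two critics'
# next-state estimates, which curbs overestimation bias.
import numpy as np

def td_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Bellman target using the pessimistic (min) critic estimate."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# Delayed policy updates: refresh the actor and target networks only
# every `policy_delay` critic updates, reducing per-update error.
policy_delay = 2
for step in range(6):
    # ... critic update on a sampled batch would go here ...
    if step % policy_delay == 0:
        pass  # actor and target-network update
```

Because the target always takes the smaller of the two estimates, any overestimation must occur in both critics simultaneously to propagate, which is what limits the bias.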
Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
Value-based reinforcement-learning algorithms provide state-of-the-art
results in model-free discrete-action settings, and tend to outperform
actor-critic algorithms. We argue that actor-critic algorithms are limited by
their need for an on-policy critic. We propose Bootstrapped Dual Policy
Iteration (BDPI), a novel model-free reinforcement-learning algorithm for
continuous states and discrete actions, with an actor and several off-policy
critics. Off-policy critics are compatible with experience replay, ensuring
high sample-efficiency, without the need for off-policy corrections. The actor,
by slowly imitating the average greedy policy of the critics, leads to
high-quality and state-specific exploration, which we compare to Thompson
sampling. Because the actor and critics are fully decoupled, BDPI is remarkably
stable, and unusually robust to its hyper-parameters. BDPI is significantly
more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete,
continuous and pixel-based tasks. Source code:
https://github.com/vub-ai-lab/bdpi
Comment: Accepted at the European Conference on Machine Learning 2019 (ECML
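The actor update the abstract describes, slowly imitating the average greedy policy of several off-policy critics, can be sketched as below. The Q-tables, the number of critics, and the learning rate are illustrative assumptions, not BDPI's actual hyper-parameters.

```python
# Sketch of BDPI's actor update idea: move the actor's action
# distribution toward the average greedy policy of the critics.
import numpy as np

def greedy(q_row):
    """One-hot greedy policy for one critic's Q-values in a state."""
    p = np.zeros_like(q_row)
    p[np.argmax(q_row)] = 1.0
    return p

def actor_update(actor_probs, critic_qs, lr=0.05):
    """Slowly pull the actor toward the critics' average greedy policy."""
    target = np.mean([greedy(q) for q in critic_qs], axis=0)
    return (1 - lr) * actor_probs + lr * target

actor = np.full(3, 1 / 3)               # uniform over 3 discrete actions
critic_qs = [np.array([1.0, 2.0, 0.5]), # each critic may disagree;
             np.array([0.2, 1.5, 0.1])] # here both prefer action 1
actor = actor_update(actor, critic_qs)
```

When the bootstrapped critics disagree, the averaged greedy target spreads probability over several actions, which is the Thompson-sampling-like, state-specific exploration the abstract refers to.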
Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
In the NIPS 2017 Learning to Run challenge, participants were tasked with
building a controller for a musculoskeletal model to make it run as fast as
possible through an obstacle course. Top participants were invited to describe
their algorithms. In this work, we present eight solutions that used deep
reinforcement learning approaches, based on algorithms such as Deep
Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region
Policy Optimization. Many solutions use similar relaxations and heuristics,
such as reward shaping, frame skipping, discretization of the action space,
symmetry, and policy blending. However, each of the eight teams implemented
different modifications of the known algorithms.
Comment: 27 pages, 17 figures
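One of the shared heuristics, frame skipping, is simple enough to sketch: each agent action is repeated for k environment steps with rewards summed. The wrapper interface and the toy environment below are stand-ins, not the challenge's actual OpenSim API.

```python
# Frame skipping: repeat each chosen action k times, summing rewards.
# This cuts the effective episode length and per-step inference cost.
class FrameSkip:
    def __init__(self, env, k=4):
        self.env, self.k = env, k

    def step(self, action):
        total = 0.0
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total += reward
            if done:
                break
        return obs, total, done

class CountingEnv:
    """Toy environment: reward 1 per step, episode ends after 10 steps."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10

env = FrameSkip(CountingEnv(), k=4)
obs, reward, done = env.step(0.5)  # one agent step = 4 env steps
```

The trade-off is coarser control: in a slow-dynamics simulator like the musculoskeletal model, a few repeated muscle activations lose little precision while multiplying sample throughput.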
The evolution and policy implications of Phillips curve analysis
The policy implications of the Phillips curve relationship between inflation and unemployment have changed dramatically in the twenty-seven years since A.W. Phillips first identified a negative correlation between money wage changes and joblessness in Great Britain. Originally, Phillips' own findings suggested that policymakers could move the economy along his curve, trading off higher inflation for lower unemployment until the best (or least undesirable) attainable combination of both had been reached. Today, such a view is widely discredited. The statistical relation between inflation and unemployment has broken down, and the Phillips curve is now generally viewed as offering no trade-off at all. This radical change in the policy implications of the Phillips curve did not occur all at once; rather, it was the cumulative result of a series of theoretical innovations, which Thomas M. Humphrey chronicles in "The Evolution and Policy Implications of Phillips Curve Analysis." The two most important innovations were the natural rate hypothesis, which implies that unemployment can be reduced below its normal rate only by fooling the public with surprise inflation, and the rational expectations hypothesis, which implies that the public cannot be systematically fooled. Together, these two hypotheses imply that no systematic macroeconomic policy can affect unemployment. Even though no inflation-unemployment trade-off exists for policymakers to exploit, as Humphrey points out, policymakers can still contribute to reducing the variability and average level of unemployment by avoiding erratic policy changes and by enacting measures to improve the efficiency and performance of labor and product markets.
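The natural rate hypothesis mentioned in the abstract is conventionally written as the expectations-augmented Phillips curve. This is the standard textbook formulation, given here for reference rather than quoted from Humphrey's article:

```latex
% Expectations-augmented Phillips curve (Friedman-Phelps form):
% inflation \pi_t exceeds expected inflation \pi_t^e only when
% unemployment u_t falls below its natural rate u^*.
\pi_t = \pi_t^e - \beta\,(u_t - u^*), \qquad \beta > 0
% Under rational expectations, \pi_t^e equals \pi_t on average, so
% u_t cannot be held below u^* by any systematic policy rule.
```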
Neo-Aristotelian Naturalism and the Evolutionary Objection: Rethinking the Relevance of Empirical Science
Neo-Aristotelian metaethical naturalism is a modern attempt at naturalizing ethics using ideas from Aristotle's teleological metaphysics. Proponents of this view argue that moral virtue in human beings is an instance of natural goodness, a kind of goodness supposedly also found in the realm of non-human living things. Many critics question whether neo-Aristotelian naturalism is tenable in light of modern evolutionary biology. Two influential lines of objection have appealed to an evolutionary understanding of human nature and natural teleology to argue against this view. In this paper, I offer a reconstruction of these two seemingly different lines of objection as raising instances of the same dilemma, giving neo-Aristotelians a choice between contradicting our considered moral judgment and abandoning metaethical naturalism. I argue that resolving the dilemma requires showing a particular kind of continuity between the norms of moral virtue and norms that are necessary for understanding non-human living things. I also argue that in order to show such a continuity, neo-Aristotelians need to revise the relationship they adopt with empirical science and acknowledge that the latter is relevant to assessing their central commitments regarding living things. Finally, I argue that to move this debate forward, both neo-Aristotelians and their critics should pay attention to recent work on the concept of organism in evolutionary and developmental biology