
    Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

    In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions used similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms. (27 pages, 17 figures)
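
    As an illustration of two of the heuristics named in this abstract (not any team's actual implementation), the following minimal sketch shows frame skipping and reward shaping written as generic wrappers around a gym-style environment. The velocity-based shaping bonus and the observation index VEL_X are assumptions made for the example, not the official challenge reward.

```python
# Minimal sketch of frame skipping and reward shaping as generic wrappers
# around a gym-style environment. The velocity-based shaping bonus and the
# observation index VEL_X are illustrative assumptions, not the official
# challenge reward.

class FrameSkip:
    """Repeat each chosen action for `skip` simulator steps, summing rewards."""
    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, obs, done, info = 0.0, None, False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


class ShapedReward:
    """Add a bonus proportional to forward velocity to the environment reward."""
    VEL_X = 0  # assumed index of forward velocity in the observation vector

    def __init__(self, env, coef=0.1):
        self.env = env
        self.coef = coef

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, reward + self.coef * obs[self.VEL_X], done, info
```

    A policy would then interact with, for example, ShapedReward(FrameSkip(env)), so the learning algorithm (DDPG, PPO, TRPO, and so on) sees the modified transition stream rather than the raw simulator.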

    Colour reverse learning and animal personalities: the advantage of behavioural diversity assessed with agent-based simulations

    Foraging bees use colour cues to help distinguish rewarding from unrewarding flowers, but as conditions change, bees may require behavioural flexibility to reverse their learnt preferences. Perceptually similar colours are learnt slowly by honeybees and thus potentially pose a difficult task to reverse-learn. Free-flying honeybees (N = 32) were trained to learn a fine colour discrimination task that could be resolved at ca. 70% accuracy following extended differential conditioning, and were then tested for their ability to reverse-learn this visual problem multiple times. Subsequent analyses identified three different strategies: ‘Deliberative-decisive’ bees that could, after several flower visits, decisively make a large change to learnt preferences; ‘Fickle-circumspect’ bees that changed their preferences by a small amount every time they encountered evidence in their environment; and ‘Stay’ bees that did not change from their initially learnt preference. The next aim was to determine if there was any advantage to a colony in maintaining bees with a variety of decision-making strategies. To understand the potential benefits of the observed behavioural diversity, agent-based computer simulations were conducted by systematically varying parameters for flower reward switch oscillation frequency, flower handling time, and fraction of defective ‘target’ stimuli. These simulations revealed that when there is a relatively high frequency of reward reversals, fickle-circumspect bees are more efficient at nectar collection. However, as the reward reversal frequency decreases, the deliberative-decisive bees become the most efficient. These findings suggest an evolutionary benefit for honeybee colonies whose individuals exhibit these different strategies for managing resource change. The strategies have similarities to some complex decision-making processes observed in humans and to algorithms implemented in artificial intelligence systems.
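
    As a rough illustration of the agent-based setup described above (with assumed parameter values and update rules, not the paper's simulation code), the sketch below has bee agents following each of the three strategies forage on two colours whose reward status reverses periodically, and reports nectar collected per unit handling time.

```python
# Rough sketch of the agent-based idea: two flower colours whose rewarding
# status reverses every `switch_period` visits, and bees that revise a colour
# preference according to one of the three observed strategies. All numeric
# values and update rules are illustrative assumptions.
import random

def simulate(strategy, n_visits=1000, switch_period=50, handling_time=1.0):
    """Return nectar collected per unit time for one bee following `strategy`."""
    rewarding = 0              # colour (0 or 1) that currently holds nectar
    preference = 0.7           # probability of choosing colour 0
    nectar = time_spent = 0.0
    recent_failures = 0

    for visit in range(n_visits):
        if visit > 0 and visit % switch_period == 0:
            rewarding = 1 - rewarding          # periodic reward reversal

        choice = 0 if random.random() < preference else 1
        rewarded = (choice == rewarding)
        nectar += 1.0 if rewarded else 0.0
        time_spent += handling_time
        recent_failures = 0 if rewarded else recent_failures + 1

        if strategy == "stay":
            continue                           # never revise the learnt preference

        # Evidence direction: +1 favours colour 0, -1 favours colour 1.
        direction = 1.0 if rewarded == (choice == 0) else -1.0
        if strategy == "fickle-circumspect":
            preference += 0.05 * direction     # small adjustment after every visit
        elif strategy == "deliberative-decisive" and recent_failures >= 3:
            preference = 0.9 if preference < 0.5 else 0.1  # large, decisive switch
            recent_failures = 0
        preference = min(max(preference, 0.05), 0.95)

    return nectar / time_spent

if __name__ == "__main__":
    for s in ("stay", "fickle-circumspect", "deliberative-decisive"):
        # Frequent vs. infrequent reward reversals, as varied in the simulations.
        print(s, simulate(s, switch_period=20), simulate(s, switch_period=200))
```

    Running the comparison across switch periods mirrors the reported qualitative finding: frequent reversals favour small, continual adjustments, while infrequent reversals favour decisive switches made only after accumulated evidence.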