
    Forward and bidirectional planning based on reinforcement learning and neural networks in a simulated robot.

    Building intelligent systems that are capable of learning, acting reactively, and planning actions before their execution is a major goal of artificial intelligence. This paper presents two reactive and planning systems that contain important novelties with respect to previous neural-network planners and reinforcement-learning based planners: (a) the introduction of a new component (the "matcher") allows both planners to execute genuine taskable planning (while previous reinforcement-learning based models have used planning only for speeding up learning); (b) the planners show for the first time that trained neural-network models of the world can generate long prediction chains that have an interesting robustness with regard to noise; (c) two novel algorithms are presented that generate chains of predictions in order to plan, and that control the flows of information between the systems' different neural components; (d) one of the planners uses backward "predictions" to exploit the knowledge of the pursued goal; (e) the two systems nicely integrate reactive behaviour and planning on the basis of a measure of "confidence" in action. The soundness and potential of the two reactive and planning systems are tested and compared with a simulated robot engaged in a stochastic path-finding task. The paper also presents an extensive literature review of the relevant issues.
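The core mechanism the abstract describes — planning by chaining predictions through a learned world model until the goal appears in the chain — can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's actual neural implementation: the dictionary `model` stands in for the trained next-state network, and the exhaustive search over short action sequences stands in for the planners' control of prediction flow.

```python
from itertools import product

# Hypothetical deterministic world model standing in for the paper's
# trained neural next-state predictor: (state, action) -> next state.
N_STATES, N_ACTIONS, GOAL = 6, 2, 5
model = {(s, a): min(s + a + 1, N_STATES - 1)
         for s in range(N_STATES) for a in range(N_ACTIONS)}

def rollout(state, actions):
    """Chain predictions through the model: a 'chain of predictions'."""
    chain = [state]
    for a in actions:
        state = model[(state, a)]
        chain.append(state)
    return chain

def forward_plan(start, goal, horizon=4):
    """Search short action sequences; return the first chain reaching the goal."""
    for seq in product(range(N_ACTIONS), repeat=horizon):
        chain = rollout(start, seq)
        if goal in chain:
            cut = chain.index(goal)
            return list(seq)[:cut], chain[:cut + 1]
    return None, None

plan, chain = forward_plan(0, GOAL)
# `plan` is the action sequence; `chain` is the predicted state trajectory.
```

In the paper's neural setting the "matcher" compares predicted states against the goal image, which is what makes the planning taskable; here that comparison is reduced to the `goal in chain` test.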

    Planning with neural networks and reinforcement learning

    This thesis presents the design, implementation and investigation of some predictive-planning controllers built with neural networks and inspired by Dyna-PI architectures (Sutton, 1990). Dyna-PI architectures are planning systems based on actor-critic reinforcement learning methods and a model of the environment. The controllers are tested with a simulated robot that solves a stochastic path-finding landmark navigation task. A critical review of ideas and models proposed by the literature on problem solving, planning, reinforcement learning, and neural networks precedes the presentation of the controllers. The review isolates ideas relevant to the design of planners based on neural networks. A "neural forward planner" is implemented that, unlike the Dyna-PI architectures, is taskable in a strong sense. This planner is capable of building a "partial policy" focused around efficient start-goal paths, and is capable of deciding to re-plan if "unexpected" states are encountered. Planning iteratively generates "chains of predictions" starting from the current state and using the model of the environment. This model is made up of some neural networks trained to predict the next input when an action is executed. A "neural bidirectional planner" that generates trajectories backward from the goal and forward from the current state is also implemented. This planner exploits the knowledge (image) of the goal, further focuses planning around efficient start-goal paths, and produces a quicker updating of evaluations. In several experiments the generalisation capacity of neural networks proves important for learning, but it also causes problems of interference. To deal with these problems a modular neural architecture is implemented that uses a mixture of experts network for the critic, and a simple hierarchical modular network for the actor.
The research also implements a simple form of neural abstract planning named "coarse planning", and investigates its strengths in terms of exploration and evaluations' updating. Some experiments with coarse planning and with other controllers suggest that discounted reinforcement learning may have problems dealing with long-lasting tasks.
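The Dyna-PI idea underlying these controllers — updating the critic's evaluations from transitions simulated by the learned model rather than from real experience only — can be sketched as a tabular toy. All names, sizes, and the simple chain-shaped model below are illustrative assumptions, not the thesis's neural components.

```python
import numpy as np

# Hypothetical tabular sketch of Dyna-style planning: the critic's state
# values V are updated by TD(0) from transitions *simulated* by a learned
# model. A simple deterministic chain stands in for the learned model.
N_STATES, GOAL, GAMMA, ALPHA = 6, 5, 0.95, 0.5
model = {s: s + 1 for s in range(N_STATES - 1)}  # learned next-state model
V = np.zeros(N_STATES)

def simulate_and_update(start, steps):
    """Run a chain of predictions from `start`, applying TD(0) updates."""
    s = start
    for _ in range(steps):
        if s == GOAL:
            break
        s2 = model[s]
        r = 1.0 if s2 == GOAL else 0.0        # reward on entering the goal
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
        s = s2

for _ in range(50):   # repeated simulated sweeps stand in for planning
    simulate_and_update(0, N_STATES)
```

After enough simulated sweeps the evaluations converge toward the discounted distance-to-goal, which is what lets planning speed up learning relative to acting in the real environment alone.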

    A planning modular neural-network robot for asynchronous multi-goal navigation tasks

    This paper focuses on two planning neural-network controllers, a "forward planner" and a "bidirectional planner". These have been developed within the framework of Sutton's Dyna-PI architectures (planning within reinforcement learning) and have already been presented in previous papers. The novelty of this paper is that the architecture of these planners is made modular in some of its components in order to deal with catastrophic interference. The controllers are tested through a simulated robot engaged in an asynchronous multi-goal path-planning problem that should exacerbate the interference problems. The results show that: (a) the modular planners can cope with multi-goal problems, allowing generalisation while avoiding interference; (b) when dealing with multi-goal problems the planners keep the advantages shown previously for one-goal problems vs. sheer reinforcement learning; (c) the superiority of the bidirectional planner vs. the forward planner is confirmed for the multi-goal task.
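The bidirectional scheme the abstract refers to — growing predicted trajectories forward from the current state and backward from the goal until they meet — can be sketched as a search over a toy state graph. The graph and the meeting test below are hypothetical stand-ins for the paper's learned forward and backward prediction networks.

```python
from collections import deque

# Hypothetical state graphs standing in for the learned forward model
# (state -> successors) and backward model (state -> predecessors).
forward = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [5]}
backward = {5: [4], 4: [3], 3: [1, 2], 2: [0], 1: [0]}

def bidirectional_plan(start, goal):
    """Expand forward from start and backward from goal until the chains meet."""
    seen_f, seen_b = {start: [start]}, {goal: [goal]}
    qf, qb = deque([start]), deque([goal])
    while qf and qb:
        s = qf.popleft()                      # one forward prediction step
        for n in forward.get(s, []):
            if n not in seen_f:
                seen_f[n] = seen_f[s] + [n]
                if n in seen_b:               # chains meet: splice the paths
                    return seen_f[n] + seen_b[n][::-1][1:]
                qf.append(n)
        s = qb.popleft()                      # one backward "prediction" step
        for n in backward.get(s, []):
            if n not in seen_b:
                seen_b[n] = seen_b[s] + [n]
                if n in seen_f:
                    return seen_f[n] + seen_b[n][::-1][1:]
                qb.append(n)
    return None
```

Because the two frontiers only need to cover half the start-goal distance each, evaluations near the goal are reached (and updated) sooner than with forward-only chaining, which is the intuition behind the bidirectional planner's reported advantage.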