12,257 research outputs found

    Learning to reach and reaching to learn: a unified approach to path planning and reactive control through reinforcement learning

    Get PDF
    The next generation of intelligent robots will need to be able to plan reaches. Not just ballistic point to point reaches, but reaches around things such as the edge of a table, a nearby human, or any other known object in the robot’s workspace. Planning reaches may seem easy to us humans, because we do it so intuitively, but it has proven to be a challenging problem, which continues to limit the versatility of what robots can do today. In this document, I propose a novel intrinsically motivated RL system that draws on both Path/Motion Planning and Reactive Control. Through Reinforcement Learning, it tightly integrates these two previously disparate approaches to robotics. The RL system is evaluated on a task, which is as yet unsolved by roboticists in practice. That is to put the palm of the iCub humanoid robot on arbitrary target objects in its workspace, start- ing from arbitrary initial configurations. Such motions can be generated by planning, or searching the configuration space, but this typically results in some kind of trajectory, which must then be tracked by a separate controller, and such an approach offers a brit- tle runtime solution because it is inflexible. Purely reactive systems are robust to many problems that render a planned trajectory infeasible, but lacking the capacity to search, they tend to get stuck behind constraints, and therefore do not replace motion planners. The planner/controller proposed here is novel in that it deliberately plans reaches without the need to track trajectories. Instead, reaches are composed of sequences of reactive motion primitives, implemented by my Modular Behavioral Environment (MoBeE), which provides (fictitious) force control with reactive collision avoidance by way of a realtime kinematic/geometric model of the robot and its workspace. Thus, to the best of my knowledge, mine is the first reach planning approach to simultaneously offer the best of both the Path/Motion Planning and Reactive Control approaches. By controlling the real, physical robot directly, and feeling the influence of the con- straints imposed by MoBeE, the proposed system learns a stochastic model of the iCub’s configuration space. Then, the model is exploited as a multiple query path planner to find sensible pre-reach poses, from which to initiate reaching actions. Experiments show that the system can autonomously find practical reaches to target objects in workspace and offers excellent robustness to changes in the workspace configuration as well as noise in the robot’s sensory-motor apparatus

    CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control

    Full text link
    Intrinsic motivation is a promising exploration technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards. There exist two technical challenges in implementing intrinsic motivation: 1) how to design a proper intrinsic objective to facilitate efficient exploration; and 2) how to combine the intrinsic objective with the extrinsic objective to help find better solutions. In the current literature, the intrinsic objectives are all designed in a task-agnostic manner and combined with the extrinsic objective via simple addition (or used by itself for reward-free pre-training). In this work, we show that these designs would fail in typical sparse-reward continuous control tasks. To address the problem, we propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective, and at the same time, exploit the Lagrangian method to adaptively balance the intrinsic and extrinsic objectives via a simultaneous-maximization framework. We empirically show, on multiple sparse-reward continuous control tasks, that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods. Moreover, the key techniques of our CIM can also be plugged into existing methods to boost their performances

    A Framework for Reinforcement Learning and Planning

    Full text link
    Sequential decision making, commonly formalized as Markov Decision Process optimization, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are planning and reinforcement learning. Both research fields largely have their own research communities. However, if both research fields solve the same problem, then we should be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying framework for reinforcement learning and planning (FRAP), which identifies the underlying dimensions on which any planning or learning algorithm has to decide. At the end of the paper, we compare - in a single table - a variety of well-known planning, model-free and model-based RL algorithms along the dimensions of our framework, illustrating the validity of the framework. Altogether, FRAP provides deeper insight into the algorithmic space of planning and reinforcement learning, and also suggests new approaches to integration of both fields

    Softmax exploration strategies for multiobjective reinforcement learning

    Get PDF
    Despite growing interest over recent years in applying reinforcement learning to multiobjective problems, there has been little research into the applicability and effectiveness of exploration strategies within the multiobjective context. This work considers several widely-used approaches to exploration from the single-objective reinforcement learning literature, and examines their incorporation into multiobjective Q-learning. In particular this paper proposes two novel approaches which extend the softmax operator to work with vector-valued rewards. The performance of these exploration strategies is evaluated across a set of benchmark environments. Issues arising from the multiobjective formulation of these benchmarks which impact on the performance of the exploration strategies are identified. It is shown that of the techniques considered, the combination of the novel softmax–epsilon exploration with optimistic initialisation provides the most effective trade-off between exploration and exploitation
    • …
    corecore