2,053 research outputs found

    "Guess what I'm doing": Extending legibility to sequential decision tasks

    In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoL-MDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several simulated scenarios of different complexity. We also showcase the use of our legible policies as demonstrations for an inverse reinforcement learning agent, establishing their superiority against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.

    Experimental evidence of shock mitigation in a Hertzian tapered chain

    We present an experimental study of mechanical impulse propagation through a horizontal alignment of elastic spheres of progressively decreasing diameter $\phi_n$, namely a tapered chain. Experimentally, the diameters of the spheres, which interact via the Hertz potential, are selected to stay as close as possible to an exponential decrease, $\phi_{n+1}=(1-q)\phi_n$, where the experimental tapering factor is either $q_1\simeq5.60\%$ or $q_2\simeq8.27\%$. In agreement with recent numerical results, an impulse initiated in a monodisperse chain (a chain of identical beads) propagates without shape changes, and progressively transfers its energy and momentum to a propagating tail as it travels further into a tapered chain. As a result, the front pulse of this wave decreases in amplitude and accelerates. Both effects are satisfactorily described by the hard-spheres approximation; essentially, the shock mitigation is due to partial transmissions, from one bead to the next, of the momentum and energy of the front pulse. In addition, when small dissipation is included, better agreement with experiments is found. A close analysis of the loading part of the experimental pulses demonstrates that the front wave itself adopts a self-similar solution as it propagates in the tapered chain. Finally, our results corroborate the capability of these chains to thermalize propagating impulses and thereby act as shock-absorbing devices. Comment: ReVTeX, 7 pages with 6 eps, accepted for Phys. Rev. E (Related papers on http://www.supmeca.fr/perso/jobs/
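
    The tapering rule and the hard-spheres picture in the abstract above can be illustrated with a short sketch. It is not the authors' model: it assumes beads of equal density (so mass scales as the cube of the diameter) and treats propagation as a sequence of independent, perfectly elastic binary collisions, each with the next bead initially at rest. The function names and the starting diameter are hypothetical.

```python
def tapered_diameters(phi_1, q, n_beads):
    """Diameters obeying phi_{n+1} = (1 - q) * phi_n, starting from phi_1."""
    return [phi_1 * (1.0 - q) ** n for n in range(n_beads)]


def hard_sphere_step(q):
    """One elastic collision between bead n (moving) and bead n+1 (at rest).

    Assumption: equal density, so m_{n+1} / m_n = (1 - q)**3.
    Returns (velocity_gain, energy_fraction), where
      velocity_gain   = v_{n+1} / v_n = 2 m_n / (m_n + m_{n+1})
      energy_fraction = E_{n+1} / E_n = 4 m_n m_{n+1} / (m_n + m_{n+1})**2
    """
    mass_ratio = (1.0 - q) ** 3
    velocity_gain = 2.0 / (1.0 + mass_ratio)
    energy_fraction = 4.0 * mass_ratio / (1.0 + mass_ratio) ** 2
    return velocity_gain, energy_fraction


def front_pulse_energy(q, n_collisions):
    """Fraction of the initial energy left in the front bead after n collisions."""
    _, energy_fraction = hard_sphere_step(q)
    return energy_fraction ** n_collisions


if __name__ == "__main__":
    for q in (0.0560, 0.0827):  # tapering factors quoted in the abstract
        gain, frac = hard_sphere_step(q)
        print(f"q={q:.4f}  velocity gain per bead={gain:.4f}  "
              f"energy kept per bead={frac:.4f}  "
              f"after 10 beads={front_pulse_energy(q, 10):.4f}")
```

    Under these assumptions, each collision raises the front bead's velocity (gain above 1) while passing on only part of the energy (fraction below 1), which matches the qualitative statement that the front pulse accelerates and decreases in amplitude. With q = 0 both ratios equal 1, recovering lossless propagation in a monodisperse chain.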

    Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback

    We study the problem of teaching via demonstrations in sequential decision-making tasks. In particular, we focus on the situation where the teacher has no access to the learner's model and policy, and the feedback from the learner is limited to trajectories that start from states selected by the teacher. The need to select the starting states and infer the learner's policy creates an opportunity for the teacher to use methods from inverse reinforcement learning and active learning. In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this teaching problem. The algorithm uses a modified version of the active value-at-risk method to select the starting states, a modified maximum causal entropy algorithm to infer the policy, and the difficulty score ratio method to choose the teaching demonstrations. We test the algorithm in a synthetic car driving environment and conclude that the proposed algorithm is an effective solution when the learner's feedback is limited. Comment: 7 pages, 3 figures

    An analysis of reinforcement learning with function approximation

    Abstract: In this paper, we propose a reinforcement learning approach to address multi-robot cooperative navigation tasks in infinite settings. We propose an algorithm that simultaneously addresses the problems of learning and coordination in multi-robot problems. The proposed algorithm extends those existing in the literature, making it possible to address simultaneous learning and coordination in problems with an infinite state-space. We also present the results obtained in several test scenarios featuring multi-robot navigation situations with partial observability.

    Hexagons, Kinks and Disorder in Oscillated Granular Layers

    Experiments on vertically oscillated granular layers in an evacuated container reveal a sequence of well-defined pattern bifurcations as the container acceleration is increased. Period doublings of the layer center-of-mass motion and a parametric wave instability interact to produce hexagons and more complicated patterns composed of distinct spatial domains of different relative phase separated by kinks (phase discontinuities). Above a critical acceleration, the layer becomes disordered in both space and time. Comment: 4 pages. The RevTeX file has a macro allowing various styles. The appropriate style is "myprint", which is the default