2,053 research outputs found

"Guess what I'm doing": Extending legibility to sequential decision tasks

Author: Faria Miguel
Melo Francisco S.
Paiva Ana
Publication date: 19/09/2022

In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoL-MDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several simulated scenarios of different complexity. We also showcase the use of our legible policies as demonstrations for an inverse reinforcement learning agent, establishing their superiority against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.

arXiv.org e-Print Archive

Experimental evidence of shock mitigation in a Hertzian tapered chain

Author: Francisco Melo
Francisco Santibanez
Franco Tapia
L. D. Landau
M. Nakagawa
S. L. Gavrilyuk
S. Sen
Stéphane Job
V. F. Nesterenko
Publication venue: 'American Physical Society (APS)'
Publication date: 12/02/2006

We present an experimental study of the mechanical impulse propagation through a horizontal alignment of elastic spheres of progressively decreasing diameter $\phi_n$, namely a tapered chain. Experimentally, the diameters of spheres which interact via the Hertz potential are selected to keep as close as possible to an exponential decrease, $\phi_{n+1}=(1-q)\phi_n$, where the experimental tapering factor is either $q_1\simeq5.60\%$ or $q_2\simeq8.27\%$. In agreement with recent numerical results, an impulse initiated in a monodisperse chain (a chain of identical beads) propagates without shape changes, and progressively transfers its energy and momentum to a propagating tail when it travels further in a tapered chain. As a result, the front pulse of this wave decreases in amplitude and accelerates. Both effects are satisfactorily described by the hard spheres approximation, and basically, the shock mitigation is due to partial transmissions, from one bead to the next, of momentum and energy of the front pulse. In addition, when small dissipation is included, a better agreement with experiments is found. A close analysis of the loading part of the experimental pulses demonstrates that the front wave adopts a self-similar solution as it propagates in the tapered chain. Finally, our results corroborate the capability of these chains to thermalize propagating impulses and thereby act as shock absorbing devices. Comment: ReVTeX, 7 pages with 6 eps, accepted for Phys. Rev. E (Related papers on http://www.supmeca.fr/perso/jobs/

arXiv.org e-Print Archive

Crossref

Interactively Teaching an Inverse Reinforcement Learner with Limited Feedback

Author: Lopes Manuel
Melo Francisco S.
Zayanov Rustam
Publication date: 16/09/2023

We study the problem of teaching via demonstrations in sequential decision-making tasks. In particular, we focus on the situation when the teacher has no access to the learner's model and policy, and the feedback from the learner is limited to trajectories that start from states selected by the teacher. The necessity to select the starting states and infer the learner's policy creates an opportunity for using the methods of inverse reinforcement learning and active learning by the teacher. In this work, we formalize the teaching process with limited feedback and propose an algorithm that solves this teaching problem. The algorithm uses a modified version of the active value-at-risk method to select the starting states, a modified maximum causal entropy algorithm to infer the policy, and the difficulty score ratio method to choose the teaching demonstrations. We test the algorithm in a synthetic car driving environment and conclude that the proposed algorithm is an effective solution when the learner's feedback is limited. Comment: 7 pages, 3 figure

arXiv.org e-Print Archive

An analysis of reinforcement learning with function approximation

Author: Francisco S. Melo
M. Isabel Ribeiro

In this paper, we propose a reinforcement learning approach to address multi-robot cooperative navigation tasks in infinite settings. We propose an algorithm to simultaneously address the problems of learning and coordination in multi-robot problems. The proposed algorithm extends those existing in the literature, making it possible to address simultaneous learning and coordination in problems with an infinite state-space. We also present the results obtained in several test scenarios featuring multi-robot navigation situations with partial observability.

CiteSeerX

Hexagons, Kinks and Disorder in Oscillated Granular Layers

Author: A. Mehta
B. Thomas
C. Laroche
F. Melo
Francisco Melo
H.K. Pak
H.W. Müller
Harry L. Swinney
J.B. Knight
P. Evesque
Paul B. Umbanhowar
S. Douady
S. Fauve
W.S. Edwards
Publication venue: 'American Physical Society (APS)'
Publication date: 17/07/1995

Experiments on vertically oscillated granular layers in an evacuated container reveal a sequence of well-defined pattern bifurcations as the container acceleration is increased. Period doublings of the layer center of mass motion and a parametric wave instability interact to produce hexagons and more complicated patterns composed of distinct spatial domains of different relative phase separated by kinks (phase discontinuities). Above a critical acceleration, the layer becomes disordered in both space and time. Comment: 4 pages. The RevTeX file has a macro allowing various styles. The appropriate style is "myprint" which is the defaul

arXiv.org e-Print Archive

Crossref