99 research outputs found
The Assistive Multi-Armed Bandit
Learning preferences implicit in the choices humans make is a well-studied
problem in both economics and computer science. However, most work makes the
assumption that humans are acting (noisily) optimally with respect to their
preferences. Such approaches can fail when people are themselves learning about
what they want. In this work, we introduce the assistive multi-armed bandit,
where a robot assists a human playing a bandit task to maximize cumulative
reward. In this problem, the human does not know the reward function but can
learn it through the rewards received from arm pulls; the robot only observes
which arms the human pulls but not the reward associated with each pull. We
offer sufficient and necessary conditions for successfully assisting the human
in this framework. Surprisingly, better human performance in isolation does not
necessarily lead to better performance when assisted by the robot: a human
policy can do better by effectively communicating its observed rewards to the
robot. We conduct proof-of-concept experiments that support these results. We
see this work as contributing towards a theory behind algorithms for
human-robot interaction. Comment: Accepted to HRI 201
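The setup described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it assumes Bernoulli arms, an epsilon-greedy human, and a robot that simply pulls whichever arm the human has requested most often. The interaction protocol (the human names an arm, the robot decides which arm is actually pulled), the function name `simulate`, and all parameters are assumptions made for illustration. The one property taken from the abstract is the information structure: the human sees rewards, while the robot sees only the human's choices.

```python
import random


def simulate(arm_means, horizon, epsilon=0.1, seed=0):
    """Toy assistive bandit: returns (total reward, per-arm request counts)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms       # human's view: how often each arm was pulled
    totals = [0.0] * n_arms    # human's view: summed reward per arm
    requests = [0] * n_arms    # robot's view: how often the human asked for each arm
    total_reward = 0.0

    def human_estimate(arm):
        # Empirical mean reward; unexplored arms default to 0 (an assumption).
        return totals[arm] / pulls[arm] if pulls[arm] else 0.0

    for _ in range(horizon):
        # Human (assumed epsilon-greedy): explore at random, else exploit.
        if rng.random() < epsilon or sum(pulls) == 0:
            requested = rng.randrange(n_arms)
        else:
            requested = max(range(n_arms), key=human_estimate)
        requests[requested] += 1

        # Robot (assumed policy): it never sees rewards, only requests,
        # and pulls the arm requested most often so far.
        pulled = max(range(n_arms), key=lambda a: requests[a])

        # Environment: Bernoulli reward, observed by the human only.
        reward = 1.0 if rng.random() < arm_means[pulled] else 0.0
        pulls[pulled] += 1
        totals[pulled] += reward
        total_reward += reward

    return total_reward, requests


if __name__ == "__main__":
    reward, requests = simulate([0.2, 0.5, 0.8], horizon=1000)
    print(reward, requests)
```

Swapping in a different human policy while keeping the robot fixed is one way to explore the abstract's claim that a human who performs better alone is not necessarily easier to assist.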
Adaptive modality selection algorithm in robot-assisted cognitive training
© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Interaction of socially assistive robots with users is based on social cues coming from different interaction modalities, such as speech or gestures. However, using all modalities at all times may be inefficient, as it can overload the user with redundant information and increase the task completion time. Additionally, users may favor certain modalities over others as a result of their disability or personal preference. In this paper, we propose an Adaptive Modality Selection (AMS) algorithm that chooses modalities depending on the state of the user and the environment, as well as user preferences. The variables that describe the environment and the user state are defined as resources, and we posit that modalities are successful if certain resources possess specific values during their use. Besides the resources, the proposed algorithm takes into account user preferences, which it learns while interacting with users. We tested our algorithm in simulations, and we implemented it on a robotic system that provides cognitive training, specifically sequential memory exercises. Experimental results show that it is possible to use only a subset of the available modalities without compromising the interaction. Moreover, we see a trend for users to perform better when interacting with a system that implements the AMS algorithm. Peer Reviewed. Postprint (author's final draft)
Learning robot policies using a high-level abstraction persona-behaviour simulator
© 2019 IEEE.
Collecting data in Human-Robot Interaction to train learning agents can be difficult. This is especially true when the target users are older adults with dementia, since data collection usually requires hours of interaction and places a heavy workload on the user. This paper addresses the problem of importing the Personas technique from HRI to create fictional patients’ profiles. We propose a Persona-Behaviour Simulator tool that provides, with high-level abstraction, the user’s actions during an HRI task, and we apply it to cognitive training exercises for older adults with dementia. It consists of a Persona Definition that characterizes a patient along four dimensions and a Task Engine that provides information regarding the task complexity. We build a simulated environment where the high-level user’s actions are provided by the simulator and the robot's initial policy is learned using a Q-learning algorithm. The results show that the current simulator provides a reasonable initial policy for a defined Persona profile. Moreover, the learned robot assistance has proved to be robust to potential changes in the user’s behaviour. In this way, we can speed up the fine-tuning of the rough policy during real interactions to tailor the assistance to the given user. We believe the presented approach can be easily extended to other types of HRI tasks, for example when input data is required to train a learning algorithm but data collection is very expensive or infeasible. We advocate that simulation is a convenient tool in these cases. Peer Reviewed. Postprint (author's final draft)
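The abstract above says that the robot's initial policy is learned with Q-learning against a simulated user. The following is a minimal sketch of tabular Q-learning under stated assumptions, not the authors' simulator: the assistance levels in `ACTIONS`, the success probabilities, the costs, and the state definition (consecutive user errors, capped at 2) are all hypothetical stand-ins for the Persona-Behaviour Simulator, which the abstract does not specify.

```python
import random
from collections import defaultdict

ACTIONS = ["no_help", "hint", "demonstrate"]  # hypothetical assistance levels


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update for one observed transition."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def simulated_user_step(state, action, rng):
    """Hypothetical user model: more help raises success odds but costs more."""
    success_prob = {"no_help": 0.3, "hint": 0.6, "demonstrate": 0.9}[action]
    cost = {"no_help": 0.0, "hint": 0.2, "demonstrate": 0.5}[action]
    if rng.random() < success_prob:
        return 0, 1.0 - cost             # success resets the error count
    return min(state + 1, 2), -cost      # failure: one more consecutive error


def train(steps=5000, epsilon=0.1, seed=0):
    """Learn a rough initial policy purely from simulated interactions."""
    rng = random.Random(seed)
    q = defaultdict(float)               # unseen (state, action) pairs start at 0
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection over the current Q-table.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = simulated_user_step(state, action, rng)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q


if __name__ == "__main__":
    q = train()
    for s in range(3):
        print(s, max(ACTIONS, key=lambda a: q[(s, a)]))
```

A Q-table learned this way could serve as the starting point that is then fine-tuned during real interactions, which is the use the abstract describes for the simulator-trained policy.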