Preference-Based Learning for Exoskeleton Gait Optimization
This paper presents a personalized gait optimization framework for lower-body exoskeletons. Rather than optimizing numerical objectives such as the mechanical cost of transport, our approach directly learns from user preferences, e.g., for comfort. Building upon work in preference-based interactive learning, we present the CoSpar algorithm. CoSpar prompts the user to give pairwise preferences between trials and to suggest improvements; as exoskeleton walking is a non-intuitive behavior, users can provide preferences more easily and reliably than numerical feedback. We show that CoSpar performs competitively in simulation and demonstrate a prototype implementation of CoSpar on a lower-body exoskeleton to optimize human walking trajectory features. In the experiments, CoSpar consistently found user-preferred parameters of the exoskeleton’s walking gait, which suggests that it is a promising starting point for adapting and personalizing exoskeletons (or other assistive devices) to individual users.
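The pairwise-preference loop described in the abstract can be illustrated with a much simpler stand-in than the paper's actual model: CoSpar itself builds on Gaussian-process preference learning, whereas the sketch below just runs Bradley–Terry-style score updates from simulated noiseless comparisons. The parameter grid, the hidden "comfort" function, the learning rate, and the iteration count are all invented for the illustration.

```python
import math
import random

random.seed(1)

params = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical gait-parameter settings

def true_comfort(p):
    # hidden user utility, unknown to the learner (assumption for the demo)
    return -(p - 0.6) ** 2

def prefer(a, b):
    # simulated pairwise preference: True means "trial a preferred over b"
    return true_comfort(a) > true_comfort(b)

scores = {p: 0.0 for p in params}  # learned utility estimates
lr = 0.5

for _ in range(500):
    a, b = random.sample(params, 2)
    if prefer(b, a):
        a, b = b, a  # ensure a is the preferred trial
    # Bradley-Terry gradient step on P(a preferred over b) = sigmoid(s_a - s_b)
    p = 1.0 / (1.0 + math.exp(scores[a] - scores[b]))
    scores[a] += lr * p
    scores[b] -= lr * p

best = max(params, key=scores.get)
```

With consistent preferences, the score ordering converges to the hidden comfort ordering, so `best` ends up at the grid point closest to the hidden optimum. The key point the abstract makes survives even in this toy form: the learner never observes a numerical comfort value, only which of two trials the user preferred.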
Coactive Learning for Locally Optimal Problem Solving
Coactive learning is an online problem-solving setting in which the solutions provided by a solver are interactively improved by a domain expert, which in turn drives learning. In this paper we extend the study of coactive learning to problems where obtaining a globally optimal or near-optimal solution may be intractable, or where an expert can only be expected to make small, local improvements to a candidate solution. The goal of learning in this new setting is to minimize the cost as measured by the expert effort over time. We first establish theoretical bounds on the average cost of the existing coactive Perceptron algorithm. In addition, we consider new online algorithms that use cost-sensitive and Passive-Aggressive (PA) updates, showing similar or improved theoretical bounds. We provide an empirical evaluation of the learners in various domains, which shows that the Perceptron-based algorithms are quite effective and that, unlike the case for online classification, the PA algorithms do not yield significant performance gains.
Comment: AAAI 2014 paper, including appendices
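The coactive Perceptron update mentioned above has a compact form: the learner proposes its current best solution y, the expert returns a locally improved solution ȳ, and the weights move by the feature difference φ(ȳ) − φ(y). A minimal sketch, with invented two-dimensional candidate features and a hidden expert utility standing in for a real solver and domain expert:

```python
import random

random.seed(0)

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

# hypothetical feature vectors of three candidate solutions
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
true_w = (0.2, 0.8)  # hidden expert utility (assumption for the demo)

w = [0.0, 0.0]
for t in range(20):
    # learner proposes its current best candidate
    y = max(candidates, key=lambda f: dot(w, f))
    # expert returns any locally improved candidate: one with strictly
    # higher (hidden) utility, if such a candidate exists
    better = [f for f in candidates if dot(true_w, f) > dot(true_w, y)]
    if not better:
        break  # proposal is already expert-optimal; no more expert effort
    y_bar = random.choice(better)
    # coactive Perceptron update: move w toward the improvement direction
    w = [wi + (fb - fy) for wi, fb, fy in zip(w, y_bar, y)]

best = max(candidates, key=lambda f: dot(w, f))
```

Note how this matches the paper's cost framing: the quantity being minimized is not classification error but the number of rounds on which the expert has to supply an improvement before the loop terminates.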
Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method
For robots to be effectively deployed in novel environments and tasks, they must be able to understand the feedback expressed by humans during intervention, which can either correct undesirable behavior or indicate additional preferences. Existing methods either require repeated episodes of interaction or assume prior known reward features, which makes them data-inefficient and hard to transfer to new tasks. We relax these assumptions by describing human tasks in terms of object-centric sub-tasks and interpreting physical interventions in relation to specific objects. Our method, Object Preference Adaptation (OPA), is composed of two key stages: 1) pre-training a base policy to produce a wide variety of behaviors, and 2) online updating according to human feedback. The key to our fast yet simple adaptation is that general interaction dynamics between agents and objects are fixed, and only object-specific preferences are updated. Our adaptation occurs online, requires only one human intervention (one-shot), and produces new behaviors never seen during training. Trained on cheap synthetic data instead of expensive human demonstrations, our policy correctly adapts to human perturbations on realistic tasks on a physical 7-DOF robot. Videos, code, and supplementary material are provided.
Comment: Accepted to ICRA 202
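The separation described above, fixed agent–object interaction dynamics versus learnable object-specific preferences, can be caricatured in a few lines. OPA's real policy is a learned network trained on synthetic data; the objects, feature vectors, and update rule below are purely illustrative stand-ins for that structure.

```python
# fixed per-object "interaction dynamics": direction features toward each
# object (frozen after pre-training in this sketch)
objects = {"cup": (1.0, 0.0), "plate": (0.0, 1.0)}  # hypothetical features
prefs = {"cup": 0.5, "plate": 0.5}  # learnable object-specific preferences

def action():
    # policy output: preference-weighted sum of per-object attraction features
    ax = sum(prefs[o] * f[0] for o, f in objects.items())
    ay = sum(prefs[o] * f[1] for o, f in objects.items())
    return (ax, ay)

# one physical human intervention: a correction vector pushing toward the plate
correction = (0.0, 1.0)

# one-shot update: only the object preferences change; the interaction
# features stay fixed, so a single correction suffices to retarget behavior
lr = 1.0
for o, f in objects.items():
    prefs[o] += lr * (correction[0] * f[0] + correction[1] * f[1])

a = action()
```

The toy makes the abstract's efficiency argument concrete: because only the low-dimensional preference terms are touched, a single intervention is enough to shift the policy, without retraining the fixed interaction model.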