CasIL: Cognizing and Imitating Skills via a Dual Cognition-Action Architecture
Enabling robots to effectively imitate expert skills in long-horizon tasks
such as locomotion, manipulation, and more, poses a long-standing challenge.
Existing imitation learning (IL) approaches for robots still grapple with
sub-optimal performance in complex tasks. In this paper, we consider how this
challenge can be addressed using human cognitive priors. Heuristically, we
extend the usual notion of action to a dual Cognition (high-level)-Action
(low-level) architecture by introducing intuitive human cognitive priors, and
propose a novel skill IL framework through human-robot interaction, called
Cognition-Action-based Skill Imitation Learning (CasIL), for the robotic agent
to effectively cognize and imitate the critical skills from raw visual
demonstrations. CasIL enables both cognition and action imitation, while
high-level skill cognition explicitly guides low-level primitive actions,
providing robustness and reliability to the entire skill IL process. We
evaluated our method on MuJoCo and RLBench benchmarks, as well as on the
obstacle avoidance and point-goal navigation tasks for quadrupedal robot
locomotion. Experimental results show that CasIL consistently achieves
competitive and robust skill imitation compared to its counterparts across a
variety of long-horizon robotic tasks.
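The dual-architecture idea described above, in which a high-level cognition module selects a skill that explicitly conditions a low-level action policy, can be illustrated with a minimal sketch. All class names, method names, and the toy skill-selection rule below are hypothetical, not the paper's actual API or implementation:

```python
class CognitionModule:
    """High-level module: maps an observation to a discrete skill label.
    (Illustrative toy heuristic, not the paper's learned model.)"""
    def __init__(self, skills):
        self.skills = skills

    def cognize(self, observation):
        # Toy rule: select a skill from a scalar observation feature.
        return self.skills[int(observation) % len(self.skills)]


class ActionModule:
    """Low-level module: maps (observation, skill) to a primitive action."""
    def act(self, observation, skill):
        return f"{skill}:step({observation})"


class DualPolicy:
    """The chosen skill explicitly guides the primitive-action policy."""
    def __init__(self, skills):
        self.cognition = CognitionModule(skills)
        self.action = ActionModule()

    def __call__(self, observation):
        skill = self.cognition.cognize(observation)
        return skill, self.action.act(observation, skill)


policy = DualPolicy(["reach", "grasp", "place"])
skill, action = policy(4)  # 4 % 3 == 1, so the "grasp" skill is selected
```

The point of the structure is that the low-level module never acts without a skill label from the high-level module, mirroring how the abstract describes cognition guiding primitive actions.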
Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks
A major challenge for the realization of intelligent robots is to supply them
with cognitive abilities in order to allow ordinary users to program them
easily and intuitively. One way of such programming is teaching work tasks by
interactive demonstration. To make this effective and convenient for the user,
the machine must be capable to establish a common focus of attention and be
able to use and integrate spoken instructions, visual perceptions, and
non-verbal clues like gestural commands. We report progress in building a
hybrid architecture that combines statistical methods, neural networks, and
finite state machines into an integrated system for instructing grasping tasks
by man-machine interaction. The system combines the GRAVIS-robot for visual
attention and gestural instruction with an intelligent interface for speech
recognition and linguistic interpretation, and a modality fusion module to
allow multi-modal task-oriented man-machine communication with respect to
dextrous robot manipulation of objects. Comment: 7 pages, 8 figures
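The finite-state-machine component of a hybrid architecture like the one described can be sketched as a small state machine that fuses a spoken grasp instruction with a pointing gesture before issuing a command. The states, event names, and payload below are purely illustrative assumptions, not the GRAVIS system's actual interface:

```python
class GraspInstructionFSM:
    """Toy FSM fusing speech and gesture events into a grasp command.
    States: idle -> await_target -> ready. (Hypothetical sketch.)"""
    def __init__(self):
        self.state = "idle"
        self.target = None

    def on_event(self, event, payload=None):
        if self.state == "idle" and event == "speech_grasp":
            # A spoken "grasp the ..." instruction was recognized.
            self.state = "await_target"
        elif self.state == "await_target" and event == "gesture_point":
            # A pointing gesture resolves which object is meant.
            self.target = payload
            self.state = "ready"
        return self.state


fsm = GraspInstructionFSM()
fsm.on_event("speech_grasp")
fsm.on_event("gesture_point", payload="red cube")
```

A design like this keeps each modality's recognizer independent: the FSM only consumes discrete events, so the speech and vision subsystems can be developed and swapped separately.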