Time-Contrastive Networks: Self-Supervised Learning from Video
We propose a self-supervised approach for learning representations and
robotic behaviors entirely from unlabeled videos recorded from multiple
viewpoints, and study how this representation can be used in two robotic
imitation settings: imitating object interactions from videos of humans, and
imitating human poses. Imitation of human behavior requires a
viewpoint-invariant representation that captures the relationships between
end-effectors (hands or robot grippers) and the environment, object attributes,
and body pose. We train our representations using a metric learning loss, where
multiple simultaneous viewpoints of the same observation are attracted in the
embedding space, while being repelled from temporal neighbors which are often
visually similar but functionally different. In other words, the model
simultaneously learns to recognize what is common between different-looking
images, and what is different between similar-looking images. This signal
causes our model to discover attributes that do not change across viewpoint,
but do change across time, while ignoring nuisance variables such as
occlusions, motion blur, lighting and background. We demonstrate that this
representation can be used by a robot to directly mimic human poses without an
explicit correspondence, and that it can be used as a reward function within a
reinforcement learning algorithm. While representations are learned from an
unlabeled collection of task-related videos, robot behaviors such as pouring
are learned by watching a single 3rd-person demonstration by a human. Reward
functions obtained by following the human demonstrations under the learned
representation enable efficient reinforcement learning that is practical for
real-world robotic systems. Video results, open-source code and dataset are
available at https://sermanet.github.io/imitat
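The metric learning objective described above can be sketched as a standard triplet loss: the anchor frame and its simultaneous other-viewpoint frame are pulled together, while a temporally nearby frame (visually similar but functionally different) is pushed apart. This is a minimal numeric sketch, not the paper's implementation; the function name and margin value are illustrative.

```python
import numpy as np

def time_contrastive_triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss over L2-normalized embeddings.

    anchor:   embedding of a frame from one viewpoint
    positive: embedding of the simultaneous frame from another viewpoint
    negative: embedding of a temporal neighbor from the same viewpoint
    """
    def normalize(v):
        return v / np.linalg.norm(v)

    a, p, n = map(normalize, (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2)  # squared distance to the positive
    d_neg = np.sum((a - n) ** 2)  # squared distance to the negative
    # Loss is zero once the positive is closer than the negative by the margin.
    return max(0.0, d_pos - d_neg + margin)
```

With a well-separated triplet (positive at the anchor, negative orthogonal) the loss vanishes; with the roles swapped it is large, which is the gradient signal that drives viewpoint-invariant, time-sensitive features.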
Using humanoid robots to study human behavior
Our understanding of human behavior advances as our humanoid robotics work progresses, and vice versa. This team's work focuses on trajectory formation and planning, learning from demonstration, oculomotor control, and interactive behaviors. They are programming robotic behavior based on how we humans "program" behavior in, or train, each other.
Investigation of the Sense of Agency in Social Cognition, based on frameworks of Predictive Coding and Active Inference: A simulation study on multimodal imitative interaction
When agents interact socially with different intentions, conflicts are
difficult to avoid. Although how agents can resolve such problems autonomously
has not been determined, dynamic characteristics of agency may shed light on
underlying mechanisms. The current study focused on the sense of agency (SoA),
a specific aspect of agency referring to congruence between the agent's
intention in acting and the outcome. Employing predictive coding and active
inference as theoretical frameworks of perception and action generation, we
hypothesize that regulation of complexity in the evidence lower bound of an
agent's model should affect the strength of the agent's SoA and should have a
critical impact on social interactions. We built a computational model of
imitative interaction between a robot and a human via visuo-proprioceptive
sensation with a variational Bayes recurrent neural network, and simulated the
model in the form of pseudo-imitative interaction using recorded human body
movement data. A key feature of the model is that each modality's complexity
can be regulated differently with a hyperparameter assigned to each module. We
first searched for an optimal setting that endows the model with appropriate
coordination of multimodal sensation. This revealed that the vision module's
complexity should be more tightly regulated than that of the proprioception
module. Using the optimally trained model, we examined how changing the
tightness of complexity regulation after training affects the strength of the
SoA during interactions. The results showed that with looser regulation, an
agent tends to act more egocentrically, without adapting to the other. In
contrast, with tighter regulation, the agent tends to follow the other by
adjusting its intention. We conclude that the tightness of complexity
regulation crucially affects the strength of the SoA and the dynamics of
interactions between agents.
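The per-modality complexity regulation described above amounts to weighting each module's KL (complexity) term in the negative ELBO with its own hyperparameter. The sketch below is illustrative only; the names (`w_vision`, `w_proprio`) and the diagonal-Gaussian assumption are mine, not the paper's.

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL divergence KL(q || p) between diagonal Gaussians: the ELBO's complexity term."""
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def weighted_free_energy(nll_vision, nll_proprio, kl_vision, kl_proprio,
                         w_vision, w_proprio):
    """Negative ELBO with a per-modality hyperparameter scaling each complexity term.

    A larger weight means tighter complexity regulation for that module:
    the posterior is pushed harder toward the prior, so the agent adapts
    its intention to the other rather than acting egocentrically.
    """
    accuracy = nll_vision + nll_proprio            # reconstruction (accuracy) terms
    complexity = w_vision * kl_vision + w_proprio * kl_proprio
    return accuracy + complexity
```

The paper's finding that vision should be "more tightly regulated than proprioception" corresponds, in this sketch, to choosing `w_vision > w_proprio`.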
Probabilistic movement modeling for intention inference in human-robot interaction.
Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM) to probabilistically model the generative process of movements that are directed by the intention. The IDDM allows the intention to be inferred from observed movements using Bayes' theorem. The IDDM simultaneously finds a latent state representation of noisy and high-dimensional observations, and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.
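The Bayes'-theorem step at the core of this kind of online intention inference can be sketched as a recursive posterior update over a discrete set of candidate intentions. This is a simplified illustration, not the IDDM itself (which works in a learned latent space); the likelihood values and the three-target scenario are invented for the example.

```python
import numpy as np

def update_intention_posterior(prior, likelihoods):
    """One online step of Bayes' theorem: posterior(g) ∝ p(obs | intention g) * prior(g)."""
    posterior = prior * likelihoods
    return posterior / posterior.sum()

# Illustrative: three candidate ball targets in table tennis. Each new
# observation of the opponent's movement reweights the belief.
belief = np.array([1 / 3, 1 / 3, 1 / 3])
for lik in [np.array([0.2, 0.5, 0.3]), np.array([0.1, 0.7, 0.2])]:
    belief = update_intention_posterior(belief, lik)
```

After a few observations the belief concentrates on the intention whose predicted movements best explain the data, which is what enables early target prediction under real-time constraints.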
SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos
We present SLoMo: a first-of-its-kind framework for transferring skilled
motions from casually captured "in the wild" video footage of humans and
animals to legged robots. SLoMo works in three stages: 1) synthesize a
physically plausible reconstructed key-point trajectory from monocular videos;
2) offline, optimize a dynamically feasible reference trajectory for the robot,
including body and foot motion as well as contact sequences, that closely
tracks the key points; 3) track the reference trajectory online using a
general-purpose model-predictive controller on robot hardware. Traditional
motion imitation for legged motor skills often requires expert animators,
collaborative demonstrations, and/or expensive motion capture equipment, all of
which limit scalability. Instead, SLoMo only relies on easy-to-obtain
monocular video footage, readily available in online repositories such as
YouTube. It converts videos into motion primitives that can be executed
reliably by real-world robots. We demonstrate our approach by transferring the
motions of cats, dogs, and humans to example robots including a quadruped (on
hardware) and a humanoid (in simulation). To the best knowledge of the authors,
this is the first attempt at a general-purpose motion transfer framework that
imitates animal and human motions on legged robots directly from casual videos
without artificial markers or labels. (Accepted at RA-L 2023, with ICRA 2024 option.)
Cultural differences in speed adaptation in human-robot interaction tasks
In social interactions, human movement is a rich source of information for all those who take part in the collaboration. In fact, a variety of intuitive messages are communicated through motion and continuously inform the partners about the future unfolding of the actions. A similar exchange of implicit information could support movement coordination in the context of Human-Robot Interaction. In this work, we investigate how implicit signaling in an interaction with a humanoid robot can lead to emergent coordination in the form of automatic speed adaptation. In particular, we assess whether different cultures, specifically Japanese and Italian, have a different impact on motor resonance and synchronization in HRI. Japanese people show a higher general acceptance toward robots when compared with Western cultures. Since acceptance, or rather affiliation, is tightly connected to imitation and mimicry, we hypothesize a higher degree of speed imitation for Japanese participants when compared to Italians. In the experimental studies undertaken both in Japan and Italy, we observe that cultural differences do not impact the natural predisposition of subjects to adapt to the robot.
- …