Object affordance as a guide for grasp-type recognition
Recognizing human grasping strategies is an important factor in robot
teaching as these strategies contain the implicit knowledge necessary to
perform a series of manipulations smoothly. This study analyzed the effects of
object affordance (a prior distribution of grasp types for each object) on
convolutional neural network (CNN)-based grasp-type recognition. To this end,
we created datasets of first-person grasping-hand images labeled with grasp
types and object names, and tested a recognition pipeline leveraging object
affordance. We evaluated scenarios with both real and illusory target objects,
reflecting a mixed-reality teaching condition in which the lack of visual
object information can make CNN recognition challenging. The results
show that object affordance guided the CNN in both scenarios, increasing the
accuracy by 1) excluding unlikely grasp types from the candidates and 2)
enhancing likely grasp types. In addition, the "enhancing effect" was more
pronounced with high degrees of grasp-type heterogeneity. These results
indicate the effectiveness of object affordance for guiding grasp-type
recognition in robot teaching applications.
Comment: 12 pages, 11 figures. Last updated February 27th, 202
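To illustrate the guiding mechanism described in this abstract, the following minimal sketch fuses a CNN's grasp-type probabilities with an object-specific affordance prior by elementwise multiplication and renormalization, which both excludes grasp types with zero prior and enhances likely ones. The grasp-type labels, prior values, and fusion rule are illustrative assumptions, not the paper's actual dataset or implementation.

```python
import numpy as np

# Hypothetical grasp-type vocabulary and affordance priors; the labels and
# distributions in the paper's dataset may differ.
GRASP_TYPES = ["power", "precision", "lateral", "hook"]
AFFORDANCE = {
    # prior distribution of grasp types for each object (each row sums to 1)
    "mug": np.array([0.55, 0.10, 0.05, 0.30]),
    "pen": np.array([0.05, 0.80, 0.15, 0.00]),  # zero prior excludes "hook"
}

def affordance_guided_recognition(cnn_probs: np.ndarray, object_name: str) -> dict:
    """Fuse CNN grasp-type probabilities with an object-specific prior.

    Multiplying by the prior and renormalizing removes grasp types the
    object never affords (prior = 0) and boosts the likely ones.
    """
    fused = cnn_probs * AFFORDANCE[object_name]
    fused /= fused.sum()  # renormalize to a probability distribution
    return {g: float(p) for g, p in zip(GRASP_TYPES, fused)}

# Example: an ambiguous CNN output for a hand grasping a pen.
cnn_out = np.array([0.35, 0.40, 0.05, 0.20])
print(affordance_guided_recognition(cnn_out, "pen"))
```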
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
We introduce a pipeline that enhances a general-purpose Vision Language
Model, GPT-4V(ision), by integrating observations of human actions to
facilitate robotic manipulation. This system analyzes videos of humans
performing tasks and creates executable robot programs that incorporate
affordance insights. The computation starts by analyzing the videos with GPT-4V
to convert environmental and action details into text, followed by a
GPT-4-empowered task planner. In the subsequent analyses, vision systems
reanalyze the video guided by the task plan. Object names are grounded using an
open-vocabulary object detector, while attention to the hand-object relation
helps detect the moments of grasping and releasing. This spatiotemporal
grounding allows the vision systems to further gather affordance data (e.g.,
grasp types, waypoints, and body postures). Experiments across various scenarios
demonstrate the method's efficacy in enabling real robots to operate from
human demonstrations in a zero-shot manner. The prompts for GPT-4V/GPT-4 are
available at this project page:
https://microsoft.github.io/GPT4Vision-Robot-Manipulation-Prompts/
Comment: 8 pages, 10 figures, 1 table. Last updated on November 20th, 202
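The two language-model stages of this pipeline (video-to-text conversion with GPT-4V, followed by GPT-4 task planning) can be sketched roughly as below using the OpenAI Python SDK. The model names, prompts, and helper functions are placeholders, not the published prompts linked above, and the grounding stages (open-vocabulary detection, grasp/release detection, affordance extraction) are omitted.

```python
import base64
from openai import OpenAI  # pip install openai>=1.0

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def describe_video_frames(frame_paths: list[str]) -> str:
    """Stage 1: convert sampled video frames into a textual description
    of the environment and the demonstrated action (vision-language model)."""
    content = [{"type": "text",
                "text": "Describe the objects and the manipulation action "
                        "shown in these frames of a human demonstration."}]
    for path in frame_paths:
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"}})
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": content}])
    return resp.choices[0].message.content

def plan_robot_tasks(scene_description: str) -> str:
    """Stage 2: turn the textual description into a symbolic task plan."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You convert human demonstrations into a numbered "
                        "sequence of predefined robot actions."},
            {"role": "user", "content": scene_description},
        ])
    return resp.choices[0].message.content

# Later stages reanalyze the original video guided by this plan; they are
# outside the scope of this sketch.
```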
Interactive Task Encoding System for Learning-from-Observation
We introduce a practical pipeline that interactively encodes multimodal human
demonstrations for robot teaching. This pipeline is designed as an input system
for a framework called Learning-from-Observation (LfO), which aims to program
household robots with manipulative tasks through few-shot human demonstrations
without coding. While most previous LfO systems rely on visual demonstrations alone,
recent research on robot teaching has shown the effectiveness of verbal
instruction in making recognition robust and teaching interactive. To the best
of our knowledge, however, no LfO system has yet been proposed that utilizes
both verbal instruction and interaction, namely \textit{multimodal LfO}. This
paper proposes the interactive task encoding system (ITES) as an input pipeline
for multimodal LfO. ITES assumes that the user teaches step-by-step, pausing
hand movements in order to match the granularity of human instructions with the
granularity of robot execution. ITES recognizes tasks based on step-by-step
verbal instructions that accompany the hand movements. Additionally, the
recognition is made robust through interactions with the user. We test ITES on
a real robot and show that the user can successfully teach multiple operations
through multimodal demonstrations. The results suggest the usefulness of ITES
for multimodal LfO. The source code is available at
https://github.com/microsoft/symbolic-robot-teaching-interface.
Comment: 7 pages, 10 figures. Last updated January 24th, 202
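The step-by-step teaching assumption (hand movements paused between steps, with each motion segment accompanied by a verbal instruction) can be pictured with the sketch below, which splits a demonstration at hand pauses and pairs each segment with the most overlapping utterance. The speed threshold, minimum pause length, and overlap-based pairing rule are illustrative assumptions, not ITES's actual recognition logic.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    start: float  # seconds
    end: float

def segment_by_pauses(hand_speed, timestamps, speed_thresh=0.02, min_pause=0.5):
    """Split a demonstration into motion segments separated by hand pauses.

    `hand_speed` is a per-frame hand speed (m/s); a run of frames below
    `speed_thresh` lasting at least `min_pause` seconds closes a segment.
    """
    segments, start, pause_start = [], None, None
    for t, v in zip(timestamps, hand_speed):
        if v >= speed_thresh:
            if start is None:
                start = t       # a new motion segment begins
            pause_start = None  # motion resumed, cancel any pending pause
        elif start is not None:
            if pause_start is None:
                pause_start = t
            elif t - pause_start >= min_pause:
                segments.append((start, pause_start))
                start = None
    if start is not None:
        segments.append((start, timestamps[-1]))
    return segments

def pair_with_instructions(segments, utterances):
    """Attach to each motion segment the utterance that overlaps it most."""
    assert utterances, "each demonstration needs at least one utterance"
    def overlap(seg, u):
        return max(0.0, min(seg[1], u.end) - max(seg[0], u.start))
    return [(seg, max(utterances, key=lambda u: overlap(seg, u)).text)
            for seg in segments]
```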
GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System
This technical paper introduces a chatting robot system that utilizes recent
advancements in large-scale language models (LLMs) such as GPT-3 and ChatGPT.
The system is integrated with a co-speech gesture generation system, which
selects appropriate gestures based on the conceptual meaning of speech. Our
motivation is to explore ways of utilizing the recent progress in LLMs for
practical robotic applications, which benefits the development of both chatbots
and LLMs. Specifically, the approach enables highly responsive chatbot systems
by leveraging LLMs and, as an added benefit, enriches the user interface of
LLMs with visual effects. The source code for the system is available on
GitHub for our in-house robot
(https://github.com/microsoft/LabanotationSuite/tree/master/MSRAbotChatSimulation)
and GitHub for Toyota HSR
(https://github.com/microsoft/GPT-Enabled-HSR-CoSpeechGestures)
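As a rough illustration of pairing a chatbot reply with a co-speech gesture, the sketch below selects a gesture label from a small hypothetical library by matching the reply against concept keywords. The released system derives gestures from the conceptual meaning of the speech; the gesture library, keyword sets, and matching rule here are purely illustrative.

```python
# Hypothetical concept-to-gesture mapping; the actual gesture vocabulary of
# the released system differs.
GESTURE_LIBRARY = {
    "greeting": "wave",
    "agreement": "nod",
    "negation": "head_shake",
    "explanation": "open_palms",
}

CONCEPT_KEYWORDS = {
    "greeting": {"hello", "hi", "welcome"},
    "agreement": {"yes", "sure", "agree"},
    "negation": {"no", "not", "never"},
}

def select_gesture(reply: str) -> str:
    """Pick a gesture whose concept best matches the chatbot reply;
    fall back to a generic explanatory gesture."""
    words = set(reply.lower().split())
    for concept, keywords in CONCEPT_KEYWORDS.items():
        if words & keywords:
            return GESTURE_LIBRARY[concept]
    return GESTURE_LIBRARY["explanation"]

def respond(reply: str) -> tuple[str, str]:
    """Pair the LLM reply text with the gesture the robot should perform."""
    return reply, select_gesture(reply)

print(respond("hello! nice to meet you."))  # ('hello! ...', 'wave')
```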
ChatGPT Empowered Long-Step Robot Control in Various Environments: A Case Application
This paper demonstrates how OpenAI's ChatGPT can be used in a few-shot
setting to convert natural language instructions into an executable robot
action sequence. The paper proposes easy-to-customize input prompts for ChatGPT
that meet common requirements in practical applications, such as easy
integration with robot execution systems and applicability to various
environments while minimizing the impact of ChatGPT's token limit. The prompts
encourage ChatGPT to output a sequence of predefined robot actions, represent
the operating environment in a formalized style, and infer the updated state of
the operating environment. Experiments confirmed that the proposed prompts
enable ChatGPT to act according to requirements in various environments, and
users can adjust ChatGPT's output with natural language feedback for safe and
robust operation. The proposed prompts and source code are open-source and
publicly available at
https://github.com/microsoft/ChatGPT-Robot-Manipulation-Prompts
Comment: 17 figures. Last updated April 11th, 202
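A drastically simplified stand-in for the prompting scheme described above is sketched below: a system prompt that restricts ChatGPT to a set of predefined actions, a formalized environment description, and a reply containing both the action sequence and the inferred updated environment. The prompt text, action names, and output schema are hypothetical; the published prompts at the repository above differ.

```python
import json
from openai import OpenAI  # pip install openai>=1.0

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical system prompt restricting the model to predefined actions and
# asking for the inferred post-task environment state.
SYSTEM_PROMPT = """You are a robot task planner.
Use only these actions: move_hand(), grasp_object(), release_object().
Given the environment and an instruction, reply with JSON only:
{"action_sequence": [...], "updated_environment": {...}}"""

# Environment represented in a formalized style (illustrative schema).
environment = {"objects": ["juice", "table", "shelf"],
               "object_states": {"juice": "on_table"}}
instruction = "Put the juice on the shelf."

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Environment: {json.dumps(environment)}\n"
                    f"Instruction: {instruction}"},
    ],
)

# Parsing assumes the model returns bare JSON as instructed.
plan = json.loads(resp.choices[0].message.content)
print(plan["action_sequence"])      # e.g. ["move_hand()", "grasp_object()", ...]
print(plan["updated_environment"])  # the model's inference of the post-task state
```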