Estimation of a focused object using a corneal surface image for eye-based interaction
Researchers are considering the use of eye tracking in head-mounted camera systems, such as Google’s Project Glass. Typical methods require detailed calibration in advance, but long periods of use disturb the calibration between the eye and the scene camera. In addition, even when a portable eye-tracker estimates the point-of-regard, the object being focused on may not be identified. We therefore propose a novel method for estimating the object a user is focusing on, in which an eye camera captures the reflection on the corneal surface. Eye and environment information can be extracted from the corneal surface image simultaneously. We use inverse ray tracing to rectify the reflected image and the scale-invariant feature transform to identify the object at which the point-of-regard is located. Unwarped images can also be generated continuously from corneal surface images. We consider that the proposed method could be applied to a guidance system, and we confirmed the feasibility of this application in experiments that estimated both the focused object and the point-of-regard.
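As a rough illustration of the matching stage only, the sketch below uses OpenCV to match SIFT features between an already-rectified (unwarped) corneal image and one candidate object image. The file names, ratio-test threshold, and "most good matches wins" rule are assumptions for illustration, not the authors' implementation.

```python
# Illustrative only: SIFT matching between a rectified corneal image and a
# candidate object image. File names and thresholds are assumptions.
import cv2

unwarped = cv2.imread("unwarped.png", cv2.IMREAD_GRAYSCALE)   # output of the rectification step (assumed file)
candidate = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)    # reference image of one candidate object (assumed file)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(unwarped, None)
kp2, des2 = sift.detectAndCompute(candidate, None)

# Lowe's ratio test to keep only reliable correspondences.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

# A simple decision rule (an assumption, not from the paper): compare this
# count across candidates and pick the object with the most good matches.
print(f"good SIFT matches: {len(good)}")
```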
Interactive Task Encoding System for Learning-from-Observation
We introduce a practical pipeline that interactively encodes multimodal human demonstrations for robot teaching. This pipeline is designed as an input system for a framework called Learning-from-Observation (LfO), which aims to program household robots with manipulative tasks through few-shot human demonstrations and without coding. While most previous LfO systems rely on visual demonstration alone, recent research on robot teaching has shown the effectiveness of verbal instruction in making recognition robust and teaching interactive. To the best of our knowledge, however, no LfO system has yet been proposed that utilizes both verbal instruction and interaction, namely multimodal LfO. This paper proposes the interactive task encoding system (ITES) as an input pipeline for multimodal LfO. ITES assumes that the user teaches step by step, pausing hand movements in order to match the granularity of human instructions with the granularity of robot execution. ITES recognizes tasks based on the step-by-step verbal instructions that accompany the hand movements, and the recognition is made robust through interactions with the user. We test ITES on a real robot and show that the user can successfully teach multiple operations through multimodal demonstrations. The results suggest the usefulness of ITES for multimodal LfO. The source code is available at https://github.com/microsoft/symbolic-robot-teaching-interface.
Comment: 7 pages, 10 figures. Last updated January 24th, 202
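The step-by-step teaching assumption can be pictured with a small sketch: split the hand trajectory at pauses and pair each resulting segment with the utterance that overlaps it most in time. This is an illustrative reconstruction under assumed data formats, thresholds, and helper names, not the ITES code in the repository above.

```python
# Illustrative pause-based segmentation and speech pairing; all thresholds,
# array layouts, and function names are assumptions for this sketch.
import numpy as np

def segment_by_pauses(positions, timestamps, speed_thresh=0.02, min_pause=0.5):
    """Split a hand trajectory into movement segments separated by pauses.

    positions: (N, 3) hand positions in meters; timestamps: (N,) seconds.
    Returns a list of (start_index, end_index) pairs.
    """
    dt = np.maximum(np.diff(timestamps), 1e-6)
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    moving = speed > speed_thresh

    segments, start, still_since = [], None, None
    for i, m in enumerate(moving):
        if m:
            still_since = None
            if start is None:
                start = i                      # movement begins: open a segment
        else:
            still_since = timestamps[i] if still_since is None else still_since
            if start is not None and timestamps[i] - still_since >= min_pause:
                segments.append((start, i))    # pause long enough: close the segment
                start = None
    if start is not None:
        segments.append((start, len(moving)))
    return segments

def pair_with_utterances(segments, timestamps, utterances):
    """Pair each segment with the (t_start, t_end, text) utterance that overlaps it most."""
    def overlap(seg, utt):
        s, e = seg
        return max(0.0, min(timestamps[e], utt[1]) - max(timestamps[s], utt[0]))
    return [(seg, max(utterances, key=lambda u: overlap(seg, u))[2]) for seg in segments]
```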
GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System
This technical paper introduces a chatting robot system that utilizes recent
advancements in large-scale language models (LLMs) such as GPT-3 and ChatGPT.
The system is integrated with a co-speech gesture generation system, which
selects appropriate gestures based on the conceptual meaning of speech. Our
motivation is to explore ways of utilizing the recent progress in LLMs for
practical robotic applications, which benefits the development of both chatbots
and LLMs. Specifically, it enables the development of highly responsive chatbot systems by leveraging LLMs and, as added value, brings visual effects to the LLM user interface. The source code for the system is available on GitHub, both for our in-house robot (https://github.com/microsoft/LabanotationSuite/tree/master/MSRAbotChatSimulation) and for the Toyota HSR (https://github.com/microsoft/GPT-Enabled-HSR-CoSpeechGestures).
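To make the chat-then-gesture flow concrete, here is a minimal sketch that obtains a reply from an LLM and then picks a gesture clip for the robot to play while speaking. The model name, gesture lexicon, and keyword-based concept detection are placeholders introduced for illustration; the released system selects gestures from the conceptual meaning of the speech rather than from keywords.

```python
# Sketch only: chat reply via the OpenAI Python client (v1 API assumed), then a
# crude keyword-based gesture choice standing in for concept-based selection.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GESTURE_LEXICON = {          # hypothetical concept -> gesture-clip mapping
    "greeting": "wave",
    "agreement": "nod",
    "negation": "head_shake",
    "emphasis": "beat",
}

def chat_with_gesture(user_text: str):
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model name
        messages=[{"role": "system", "content": "You are a friendly robot."},
                  {"role": "user", "content": user_text}],
    ).choices[0].message.content

    # Crude placeholder for the system's concept-level gesture selector.
    lowered = reply.lower()
    if any(w in lowered for w in ("hello", "hi ", "welcome")):
        gesture = GESTURE_LEXICON["greeting"]
    elif any(w in lowered for w in ("yes", "sure", "of course")):
        gesture = GESTURE_LEXICON["agreement"]
    elif any(w in lowered for w in ("cannot", "can't", "unfortunately")):
        gesture = GESTURE_LEXICON["negation"]
    else:
        gesture = GESTURE_LEXICON["emphasis"]
    return reply, gesture      # the robot speaks `reply` while playing `gesture`
```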
Bounding Box Annotation with Visible Status
Training deep-learning-based vision systems requires manually annotating a significant amount of data to optimize the many parameters of deep convolutional neural networks. Such manual annotation is highly time-consuming and labor-intensive. To reduce this burden, a previous study presented a fully automated annotation approach that does not require any manual intervention: a visual marker is associated with an object, and both are captured in the same image. However, because that method relied on moving the object within the capturing range of a fixed-point camera, the collected image dataset was limited in its capturing viewpoints. To overcome this limitation, this study presents a mobile-application-based free-viewpoint image-capturing method. With the proposed application, users can automatically collect multi-view image datasets annotated with bounding boxes simply by moving the camera. However, capturing images through human involvement is laborious and monotonous. Therefore, we propose gamified application features that make the collection status visible and track its progress. Our experiments demonstrated that using the gamified mobile application for bounding box annotation, with visible collection status, motivates users to collect multi-view object image datasets with less mental workload and time pressure and in an enjoyable manner, leading to increased engagement.
Comment: 10 pages, 16 figures
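As a simplified sketch of the marker-driven annotation idea, the snippet below detects a fiducial marker with OpenCV's ArUco module (API of OpenCV 4.7 or later) and places a bounding box at a fixed pixel offset from it. The offset and box size are placeholders; an actual system would derive the box from a calibrated spatial relation between the marker and the object, which this sketch does not attempt.

```python
# Sketch only: ArUco marker detection plus a fixed-offset bounding box.
# The offset, box size, and dictionary choice are assumptions for illustration.
import cv2

detector = cv2.aruco.ArucoDetector(
    cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50),
    cv2.aruco.DetectorParameters(),
)

def marker_based_bbox(image_bgr, dx=0, dy=-200, w=180, h=180):
    """Return an (x, y, w, h) box at a fixed offset from the first detected
    marker, or None if no marker is visible in the frame."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None or len(ids) == 0:
        return None
    cx, cy = corners[0][0].mean(axis=0)      # marker centre in pixel coordinates
    return int(cx + dx - w / 2), int(cy + dy - h / 2), w, h
```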
Bias in Emotion Recognition with ChatGPT
This technical report explores ChatGPT's ability to recognize emotions from text, a capability that underlies applications such as interactive chatbots, data annotation, and mental health analysis. While prior research has shown ChatGPT's basic ability in sentiment analysis, its performance on more nuanced emotion recognition has not yet been explored. Here, we conducted experiments to evaluate its emotion recognition performance across different datasets and emotion labels. Our findings indicate a reasonable level of reproducibility in its performance, with noticeable improvement through fine-tuning. However, the performance varies across emotion labels and datasets, highlighting an inherent instability and possible bias. The choice of dataset and emotion labels significantly impacts ChatGPT's emotion recognition performance. This paper sheds light on the importance of dataset and label selection and on the potential of fine-tuning to enhance ChatGPT's emotion recognition capabilities, providing a groundwork for better integration of emotion analysis into applications that use ChatGPT.
Comment: 5 pages, 4 figures, 6 tables
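For a concrete picture of the kind of experiment evaluated here, the sketch below labels text with a fixed emotion set via a prompt and reports plain accuracy. The label set, prompt wording, and model name are assumptions rather than those of the report, and, as the report's findings suggest, results will shift with exactly these choices.

```python
# Sketch only: prompt-based emotion labelling and a plain accuracy metric.
# Label set, prompt, and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI()
LABELS = ["joy", "sadness", "anger", "fear", "surprise", "neutral"]

def classify_emotion(text: str) -> str:
    prompt = ("Classify the emotion of the following text as exactly one of "
              f"{', '.join(LABELS)}.\nText: {text}\nEmotion:")
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    ).choices[0].message.content.strip().lower()
    return reply if reply in LABELS else "neutral"   # fall back on unparsable output

def accuracy(samples):
    """samples: iterable of (text, gold_label) pairs."""
    samples = list(samples)
    hits = sum(classify_emotion(text) == gold for text, gold in samples)
    return hits / len(samples)
```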