402 research outputs found
Daily Assistive Modular Robot Design Based on Multi-Objective Black-Box Optimization
The range of robot activities is expanding from industries with fixed
environments to diverse and changing environments, such as nursing care support
and daily life support. In particular, the automatic construction of robots
personalized for each user and task is required. Therefore, we develop an
actuator module that can be reconfigured into various link configurations, can
carry heavy objects using a locking mechanism, and can be easily operated
through human teaching using a releasing mechanism. Given multiple target
coordinates, a modular robot configuration that satisfies these coordinates
while minimizing the required torque is automatically generated by the
Tree-structured Parzen Estimator (TPE), a type of black-box optimization. Based
on the obtained results, we show that the robot can be reconfigured to perform
various functions such as moving monitors and lights and serving food.
Comment: Accepted at IROS 2023, website: https://haraduka.github.io/auto-modular-design
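As an illustration of how such a black-box search might be set up, the sketch below uses the TPE sampler from Optuna (a common TPE implementation) to choose a planar-arm configuration that trades off reaching error against a torque proxy; the kinematics, search space, and objectives are illustrative assumptions, not the formulation of the paper.

```python
# A minimal sketch of multi-objective configuration search with a TPE sampler.
# The planar serial chain and torque proxy below are illustrative assumptions.
import math
import optuna

TARGET = (0.6, 0.4)  # hypothetical target coordinate for the end effector

def objective(trial: optuna.Trial):
    # Search over a simple planar arm: number of modules, link length, angles.
    n = trial.suggest_int("n_modules", 2, 5)
    link = trial.suggest_float("link_length", 0.1, 0.4)
    angles = [trial.suggest_float(f"theta_{i}", -math.pi, math.pi) for i in range(n)]

    # Forward kinematics of the planar chain.
    x = y = heading = 0.0
    for theta in angles:
        heading += theta
        x += link * math.cos(heading)
        y += link * math.sin(heading)

    reach_error = math.hypot(x - TARGET[0], y - TARGET[1])
    torque_proxy = n * link  # crude stand-in for gravity torque at the base
    return reach_error, torque_proxy  # two objectives, both minimized

study = optuna.create_study(
    directions=["minimize", "minimize"],
    sampler=optuna.samplers.TPESampler(seed=0),
)
study.optimize(objective, n_trials=200)
for t in study.best_trials[:3]:  # Pareto-optimal configurations
    print(t.values, t.params)
```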
A method for Selecting Scenes and Emotion-based Descriptions for a Robot's Diary
In this study, we examined scene selection methods and emotion-based
descriptions for a robot's daily diary. We proposed a scene selection method
and an emotion description method that take into account semantic and affective
information, and created several types of diaries. Experiments were conducted
to examine the change in sentiment values and preference of each diary, and it
was found that the robot's feelings and impressions changed more from date to
date when scenes were selected using the affective captions. Furthermore, we
found that the robot's emotion generally improves the preference of the robot's
diary regardless of the scene it describes. However, presenting negative or
mixed emotions at once may decrease the preference of the diary or reduce the
robot's robot-likeness, and thus the method of presenting emotions still needs
further investigation.
Comment: 6 pages, 5 figures, RO-MAN 2023
Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors
In recent years, a number of models that learn the relations between vision
and language from large datasets have been released. These models perform a
variety of tasks, such as answering questions about images, retrieving
sentences that best correspond to images, and finding regions in images that
correspond to phrases. Although some examples exist, the connection between
these pre-trained vision-language models and robotics remains weak. If such
models are tied directly to robot motions, they lose their versatility, owing
to the robot's embodiment and the difficulty of data collection, and become
inapplicable to a wide range of bodies and situations. Therefore, in this
study, we categorize and summarize the methods to utilize the pre-trained
vision-language models flexibly and easily in a way that the robot can
understand, without directly connecting them to robot motions. We discuss how
to use these models for robot motion selection and motion planning without
re-training the models. We consider five types of methods to extract
information understandable for robots, and show the results of state
recognition, object recognition, affordance recognition, relation recognition,
and anomaly detection based on the combination of these five methods. We expect
that this study will add flexibility, ease of use, and new applications to the
recognition behavior of existing robots.
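As one hedged example of such flexible use, the sketch below scores language-described states against a camera image with a pre-trained CLIP model, with no re-training; the model checkpoint, prompts, and image path are assumptions for illustration rather than the paper's exact method.

```python
# A minimal sketch of language-specified state recognition with pre-trained
# CLIP; prompts and image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # hypothetical robot camera frame
prompts = ["the door is open", "the door is closed"]  # states described in language

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
probs = logits.softmax(dim=-1)[0]
state = prompts[int(probs.argmax())]
print(f"recognized state: {state} (p={probs.max():.2f})")
```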
Online Estimation of Self-Body Deflection With Various Sensor Data Based on Directional Statistics
In this paper, we propose a method for online estimation of the robot's
posture. Our method uses the von Mises and Bingham distributions, both from
directional statistics, as probability distributions over joint angles and 3D
orientations. We constructed a particle filter using these distributions and
configured a system to estimate the robot's posture from various sensor
information (e.g., joint encoders, IMU sensors, and cameras). Furthermore,
unlike tangent space approximations, these distributions can handle global
features and represent sensor characteristics as observation noise. As an
application, we show that the yaw drift of a 6-axis IMU sensor can be
represented probabilistically to prevent adverse effects on attitude
estimation. For the estimation, we used an approximate model that assumes the
actual robot posture can be reproduced by correcting the joint angles of a
rigid body model. In the experiments, we tested the estimator's effectiveness
by verifying that joint angles generated with the approximate model can be
estimated from the link poses of the same model. We then applied the estimator
to the actual robot and confirmed that the gripper position could be estimated,
thereby verifying the validity of the approximate model in our situation.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
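A minimal sketch of the directional-statistics idea follows, assuming a single joint angle observed by a noisy encoder and using SciPy's von Mises distribution; the Bingham distribution over 3D orientations and the multi-sensor fusion of the actual system are omitted.

```python
# A particle filter over one joint angle with a von Mises observation model.
# Concentrations (kappa) and the static true angle are illustrative.
import numpy as np
from scipy.stats import vonmises

rng = np.random.default_rng(0)
N = 500
particles = rng.uniform(-np.pi, np.pi, size=N)  # joint-angle hypotheses
weights = np.full(N, 1.0 / N)

true_angle = 0.8
kappa_obs = 50.0    # encoder concentration (higher kappa = less angular noise)
kappa_proc = 200.0  # process noise concentration

for _ in range(20):
    # Predict: diffuse particles with von Mises process noise, wrapped to (-pi, pi].
    noise = vonmises.rvs(kappa_proc, size=N, random_state=rng)
    particles = np.angle(np.exp(1j * (particles + noise)))
    # Update: weight by the von Mises likelihood of a noisy encoder reading.
    z = vonmises.rvs(kappa_obs, loc=true_angle, random_state=rng)
    weights *= vonmises.pdf(z, kappa_obs, loc=particles)
    weights /= weights.sum()
    # Resample (multinomial) to avoid weight degeneracy.
    idx = rng.choice(N, size=N, p=weights)
    particles, weights = particles[idx], np.full(N, 1.0 / N)

# Circular mean of the particles as the estimate.
est = np.angle(np.mean(np.exp(1j * particles)))
print(f"estimate: {est:.3f} rad (true {true_angle} rad)")
```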
Recognition of Heat-Induced Food State Changes by Time-Series Use of Vision-Language Model for Cooking Robot
Cooking tasks are characterized by large changes in the state of the food,
which poses a major challenge for robotic execution. In particular, cooking
with a stove to apply heat to the foodstuff causes many state changes that are
not seen in other tasks, making it difficult to design a recognizer. In this
study, we propose a unified method for robots to recognize changes in the
cooking state, using a vision-language model that can discriminate
open-vocabulary objects in a time-series manner. We collected
data on four typical state changes in cooking using a real robot and confirmed
the effectiveness of the proposed method. We also compared the conditions and
discussed the types of natural language prompts and the image regions that are
suitable for recognizing the state changes.
Comment: Accepted at IAS18-2023
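The sketch below illustrates the time-series use of an off-the-shelf VQA model for one such state change; BLIP is used here as a stand-in, and the question and frame paths are hypothetical.

```python
# A hedged sketch of time-series state-change recognition with a VQA model.
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

question = "Is the water boiling?"  # illustrative prompt for one state change
frames = [f"frame_{i:03d}.jpg" for i in range(5)]  # hypothetical camera frames

answers = []
for path in frames:
    image = Image.open(path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    out = model.generate(**inputs)
    answers.append(processor.decode(out[0], skip_special_tokens=True))

# A state change is flagged when the answer flips between consecutive frames.
for i in range(1, len(answers)):
    if answers[i] != answers[i - 1]:
        print(f"state change between frames {i - 1} and {i}: "
              f"{answers[i - 1]} -> {answers[i]}")
```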
Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model
Recognition of the current state is indispensable for the operation of a
robot. There are various states to be recognized, such as whether an elevator
door is open or closed, whether an object has been grasped correctly, and
whether the TV is turned on or off. Until now, these states have been
recognized by programmatically describing the state of a point cloud or raw
image, by annotating images and training on them, by using special sensors,
and so on. In
contrast to these methods, we apply Visual Question Answering (VQA) from a
Pre-Trained Vision-Language Model (PTVLM) trained on a large-scale dataset to
such binary state recognition. This idea allows us to intuitively describe
state recognition in language without any re-training, thereby improving the
recognition ability of robots in a simple and general way. We summarize various
techniques in questioning methods and image processing, and clarify their
properties through experiments.
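As a hedged sketch of this idea, the snippet below asks a VQA model several paraphrases of a binary question and takes a majority vote, one plausible instance of the questioning techniques surveyed; the model choice and phrasings are illustrative assumptions.

```python
# Binary state recognition via VQA with majority voting over paraphrases.
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def ask(image: Image.Image, question: str) -> str:
    inputs = processor(image, question, return_tensors="pt")
    out = model.generate(**inputs)
    return processor.decode(out[0], skip_special_tokens=True).lower()

image = Image.open("elevator.jpg").convert("RGB")  # hypothetical camera frame
paraphrases = [  # illustrative question set
    "Is the elevator door open?",
    "Is the door of the elevator open?",
    "Are the elevator doors open?",
]
votes = [ask(image, q).startswith("yes") for q in paraphrases]
print("door open" if sum(votes) > len(votes) / 2 else "door closed")
```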
Automatic Diary Generation System including Information on Joint Experiences between Humans and Robots
In this study, we propose an automatic diary generation system that uses
information from past joint experiences with the aim of increasing the
favorability for robots through shared experiences between humans and robots.
For the verbalization of the robot's memory, the system applies a large
language model, a rapidly developing technology. Since this model does not
have memories of the experiences, it generates a diary by receiving information
from the joint experiences. As an experiment, a robot and a human went for a
walk and generated a diary from the interaction and dialogue history. The
proposed diary achieved high scores for comfort and performance in the
evaluation of the robot's impression. In a survey of which diaries gave more
favorable impressions, diaries with information on joint experiences were
preferred over diaries without such information, because they showed more
cooperation between the robot and the human and more intimacy from the robot.
Comment: 12 pages, 5 figures, IAS-18
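A minimal sketch of this pipeline follows, assuming a generic chat-completion API (OpenAI's client here as a stand-in) and a hypothetical experience log; the actual model, prompt design, and log format are not specified in this listing.

```python
# A hedged sketch: generate a diary entry from a joint-experience log.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical structured record of a shared walk.
experience_log = [
    "14:02 started a walk in the park with Alice",
    "14:15 Alice pointed out cherry blossoms; robot took a photo",
    "14:40 short dialogue about the weather; Alice laughed",
]

prompt = (
    "You are a robot writing a short diary entry in the first person. "
    "Base it only on today's joint experiences:\n" + "\n".join(experience_log)
)
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```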
HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation
Transferring human motion skills to humanoid robots remains a significant
challenge. In this study, we introduce a Wasserstein adversarial imitation
learning system, allowing humanoid robots to replicate natural whole-body
locomotion patterns and execute seamless transitions by mimicking human
motions. First, we present a unified primitive-skeleton motion retargeting to
mitigate morphological differences between arbitrary human demonstrators and
humanoid robots. An adversarial critic component is integrated with
Reinforcement Learning (RL) to guide the control policy to produce behaviors
aligned with the data distribution of mixed reference motions. Additionally, we
employ a specific Integral Probability Metric (IPM), namely the Wasserstein-1
distance with a novel soft boundary constraint to stabilize the training
process and prevent mode collapse. Our system is evaluated on the full-sized
humanoid JAXON in simulation. The resulting control policy demonstrates a
wide range of locomotion patterns, including standing, push-recovery, squat
walking, human-like straight-leg walking, and dynamic running. Notably, even in
the absence of transition motions in the demonstration dataset, the robot
exhibits an emergent ability to transition naturally between distinct
locomotion patterns as the desired speed changes.
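The snippet below sketches a Wasserstein-1 critic loss with a soft output-boundary penalty as one plausible reading of such a constraint; the network sizes, penalty form, and weights are illustrative assumptions rather than the paper's exact formulation.

```python
# A generic Wasserstein adversarial critic with a soft boundary penalty.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def critic_loss(critic: Critic, demo: torch.Tensor, policy: torch.Tensor,
                bound: float = 1.0, weight: float = 10.0) -> torch.Tensor:
    d_demo, d_policy = critic(demo), critic(policy)
    # Wasserstein-1 objective: demonstrations score high, policy samples low.
    w1 = d_policy.mean() - d_demo.mean()
    # Soft boundary: quadratically penalize critic outputs outside [-bound, bound]
    # instead of hard clipping, keeping gradients smooth near the boundary.
    overflow = (d_demo.abs() - bound).clamp(min=0) ** 2 + \
               (d_policy.abs() - bound).clamp(min=0) ** 2
    return w1 + weight * overflow.mean()

# Usage with random stand-in batches (obs_dim is illustrative).
critic = Critic(obs_dim=64)
loss = critic_loss(critic, torch.randn(32, 64), torch.randn(32, 64))
loss.backward()
```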
Muscle-Tendon Complex-Inspired Deformable Exteriors as a Wire-Drive Extension
The 11th International Symposium on Adaptive Motion of Animals and Machines. Kobe University, Japan. 2023-06-06/09. Adaptive Motion of Animals and Machines Organizing Committee. Poster Session P5.
Semantic Scene Difference Detection in Daily Life Patroling by Mobile Robots using Pre-Trained Large-Scale Vision-Language Model
It is important for daily life support robots to detect changes in their
environment and perform tasks accordingly. In the field of anomaly detection in
computer vision, probabilistic and deep learning methods have been used to
calculate distances between images. These methods calculate distances by
focusing on image pixels. In contrast, this study aims to detect semantic
changes in the daily life environment using recent developments in large-scale
vision-language models. Using the Visual Question Answering (VQA) capability of
such a model, we propose a method to
detect semantic changes by applying multiple questions to a reference image and
a current image and obtaining answers in the form of sentences. Unlike deep
learning-based anomaly detection methods, this method requires no training or
fine-tuning, is robust to pixel-level noise, and is sensitive to semantic
state changes in the real world. In our experiments, we demonstrated the
effectiveness of this method by applying it to a patrol task in a real-life
environment using a mobile robot, Fetch Mobile Manipulator. In the future, it
may be possible to add explanatory power to changes in the daily life
environment through spoken language.
Comment: Accepted to the 2023 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2023)
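A hedged sketch of the question-based difference detection follows: the same questions are asked about a reference image and a current image, and differing answers are flagged; the model and question list are illustrative assumptions.

```python
# Semantic scene difference detection by comparing VQA answers on two images.
from transformers import BlipProcessor, BlipForQuestionAnswering
from PIL import Image

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

def ask(path: str, question: str) -> str:
    image = Image.open(path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    out = model.generate(**inputs)
    return processor.decode(out[0], skip_special_tokens=True)

questions = ["How many chairs are there?", "Is the window open?",
             "What is on the table?"]  # illustrative patrol questions
for q in questions:
    before = ask("reference.jpg", q)  # hypothetical patrol snapshots
    after = ask("current.jpg", q)
    if before != after:
        print(f"change detected: '{q}' -> '{before}' vs '{after}'")
```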