Dynamic Manipulation of Flexible Objects with Torque Sequence Using a Deep Neural Network
For dynamic manipulation of flexible objects, we propose an acquisition
method of a flexible object motion equation model using a deep neural network
and a control method to realize a target state by calculating an optimized
time-series joint torque command. With the proposed method, no physics
model of the target object is needed, and the object can be controlled as
intended. We applied this method to manipulations of a rigid object, a flexible
object with and without environmental contact, and a cloth, and verified its
effectiveness.
Daily Assistive Modular Robot Design Based on Multi-Objective Black-Box Optimization
The range of robot activities is expanding from industries with fixed
environments to diverse and changing environments, such as nursing care support
and daily life support. In particular, autonomous construction of robots that
are personalized for each user and task is required. Therefore, we develop an
actuator module that can be reconfigured to various link configurations, can
carry heavy objects using a locking mechanism, and can be easily operated by
human teaching using a releasing mechanism. Given multiple target coordinates,
a modular robot configuration that satisfies these coordinates and minimizes
the required torque is automatically generated by Tree-structured Parzen
Estimator (TPE), a type of black-box optimization. Based on the obtained
results, we show that the robot can be reconfigured to perform various
functions such as moving monitors and lights and serving food.
Comment: Accepted at IROS2023. Website: https://haraduka.github.io/auto-modular-design
A method for Selecting Scenes and Emotion-based Descriptions for a Robot's Diary
In this study, we examined scene selection methods and emotion-based
descriptions for a robot's daily diary. We proposed a scene selection method
and an emotion description method that take into account semantic and affective
information, and created several types of diaries. Experiments were conducted
to examine the change in sentiment values and preference of each diary, and it
was found that the robot's feelings and impressions changed more from date to
date when scenes were selected using the affective captions. Furthermore, we
found that the robot's emotion generally improves the preference of the robot's
diary regardless of the scene it describes. However, presenting negative or
mixed emotions at once may decrease the preference of the diary or reduce the
robot's robot-likeness, and thus the method of presenting emotions still needs
further investigation.
Comment: 6 pages, 5 figures, ROMAN 202
Recognition of Heat-Induced Food State Changes by Time-Series Use of Vision-Language Model for Cooking Robot
Cooking tasks are characterized by large changes in the state of the food,
which is one of the major challenges in robot execution of cooking tasks. In
particular, cooking using a stove to apply heat to the foodstuff causes many
special state changes that are not seen in other tasks, making it difficult to
design a recognizer. In this study, we propose a unified method by which a robot
recognizes changes in the cooking state, using a vision-language model that can
discriminate open-vocabulary objects in a time-series manner. We collected
data on four typical state changes in cooking using a real robot and confirmed
the effectiveness of the proposed method. We also compared the conditions and
discussed the types of natural language prompts and the image regions that are
suitable for recognizing the state changes.
Comment: Accepted at IAS18-202
Binary State Recognition by Robots using Visual Question Answering of Pre-Trained Vision-Language Model
Recognition of the current state is indispensable for the operation of a
robot. There are various states to be recognized, such as whether an elevator
door is open or closed, whether an object has been grasped correctly, and
whether the TV is turned on or off. Until now, these states have been
recognized by programmatically describing the state of a point cloud or raw
image, by annotating and learning images, by using special sensors, etc. In
contrast to these methods, we apply Visual Question Answering (VQA) from a
Pre-Trained Vision-Language Model (PTVLM) trained on a large-scale dataset, to
such binary state recognition. This idea allows us to intuitively describe
state recognition in language without any re-training, thereby improving the
recognition ability of robots in a simple and general way. We summarize various
techniques in questioning methods and image processing, and clarify their
properties through experiments.
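As a rough illustration of the idea in the abstract above, the sketch below asks several yes/no questions about one image and takes a majority vote. The `ask` callable stands in for a real pre-trained VQA model. That callable, the toy `fake_vqa` model, the question strings, and the tie-breaking rule are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, Sequence


def recognize_binary_state(
    image: object,
    questions: Sequence[str],
    ask: Callable[[object, str], str],
) -> bool:
    """Return True if most questions are answered "yes" for this image.

    `ask` is a hypothetical stand-in for a pre-trained VQA model: it takes
    an image and a question and returns a free-form answer string.
    """
    if not questions:
        raise ValueError("at least one question is required")
    yes_votes = sum(
        1 for q in questions if ask(image, q).strip().lower().startswith("yes")
    )
    # Assumption: a tie counts as "no" (the state is not confirmed).
    return yes_votes * 2 > len(questions)


def fake_vqa(image: Dict[str, str], question: str) -> str:
    """Toy model for demonstration only: looks the answer up in a dict."""
    return image.get(question, "no")


if __name__ == "__main__":
    # The "image" here is just a lookup table of canned answers.
    image = {"Is the door open?": "Yes", "Can you see inside the room?": "yes."}
    questions = [
        "Is the door open?",
        "Can you see inside the room?",
        "Is the door ajar?",
    ]
    print(recognize_binary_state(image, questions, fake_vqa))
```

Because the questions are plain strings, changing the recognized state only requires rewording them. No re-training step appears anywhere in the sketch.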
VQA-based Robotic State Recognition Optimized with Genetic Algorithm
State recognition of objects and environment in robots has been conducted in
various ways. In most cases, this is executed by processing point clouds,
learning images with annotations, and using specialized sensors. In contrast,
in this study, we propose a state recognition method that applies Visual
Question Answering (VQA) in a Pre-Trained Vision-Language Model (PTVLM) trained
from a large-scale dataset. By using VQA, it is possible to intuitively
describe robotic state recognition in spoken language. On the other hand,
there are various possible ways to ask about the same event, and the
performance of state recognition differs depending on the question. Therefore,
in order to improve the performance of state recognition using VQA, we search
for an appropriate combination of questions using a genetic algorithm. We show
that our system can recognize not only the open/closed state of a refrigerator door
and the on/off state of a display, but also the open/closed state of a transparent door and
the state of water, which have been difficult to recognize.
Comment: Accepted at ICRA202
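The search over question combinations described above can be sketched as a small genetic algorithm. This is a minimal illustration under stated assumptions: the VQA answers are precomputed into a boolean table, the selected questions are combined by majority vote, and the operators (tournament selection, uniform crossover, bit-flip mutation, two-individual elitism) are generic choices. The function names and parameters are hypothetical, and the paper's actual encoding and fitness function may differ.

```python
import random
from typing import List, Sequence

# answers[i][j] is True when question j was answered "yes" on sample i.
Answers = Sequence[Sequence[bool]]


def fitness(mask: Sequence[bool], answers: Answers, labels: Sequence[bool]) -> float:
    """Accuracy of a majority vote over the questions selected by `mask`."""
    chosen = [j for j, keep in enumerate(mask) if keep]
    if not chosen:
        return 0.0  # an empty question set cannot recognize anything
    correct = 0
    for row, label in zip(answers, labels):
        yes = sum(row[j] for j in chosen)
        correct += (yes * 2 > len(chosen)) == label
    return correct / len(labels)


def evolve(
    answers: Answers,
    labels: Sequence[bool],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> List[bool]:
    """Search for a question subset (bitmask) with high voting accuracy."""
    rng = random.Random(seed)
    n = len(answers[0])

    def score(mask: Sequence[bool]) -> float:
        return fitness(mask, answers, labels)

    # Start from the "ask every question" mask plus random masks.
    pop = [[True] * n]
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        next_pop = pop[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)
```

Seeding the population with the all-questions mask and keeping elites means the returned subset is never worse, on the given data, than asking every question.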
Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors
In recent years, a number of models that learn the relations between vision
and language from large datasets have been released. These models perform a
variety of tasks, such as answering questions about images, retrieving
sentences that best correspond to images, and finding regions in images that
correspond to phrases. Although there are some examples, the connection between
these pre-trained vision-language models and robotics is still weak. If they
are directly connected to robot motions, they lose their versatility due to the
embodiment of the robot and the difficulty of data collection, and become
inapplicable to a wide range of bodies and situations. Therefore, in this
study, we categorize and summarize the methods to utilize the pre-trained
vision-language models flexibly and easily in a way that the robot can
understand, without directly connecting them to robot motions. We discuss how
to use these models for robot motion selection and motion planning without
re-training the models. We consider five types of methods to extract
information understandable for robots, and show the results of state
recognition, object recognition, affordance recognition, relation recognition,
and anomaly detection based on the combination of these five methods. We expect
that this study will add flexibility and ease-of-use, as well as new
applications, to the recognition behavior of existing robots.
Online Estimation of Self-Body Deflection With Various Sensor Data Based on Directional Statistics
In this paper, we propose a method for online estimation of the robot's
posture. Our method uses von Mises and Bingham distributions as probability
distributions of joint angles and 3D orientation, which are used in directional
statistics. We constructed a particle filter using these distributions and
configured a system to estimate the robot's posture from various sensor
information (e.g., joint encoders, IMU sensors, and cameras). Furthermore,
unlike tangent space approximations, these distributions can handle global
features and represent sensor characteristics as observation noises. As an
application, we show that the yaw drift of a 6-axis IMU sensor can be
represented probabilistically to prevent adverse effects on attitude
estimation. For the estimation, we used an approximate model that assumes the
actual robot posture can be reproduced by correcting the joint angles of a
rigid body model. In the experiments, we first tested the estimator's
effectiveness by checking that joint angles generated with the approximate
model can be estimated from the link pose of the same model. We then applied
the estimator to the actual robot and confirmed that the gripper position could
be estimated, thereby verifying the validity of the approximate model in our
situation.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
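To make the role of directional statistics concrete, the sketch below runs a particle filter for a single joint angle, using the von Mises distribution for both the process noise and the sensor likelihood. It is a one-dimensional illustration only: the Bingham distribution for 3D orientation is not covered, and the function names and concentration parameters (`process_kappa`, `sensor_kappa`) are assumptions rather than values from the paper.

```python
import math
import random
from typing import List


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def circular_mean(angles: List[float]) -> float:
    """Mean direction of a set of angles, returned in (-pi, pi]."""
    return math.atan2(
        sum(math.sin(a) for a in angles), sum(math.cos(a) for a in angles)
    )


def pf_step(
    particles: List[float],
    measurement: float,
    rng: random.Random,
    process_kappa: float = 200.0,
    sensor_kappa: float = 50.0,
) -> List[float]:
    """One predict/update/resample cycle for a single joint angle."""
    two_pi = 2.0 * math.pi
    # Predict: perturb each particle with zero-mean von Mises noise.
    predicted = [
        (p + rng.vonmisesvariate(0.0, process_kappa)) % two_pi for p in particles
    ]
    # Update: unnormalized von Mises likelihood of the measurement.
    # Subtracting 1 in the exponent keeps every weight in (0, 1].
    weights = [
        math.exp(sensor_kappa * (math.cos(measurement - p) - 1.0))
        for p in predicted
    ]
    # Resample in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


if __name__ == "__main__":
    rng = random.Random(0)
    true_angle = 3.0  # close to pi, where naive averaging of angles breaks down
    particles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(500)]
    for _ in range(10):
        particles = pf_step(particles, true_angle, rng)
    print(angular_distance(circular_mean(particles), true_angle))
```

Because the likelihood depends only on the cosine of the angle difference, particles near 0 and near 2π are treated as neighbors. A Gaussian on the raw angle values would not do this.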
Development of a Whole-body Work Imitation Learning System by a Biped and Bi-armed Humanoid
Imitation learning has been actively studied in recent years. In particular,
skill acquisition by a robot with a fixed body, whose root link position and
posture and camera angle of view do not change, has been realized in many
cases. On the other hand, imitation of the behavior of robots with floating
links, such as humanoid robots, is still a difficult task. In this study, we
develop an imitation learning system using a biped robot with a floating link.
There are two main problems in developing such a system. The first is a
teleoperation device for humanoids, and the second is a control system that can
withstand heavy workloads and long-term data collection. For the first point,
we use the whole body control device TABLIS. It can control not only the arms
but also the legs and can perform bilateral control with the robot. By
connecting this TABLIS with the high-power humanoid robot JAXON, we construct a
control system for imitation learning. For the second point, we build a
system that can collect long-term data based on posture optimization, and can
simultaneously move the robot's limbs. We combine high-cycle posture generation
with posture optimization methods, including whole-body joint torque
minimization and contact force optimization. We designed an integrated system
with the above two features to achieve various tasks through imitation
learning. Finally, we demonstrate the effectiveness of this system through
experiments: manipulating flexible fabrics while not only the hands but
also the head and waist move simultaneously, manipulating objects with the legs,
as is characteristic of humanoids, and lifting heavy objects that require large
forces.
Comment: accepted at IROS202
Automatic Diary Generation System including Information on Joint Experiences between Humans and Robots
In this study, we propose an automatic diary generation system that uses
information from past joint experiences with the aim of increasing the
favorability toward robots through shared experiences between humans and robots.
For the verbalization of the robot's memory, the system applies a large-scale
language model, a technology from a rapidly developing field. Since this model does not
have memories of experiences, it generates a diary by receiving information
from joint experiences. As an experiment, a robot and a human went for a walk,
and a diary was generated from the interaction and dialogue history. The proposed diary
achieved high scores in comfort and performance in the evaluation of the
robot's impression. In the survey of diaries giving more favorable impressions,
diaries with information on joint experiences were selected more often than diaries
without such information, because diaries with information on joint experiences
showed more cooperation between the robot and the human and more intimacy from
the robot.
Comment: 12 pages, 5 figures, IAS-1