Interactive Text2Pickup Network for Natural Language based Human-Robot Collaboration
In this paper, we propose the Interactive Text2Pickup (IT2P) network for
human-robot collaboration which enables an effective interaction with a human
user despite ambiguity in the user's commands. We focus on the task where a
robot is expected to pick up an object instructed by a human, and to interact
with the human when the given instruction is vague. The proposed network
understands the command from the human user and estimates the position of the
desired object first. To handle the inherent ambiguity in human language
commands, a suitable question which can resolve the ambiguity is generated. The
user's answer to the question is combined with the initial command and given
back to the network, resulting in more accurate estimation. The experiment
results show that given unambiguous commands, the proposed method can estimate
the position of the requested object with an accuracy of 98.49% based on our
test dataset. Given ambiguous language commands, we show that the accuracy of
the pick up task increases by 1.94 times after incorporating the information
obtained from the interaction.
Comment: 8 pages, 9 figures
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion
This work presents Past as a Guide (PaG), a simple approach that improves the
coding capabilities of Large Language Models (LLMs) by integrating past history
with interactive and iterative code refinement. Specifically, inspired by human
cognitive processes, the proposed method enables LLMs to use previous
programming and debugging experiences to improve Python code completion. The
framework lets LLMs iteratively refine Python code based on previous execution
and debugging results and optimize learning and reasoning capabilities. The
proposed methodology achieved a 92% pass@1 on HumanEval, demonstrating the
potential to advance the field by leveraging retrospection from past
experiences and interactive and iterative refinement processes without
external correctness indicators.
Comment: NeurIPS 2023 Workshop
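For readers unfamiliar with the pass@1 figure quoted above, the following is a minimal sketch of the unbiased pass@k estimator commonly used with HumanEval: for a problem with n generated samples of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The abstract does not state how many samples per problem were drawn, so the function names and the (n, c) inputs here are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all tests
    k: sampling budget being evaluated (1 for pass@1)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no passing sample.
    prob_all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - prob_all_fail


def benchmark_pass_at_k(results, k: int) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single sample per problem (n = 1), pass@1 reduces to the fraction of problems whose one generated solution passes.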
BLADE: Filter Learning for General Purpose Computational Photography
The Rapid and Accurate Image Super Resolution (RAISR) method of Romano,
Isidoro, and Milanfar is a computationally efficient image upscaling method
using a trained set of filters. We describe a generalization of RAISR, which we
name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable
edge-adaptive filtering framework that is general, simple, computationally
efficient, and useful for a wide range of problems in computational
photography. We show applications to operations which may appear in a camera
pipeline including denoising, demosaicing, and stylization.
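To make the idea of edge-adaptive filtering concrete, here is a minimal sketch under strong simplifying assumptions: each interior pixel picks one 3x3 filter from a small bank according to its quantized local gradient orientation. The function names, the central-difference gradient, and the orientation-only selection rule are illustrative choices and are not taken from the BLADE paper; in a trained system the filter bank would be learned from data rather than supplied by hand.

```python
import math


def select_bucket(gx: float, gy: float, n_buckets: int) -> int:
    """Quantize the gradient orientation (taken modulo pi) into a bucket index."""
    angle = math.atan2(gy, gx) % math.pi  # orientation, nominally in [0, pi)
    # Clamp in case floating-point rounding yields exactly pi.
    return min(int(angle / math.pi * n_buckets), n_buckets - 1)


def adaptive_filter(image, filters):
    """Filter each interior pixel with the 3x3 kernel chosen for its orientation.

    image:   list of rows of numbers (grayscale)
    filters: list of 3x3 kernels, one per orientation bucket (hypothetical bank)
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central-difference estimate of the local gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            kernel = filters[select_bucket(gx, gy, len(filters))]
            out[y][x] = sum(
                kernel[dy + 1][dx + 1] * image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
    return out
```

Because the per-pixel work is a bucket lookup followed by one small linear filter, the cost stays close to that of a single fixed convolution.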
SOCRATES: Text-based Human Search and Approach using a Robot Dog
In this paper, we propose a SOCratic model for Robots Approaching humans
based on TExt System (SOCRATES) focusing on the human search and approach based
on free-form textual description; the robot first searches for the target user,
then the robot proceeds to approach in a human-friendly manner. In particular,
textual descriptions are composed of appearance (e.g., wearing white shirts
with black hair) and location clues (e.g., is a student who works with robots).
We initially present a Human Search Socratic Model that connects large
pre-trained models in the language domain to solve the downstream task, which
is searching for the target person based on textual descriptions. Then, we
propose a hybrid learning-based framework for generating target-cordial robotic
motion to approach a person, consisting of a learning-from-demonstration module
and a knowledge distillation module. We validate the proposed searching module
via simulation using a virtual mobile robot as well as through real-world
experiments involving participants and the Boston Dynamics Spot robot.
Furthermore, we analyze the properties of the proposed approaching framework
with human participants based on the Robotic Social Attributes Scale (RoSAS).
Comment: Project page: https://socratesrobotdog.github.io
Self-Supervised Motion Retargeting with Safety Guarantee
In this paper, we present self-supervised shared latent embedding (S3LE), a
data-driven motion retargeting method that enables the generation of natural
motions in humanoid robots from motion capture data or RGB videos. While it
requires paired data consisting of human poses and their corresponding robot
configurations, it significantly reduces the need for time-consuming data
collection through novel paired-data generation processes. Our self-supervised
learning procedure consists of two steps: automatically generating paired data
to bootstrap the motion retargeting, and learning a projection-invariant
mapping to handle the different expressivity of humans and humanoid robots.
Furthermore, our method guarantees that the generated robot pose is
collision-free and satisfies position limits by utilizing nonparametric
regression in the shared latent space. We demonstrate that our method can
generate expressive robotic motions from both the CMU motion capture database
and YouTube videos.
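One way a nonparametric lookup can yield such a guarantee is sketched below, as an assumption rather than a description of the authors' implementation: if every robot pose stored in the lookup bank has already been verified as collision-free and within position limits, then returning the stored pose whose latent code is nearest to the query can only ever produce a feasible pose. The function name `retarget` and the bank layout are hypothetical.

```python
import math


def retarget(query_latent, feasible_bank):
    """Return the pre-verified robot pose whose latent code is nearest to the query.

    query_latent:  sequence of floats (latent encoding of a human pose)
    feasible_bank: list of (latent_code, robot_pose) pairs; every robot_pose is
                   assumed to have been checked for collisions and joint limits
                   before being stored (hypothetical preprocessing step)
    """
    if not feasible_bank:
        raise ValueError("feasible_bank must not be empty")
    _, pose = min(
        feasible_bank,
        key=lambda entry: math.dist(query_latent, entry[0]),
    )
    return pose
```

The trade-off in this sketch is that the output is limited to poses present in the bank, so coverage depends on how densely the feasible configurations were sampled.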