Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning that accounts for such predictions are key
tasks for self-driving vehicles, service robots, and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.
Comment: Submitted to the International Journal of Robotics Research (IJRR), 37 pages
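Among the performance metrics surveyed in this literature, the two most common for trajectory prediction are Average Displacement Error (ADE) and Final Displacement Error (FDE). A minimal sketch of both, assuming trajectories are given as per-step (x, y) positions; the array shapes and example values are illustrative, not taken from the survey:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one predicted trajectory.

    pred, gt: arrays of shape (T, 2) with predicted and ground-truth
    (x, y) positions over T future time steps.
    """
    # Euclidean distance between predicted and true position at each step
    dists = np.linalg.norm(pred - gt, axis=1)
    return dists.mean(), dists[-1]  # ADE over the horizon, FDE at the end

# Toy example: the prediction drifts sideways off a straight-line walk
gt = np.stack([np.arange(5.0), np.zeros(5)], axis=1)
pred = gt + np.array([0.0, 0.1]) * np.arange(5)[:, None]
print(ade_fde(pred, gt))  # (0.2, 0.4)
```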
Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
In recent years, much progress has been made in learning robotic manipulation
policies that follow natural language instructions. Such methods typically
learn from corpora of robot-language data that was either collected with
specific tasks in mind or expensively re-labelled by humans with rich language
descriptions in hindsight. Recently, large-scale pretrained vision-language
models (VLMs) like CLIP or ViLD have been applied to robotics for learning
representations and scene descriptors. Can these pretrained models serve as
automatic labelers for robot data, effectively importing Internet-scale
knowledge into existing datasets to make them useful even for tasks that are
not reflected in their ground-truth annotations? To answer this question, we
introduce Data-driven Instruction Augmentation for Language-conditioned control
(DIAL): we utilize semi-supervised language labels leveraging the semantic
understanding of CLIP to propagate knowledge onto large datasets of unlabelled
demonstration data and then train language-conditioned policies on the
augmented datasets. This method enables cheaper acquisition of useful language
descriptions compared to expensive human labels, allowing for more efficient
label coverage of large-scale datasets. We apply DIAL to a challenging
real-world robotic manipulation domain where 96.5% of the 80,000 demonstrations
do not contain crowd-sourced language annotations. DIAL enables imitation
learning policies to acquire new capabilities and generalize to 60 novel
instructions unseen in the original dataset.
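The core relabeling idea can be illustrated with an off-the-shelf CLIP model: score a set of candidate instructions against a frame from an unlabeled demonstration and adopt the best match as a pseudo-label. This is a sketch of the underlying mechanism only, not DIAL's actual pipeline; the candidate instructions and image path below are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP from Hugging Face; DIAL's real pipeline differs,
# so treat this purely as an illustration of CLIP-based pseudo-labeling.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical candidate instructions and a final frame of one episode
candidates = [
    "pick up the green can",
    "open the top drawer",
    "push the red block to the left",
]
frame = Image.open("demo_final_frame.png")  # placeholder path

inputs = processor(text=candidates, images=frame,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape (1, len(candidates))
probs = logits.softmax(dim=-1)[0]

# Keep the best-scoring instruction as a pseudo-label for this episode
label = candidates[int(probs.argmax())]
print(label, float(probs.max()))
```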
Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
We propose a technique for multi-task learning from demonstration that trains
the controller of a low-cost robotic arm to accomplish several complex picking
and placing tasks, as well as non-prehensile manipulation. The controller is a
recurrent neural network using raw images as input and generating robot arm
trajectories, with the parameters shared across the tasks. The controller also
combines VAE-GAN-based reconstruction with autoregressive multimodal action
prediction. Our results demonstrate that it is possible to learn complex
manipulation tasks, such as picking up a towel, wiping an object, and
depositing the towel to its previous position, entirely from raw images with
direct behavior cloning. We show that weight sharing and reconstruction-based
regularization substantially improve generalization and robustness, and
training on multiple tasks simultaneously increases the success rate on all
tasks.
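A rough skeleton of such an image-conditioned recurrent controller, assuming a PyTorch implementation: the paper's model additionally includes the VAE-GAN reconstruction branch and an autoregressive multimodal action head, both omitted here, and all layer sizes are illustrative rather than the authors' values.

```python
import torch
import torch.nn as nn

class RecurrentController(nn.Module):
    """Skeleton of an image-to-trajectory recurrent policy.

    The full model in the paper adds VAE-GAN reconstruction and a
    multimodal action head; this sketch keeps only the shared backbone.
    """
    def __init__(self, action_dim=7, hidden=256):
        super().__init__()
        # Small convolutional encoder for raw RGB frames
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Recurrent core whose weights are shared across all tasks
        self.rnn = nn.LSTM(64, hidden, batch_first=True)
        # Deterministic head standing in for the multimodal action model
        self.head = nn.Linear(hidden, action_dim)

    def forward(self, frames):
        # frames: (batch, time, 3, H, W) -> actions: (batch, time, action_dim)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)

policy = RecurrentController()
actions = policy(torch.randn(2, 10, 3, 64, 64))  # dummy 10-step rollout
print(actions.shape)  # torch.Size([2, 10, 7])
```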
Example Based Caricature Synthesis
The likeness of a caricature to the original face image is an essential and often overlooked part of caricature
production. In this paper, we present an example-based caricature synthesis technique, consisting of shape
exaggeration, relationship exaggeration, and optimization for likeness. Rather than relying on a large training set
of caricature face pairs, our shape exaggeration step is based on only one or a small number of examples of facial
features. The relationship exaggeration step introduces two definitions which facilitate global facial feature
synthesis. The first is the T-Shape rule, which describes the relative relationship between the facial elements in an
intuitive manner. The second is the so-called proportions, which characterize the facial features in proportional
form. Finally, we introduce a likeness metric based on the Modified Hausdorff Distance
(MHD), which allows us to optimize the configuration of facial elements, maximizing likeness while satisfying a
number of constraints. The effectiveness of our algorithm is demonstrated with experimental results.
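The Modified Hausdorff Distance that this likeness metric builds on has a standard definition (Dubuisson and Jain, 1994): the directed distance from A to B averages, over the points of A, the distance to the nearest point of B, and the MHD is the larger of the two directed distances. A minimal sketch over 2-D landmark sets; the landmark coordinates are made up for illustration:

```python
import numpy as np

def modified_hausdorff(A, B):
    """Modified Hausdorff Distance (Dubuisson & Jain, 1994) between two
    point sets A and B, each an (n, 2) array of 2-D landmark coordinates."""
    # Pairwise Euclidean distances: D[i, j] = ||A[i] - B[j]||
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    d_ab = D.min(axis=1).mean()  # mean distance from each point of A to B
    d_ba = D.min(axis=0).mean()  # mean distance from each point of B to A
    return max(d_ab, d_ba)

# Toy example: two facial-landmark triangles offset by 0.1 along x
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
B = A + np.array([0.1, 0.0])
print(modified_hausdorff(A, B))  # 0.1
```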
The Cowl - v.67 - n.15 - Feb 6, 2003
The Cowl - student newspaper of Providence College. Vol 67 - No. 15 - February 6, 2003. 28 pages
- …