Learning Human-Robot Collaboration Insights through the Integration of Muscle Activity in Interaction Motion Models
Recent progress in human-robot collaboration makes fast and fluid
interactions possible, even when human observations are partial and occluded.
Methods like Interaction Probabilistic Movement Primitives (ProMP) model human
trajectories through motion capture systems. However, such a representation does
not properly model tasks where similar motions handle different objects. Under
current approaches, a robot would not adapt its pose and dynamics for proper
handling. We integrate the use of Electromyography (EMG) into the Interaction
ProMP framework and utilize muscular signals to augment the human observation
representation. The contribution of our paper is increased task discernment
when trajectories are similar but tools are different and require the robot to
adjust its pose for proper handling. Interaction ProMPs are used with an
augmented vector that integrates muscle activity. Augmented time-normalized
trajectories are used in training to learn correlation parameters and robot
motions are predicted by finding the best weight combination and temporal
scaling for a task. Collaborative single-task scenarios with similar motions
but different objects were used and compared. In one experiment only joint
angles were recorded; in the other, EMG signals were additionally integrated.
Task recognition was computed for both tasks. Observation state vectors with
augmented EMG signals were able to completely identify differences across
tasks, while the baseline method failed every time. Integrating EMG signals
into collaborative tasks significantly increases the ability of the system to
recognize nuances in the tasks that are otherwise imperceptible, up to 74.6% in
our studies. Furthermore, the integration of EMG signals for collaboration also
opens the door to a wide class of human-robot physical interactions based on
haptic communication that has been largely unexploited in the field.
Comment: 7 pages, 2 figures, 2 tables. As submitted to Humanoids 201
A Survey on Bayesian Deep Learning
A comprehensive artificial intelligence system needs to not only perceive the
environment with different 'senses' (e.g., seeing and hearing) but also infer
the world's conditional (or even causal) relations and corresponding
uncertainty. The past decade has seen major advances in many perception tasks
such as visual object recognition and speech recognition using deep learning
models. For higher-level inference, however, probabilistic graphical models
with their Bayesian nature are still more powerful and flexible. In recent
years, Bayesian deep learning has emerged as a unified probabilistic framework
to tightly integrate deep learning and Bayesian models. In this general
framework, the perception of text or images using deep learning can boost the
performance of higher-level inference and in turn, the feedback from the
inference process is able to enhance the perception of text or images. This
survey provides a comprehensive introduction to Bayesian deep learning and
reviews its recent applications to recommender systems, topic models, control,
etc. We also discuss the relationship and differences between Bayesian
deep learning and other related topics such as Bayesian treatment of neural
networks.
Comment: To appear in ACM Computing Surveys (CSUR) 202
Interactive Robot Learning of Gestures, Language and Affordances
A growing field in robotics and Artificial Intelligence (AI) research is
human-robot collaboration, whose target is to enable effective teamwork between
humans and robots. However, in many situations human teams are still superior
to human-robot teams, primarily because human teams can easily agree on a
common goal with language, and the individual members observe each other
effectively, leveraging their shared motor repertoire and sensorimotor
resources. This paper shows that for cognitive robots it is possible, and
indeed fruitful, to combine knowledge acquired from interacting with elements
of the environment (affordance exploration) with the probabilistic observation
of another agent's actions.
We propose a model that unites (i) learning robot affordances and word
descriptions with (ii) statistical recognition of human gestures with vision
sensors. We discuss theoretical motivations, possible implementations, and we
show initial results which highlight that, after having acquired knowledge of
its surrounding environment, a humanoid robot can generalize this knowledge to
the case when it observes another agent (human partner) performing the same
motor actions previously executed during training.
Comment: code available at https://github.com/gsaponaro/glu-gesture
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences
We propose a neural sequence-to-sequence model for direction following, a
task that is essential to realizing effective autonomous agents. Our
alignment-based encoder-decoder model with long short-term memory recurrent
neural networks (LSTM-RNN) translates natural language instructions to action
sequences based upon a representation of the observable world state. We
introduce a multi-level aligner that empowers our model to focus on sentence
"regions" salient to the current world state by using multiple abstractions of
the input sentence. In contrast to existing methods, our model uses no
specialized linguistic resources (e.g., parsers) or task-specific annotations
(e.g., seed lexicons). It is therefore generalizable, yet still achieves the
best results reported to date on a benchmark single-sentence dataset and
competitive results for the limited-training multi-sentence setting. We analyze
our model through a series of ablations that elucidate the contributions of the
primary components of our model.
Comment: To appear at AAAI 2016 (and an extended version of a NIPS 2015
Multimodal Machine Learning workshop paper)