Recurrent Neural Networks with Weighting Loss for Early Prediction of Hand Movements
We propose in this work an approach for early prediction of hand movements using recurrent neural networks (RNNs) and a novel weighting loss. The proposed loss function leverages the outputs of an RNN at different time steps and weights their contributions to the final loss linearly over time steps. Additionally, a sample weighting scheme constitutes a part of the weighting loss to deal with the scarcity of samples in which a change of hand movement takes place. Experiments conducted with the Ninapro database reveal that our proposed approach not only improves performance in the early prediction task but also obtains state-of-the-art results in the classification of hand movements. These results are especially promising for amputees.
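A minimal sketch of such a linearly increasing time-step weighting (the function name, the exact weighting schedule, and the scalar sample weight are assumptions for illustration, not details taken from the paper):

```python
def weighted_sequence_loss(step_losses, sample_weight=1.0):
    """Combine per-time-step losses with linearly increasing weights,
    so later (better-informed) predictions contribute more; a rare
    movement-transition sample can be up-weighted via sample_weight."""
    T = len(step_losses)
    weights = [(t + 1) / T for t in range(T)]   # 1/T, 2/T, ..., 1
    total = sum(w * l for w, l in zip(weights, step_losses))
    return sample_weight * total / sum(weights)
```

Normalizing by the weight sum keeps the loss scale comparable across sequence lengths.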
Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources
Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or the events that have occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources, including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
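As a rough illustration of frame-wise multi-source state estimation (a hypothetical sketch, not Fusion-KVE itself; the function name and the simple concatenate-then-score design are assumptions):

```python
import numpy as np

def estimate_state(kin, vis, evt, W):
    """Fuse per-frame kinematics, vision, and event feature vectors by
    concatenation, then score each FSM state with a linear classifier."""
    x = np.concatenate([kin, vis, evt])  # unified feature vector
    scores = W @ x                       # one score per candidate state
    return int(np.argmax(scores))        # index of the estimated state
```

The actual model uses learned sequence encoders per source rather than a linear map, but the fusion-before-classification idea is the same.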
A Novel Predictive-Coding-Inspired Variational RNN Model for Online Prediction and Recognition
This study introduces PV-RNN, a novel variational RNN inspired by
predictive-coding ideas. The model learns to extract the probabilistic
structures hidden in fluctuating temporal patterns by dynamically changing the
stochasticity of its latent states. Its architecture attempts to address two
major concerns of variational Bayes RNNs: how latent variables can learn
meaningful representations, and how the inference model can transfer future
observations to the latent variables. PV-RNN does both by introducing adaptive
vectors mirroring the training data, whose values can then be adapted
differently during evaluation. Moreover, prediction errors during
backpropagation, rather than external inputs during the forward computation,
are used to convey information to the network about the external data. For
testing, we introduce error regression, a predictive-coding-inspired scheme for
predicting unseen sequences that leverages those mechanisms. The model
introduces a weighting parameter, the meta-prior, to balance the optimization
pressure placed on two terms of a lower bound on the marginal likelihood of the
sequential data. We test the model on two datasets with probabilistic
structures and show that with high values of the meta-prior the network
develops deterministic chaos through which the data's randomness is imitated.
For low values, the model behaves as a random process. The network performs
best on intermediate values, and is able to capture the latent probabilistic
structure with good generalization. Analyzing the meta-prior's impact on the
network allows us to precisely study the theoretical value and practical
benefits of incorporating stochastic dynamics in our model. We demonstrate better
prediction performance on a robot imitation task with our model using error
regression compared to a standard variational Bayes model lacking such a
procedure.
Comment: The paper is accepted in Neural Computation
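The role of the meta-prior can be sketched as a weighting of the two terms of the lower bound on the marginal likelihood (a schematic form under our reading of the abstract; the notation and the exact placement of the weight in the paper may differ):

```latex
\mathcal{L}(\theta, \phi) \;=\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{\text{reconstruction}}
\;-\; w \,
\underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z)\right)}_{\text{regularization}}
```

Under this reading, a large meta-prior $w$ pushes the latent states toward the prior, so the network compensates with deterministic chaos; a small $w$ lets the latents track the data's noise directly; intermediate values balance the two pressures.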
Modeling Taxi Drivers' Behaviour for the Next Destination Prediction
In this paper, we study how to model taxi drivers' behaviour and geographical
information for an interesting and challenging task: the next destination
prediction in a taxi journey. Predicting the next location is a well-studied
problem in human mobility, which finds several applications in real-world
scenarios, from optimizing the efficiency of electronic dispatching systems to
predicting and reducing traffic jams. This task is normally modeled as a
multiclass classification problem, where the goal is to select, among a set of
already known locations, the next taxi destination. We present a Recurrent
Neural Network (RNN) approach that models the taxi drivers' behaviour and
encodes the semantics of visited locations by using geographical information
from Location-Based Social Networks (LBSNs). In particular, RNNs are trained to
predict the exact coordinates of the next destination, overcoming the limitation
of producing as output only a fixed set of locations seen during the training
phase. The proposed approach was tested on the ECML/PKDD Discovery Challenge
2015 dataset, based on the city of Porto, obtaining better results than the
competition winner while using less information, and on Manhattan and San
Francisco datasets.
Comment: preprint version of a paper submitted to IEEE Transactions on
Intelligent Transportation Systems
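The key design choice, regressing coordinates instead of classifying over known destinations, can be sketched as follows (a hypothetical illustration; the function name and the plain squared-error loss are assumptions):

```python
import numpy as np

def destination_loss(pred_coord, true_coord):
    """Squared error on a predicted (lat, lon) pair: any point on the
    map is a valid output, so destinations unseen at training time stay
    representable, unlike a softmax over a closed set of locations."""
    diff = np.asarray(pred_coord) - np.asarray(true_coord)
    return float(np.sum(diff ** 2))
```

In practice a distance measure appropriate for geographic coordinates (e.g. haversine) would be preferable to raw squared error, but the open-output property is the same.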
Estimation and Early Prediction of Grip Force Based on sEMG Signals and Deep Recurrent Neural Networks
Hands are used for communicating with the surrounding environment and have a
complex structure that enables them to perform various tasks with their
multiple degrees of freedom. Hand amputation can prevent a person from
performing their daily activities. In such cases, finding a suitable, fast, and
reliable alternative for the missing limb can greatly improve the lives of
people affected by such conditions. As the most important use of the hands is to grasp
objects, the purpose of this study is to accurately predict gripping force from
surface electromyography (sEMG) signals during a pinch-type grip. In that
regard, gripping force and sEMG signals are derived from 10 healthy subjects.
Results show that for this task, recurrent networks outperform nonrecurrent
ones, such as a fully connected multilayer perceptron (MLP) network. Gated
recurrent unit (GRU) and long short-term memory (LSTM) networks can predict the
gripping force with R-squared values of 0.994 and 0.992, respectively, and a
prediction rate of over 1300 predictions per second. The predominant advantage
of such frameworks is that the gripping force can be predicted directly
from preprocessed sEMG signals without any form of feature extraction; moreover,
they can adequately predict future force values over larger prediction
horizons. The methods presented in this study can be used in the
myoelectric control of prosthetic hands or robotic grippers.
Comment: 9 pages, accepted for publication in the Journal of the Brazilian
Society of Mechanical Sciences and Engineering
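A minimal GRU update step, of the kind at the core of such recurrent force estimators, might look as follows (a from-scratch sketch for illustration only; in practice one would use a deep-learning framework's GRU layer, and all names here are assumptions; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: gates decide how much of the previous hidden
    state to keep versus overwrite with a candidate state."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_cand
```

Feeding preprocessed sEMG samples through such a recurrence step by step, with a regression head on the hidden state, yields a force estimate per sample without hand-crafted features.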
RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction
RoboJam is a machine-learning system for generating music that assists users
of a touchscreen music app by performing responses to their short
improvisations. This system uses a recurrent artificial neural network to
generate sequences of touchscreen interactions and absolute timings, rather
than high-level musical notes. To accomplish this, RoboJam's network uses a
mixture density layer to predict appropriate touch interaction locations in
space and time. In this paper, we describe the design and implementation of
RoboJam's network and how it has been integrated into a touchscreen music app.
A preliminary evaluation analyses the system in terms of training, musical
generation, and user interaction.
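Sampling one touch event from a mixture density output can be sketched like this (a hypothetical fragment; parameter names and shapes are assumptions, and the real network predicts timing alongside location in the same manner):

```python
import numpy as np

def sample_touch(pis, mus, sigmas, rng):
    """Draw an (x, y) touch location from a Gaussian mixture:
    pick a component by its mixing weight, then sample from it
    (diagonal covariance assumed)."""
    k = rng.choice(len(pis), p=pis)       # choose mixture component
    return rng.normal(mus[k], sigmas[k])  # sample location from it
```

Repeatedly sampling from the per-step mixture parameters and feeding each sample back into the network generates a response sequence of touches.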
Occlusion resistant learning of intuitive physics from videos
To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention, and several methods have been proposed
to learn these physical rules from video sequences. Yet, most of these methods are
restricted to the case where no, or only limited, occlusions occur. In this
work we propose a probabilistic formulation of learning intuitive physics in 3D
scenes with significant inter-object occlusions. In our formulation, object
positions are modeled as latent variables enabling the reconstruction of the
scene. We then propose a series of approximations that make this problem
tractable. Object proposals are linked across frames using a combination of a
recurrent interaction network, modeling the physics in object space, and a
compositional renderer, modeling the way in which objects project onto pixel
space. We demonstrate significant improvements over the state of the art on the
IntPhys intuitive physics benchmark. We apply our method to a second dataset
with increasing levels of occlusions, showing it realistically predicts
segmentation masks up to 30 frames into the future. Finally, we also show results
on predicting the motion of objects in real videos.