
    Recurrent Neural Networks with Weighting Loss for Early Prediction of Hand Movements

    We propose an approach for early prediction of hand movements using recurrent neural networks (RNNs) and a novel weighting loss. The proposed loss function leverages the outputs of an RNN at different time steps and weights their contributions to the final loss linearly over time. In addition, the weighting loss includes a sample weighting scheme that addresses the scarcity of samples in which a change of hand movement takes place. Experiments conducted on the Ninapro database show that the proposed approach not only improves performance on the early prediction task but also achieves state-of-the-art results in the classification of hand movements. These results are especially promising for amputees.
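    The following is a minimal sketch of such a loss, assuming a PyTorch-style setup; the linear time weighting and the transition up-weighting factor are illustrative choices, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def weighted_sequence_loss(logits, targets, transition_mask, transition_weight=5.0):
    """Cross-entropy over all time steps, weighted linearly in time and
    up-weighted at the rare movement-transition samples (illustrative sketch).

    logits:          (batch, time, num_classes) per-step RNN outputs
    targets:         (batch, time) integer class labels
    transition_mask: (batch, time) 1.0 where the hand movement changes, else 0.0
    """
    batch, time, num_classes = logits.shape

    # Per-step cross-entropy, kept separate for every sample and time step.
    ce = F.cross_entropy(
        logits.reshape(-1, num_classes), targets.reshape(-1), reduction="none"
    ).reshape(batch, time)

    # Linear time weights: later (better-informed) predictions count more.
    time_w = torch.linspace(1.0 / time, 1.0, time, device=logits.device)

    # Sample weights: emphasize samples where a movement change takes place.
    sample_w = 1.0 + (transition_weight - 1.0) * transition_mask

    weights = time_w.unsqueeze(0) * sample_w
    return (ce * weights).sum() / weights.sum()
```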

    Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

    Many tasks in robot-assisted surgery (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events that occur as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources: Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
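    As a rough illustration of the multi-source idea (layer sizes, encoders, and fusion strategy are assumptions for this sketch, not the published Fusion-KVE architecture), per-source encoders can be concatenated and fed to a recurrent frame-wise state decoder:

```python
import torch
import torch.nn as nn

class MultiSourceStateEstimator(nn.Module):
    """Encode each data source separately, concatenate the features, and
    decode a frame-wise surgical state with a recurrent layer (sketch)."""

    def __init__(self, kin_dim, vis_dim, evt_dim, hidden=128, num_states=8):
        super().__init__()
        self.kin_enc = nn.Sequential(nn.Linear(kin_dim, hidden), nn.ReLU())
        self.vis_enc = nn.Sequential(nn.Linear(vis_dim, hidden), nn.ReLU())
        self.evt_enc = nn.Sequential(nn.Linear(evt_dim, hidden), nn.ReLU())
        self.temporal = nn.LSTM(3 * hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_states)

    def forward(self, kin, vis, evt):
        # Each input: (batch, time, feature_dim), aligned to the same frames.
        fused = torch.cat(
            [self.kin_enc(kin), self.vis_enc(vis), self.evt_enc(evt)], dim=-1
        )
        h, _ = self.temporal(fused)
        return self.head(h)  # (batch, time, num_states) frame-wise state logits
```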

    A Novel Predictive-Coding-Inspired Variational RNN Model for Online Prediction and Recognition

    This study introduces PV-RNN, a novel variational RNN inspired by predictive-coding ideas. The model learns to extract the probabilistic structures hidden in fluctuating temporal patterns by dynamically changing the stochasticity of its latent states. Its architecture addresses two major concerns of variational Bayes RNNs: how latent variables can learn meaningful representations, and how the inference model can transfer future observations to the latent variables. PV-RNN does both by introducing adaptive vectors mirroring the training data, whose values can then be adapted differently during evaluation. Moreover, prediction errors during backpropagation, rather than external inputs during the forward computation, are used to convey information about the external data to the network. For testing, we introduce error regression for predicting unseen sequences, inspired by predictive coding, which leverages these mechanisms. The model introduces a weighting parameter, the meta-prior, to balance the optimization pressure placed on the two terms of a lower bound on the marginal likelihood of the sequential data. We test the model on two datasets with probabilistic structure and show that, with high values of the meta-prior, the network develops deterministic chaos through which the data's randomness is imitated, whereas for low values the model behaves as a random process. The network performs best at intermediate values, where it captures the latent probabilistic structure with good generalization. Analyzing the meta-prior's impact on the network allows us to precisely study the theoretical value and practical benefits of incorporating stochastic dynamics in our model. We demonstrate better prediction performance on a robot imitation task with our model using error regression compared to a standard variational Bayes model lacking such a procedure. Comment: The paper has been accepted in Neural Computation.
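    In rough terms, and assuming the meta-prior scales one of the two terms of the sequence lower bound (a common form for such weighted objectives, sketched here rather than taken from the paper), the training loss can be written as:

```python
import torch

def meta_prior_weighted_loss(recon_nll, kl_per_step, meta_prior):
    """Weighted negative lower bound for a variational RNN (sketch).

    recon_nll:   (batch, time) negative log-likelihood of the reconstruction
    kl_per_step: (batch, time) KL divergence between posterior and prior latents
    meta_prior:  scalar balancing the two terms; per the abstract, high values
                 lead to deterministic chaos imitating the data's randomness,
                 low values to random-process-like behaviour, and intermediate
                 values capture the latent probabilistic structure best.
    """
    per_sequence = (recon_nll + meta_prior * kl_per_step).sum(dim=1)
    return per_sequence.mean()
```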

    Modeling Taxi Drivers' Behaviour for the Next Destination Prediction

    In this paper, we study how to model taxi drivers' behaviour and geographical information for an interesting and challenging task: predicting the next destination in a taxi journey. Predicting the next location is a well-studied problem in human mobility that finds several applications in real-world scenarios, from optimizing the efficiency of electronic dispatching systems to predicting and reducing traffic jams. This task is normally modeled as a multiclass classification problem, where the goal is to select the next taxi destination among a set of already known locations. We present a Recurrent Neural Network (RNN) approach that models taxi drivers' behaviour and encodes the semantics of visited locations using geographical information from Location-Based Social Networks (LBSNs). In particular, the RNNs are trained to predict the exact coordinates of the next destination, overcoming the limitation of outputting only a limited set of locations seen during the training phase. The proposed approach was tested on the ECML/PKDD Discovery Challenge 2015 dataset (based on the city of Porto), obtaining better results than the competition winner while using less information, and on Manhattan and San Francisco datasets. Comment: Preprint version of a paper submitted to IEEE Transactions on Intelligent Transportation Systems.
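    A minimal sketch of coordinate regression with an RNN (hypothetical layer sizes and input features; not the authors' full architecture, which also exploits LBSN semantics):

```python
import torch
import torch.nn as nn

class NextDestinationRNN(nn.Module):
    """Regress the (latitude, longitude) of the next destination from the
    sequence of locations visited so far (sketch)."""

    def __init__(self, loc_feat_dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(loc_feat_dim, hidden, batch_first=True)
        self.coord_head = nn.Linear(hidden, 2)  # exact coordinates, not a class

    def forward(self, trajectory):
        # trajectory: (batch, steps, loc_feat_dim), e.g. GPS points plus
        # features describing nearby venues.
        _, h = self.rnn(trajectory)
        return self.coord_head(h[-1])  # (batch, 2) predicted destination

# Training then reduces to plain regression over coordinates, e.g.:
# loss = nn.functional.mse_loss(model(batch_traj), batch_dest_coords)
```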

    Estimation and Early Prediction of Grip Force Based on sEMG Signals and Deep Recurrent Neural Networks

    Hands are used for communicating with the surrounding environment and have a complex structure that enables them to perform various tasks with their multiple degrees of freedom. Hand amputation can prevent a person from performing their daily activities. In that event, finding a suitable, fast, and reliable alternative for the missing limb can affect the lives of people who suffer from such conditions. As the most important use of the hands is to grasp objects, the purpose of this study is to accurately predict gripping force from surface electromyography (sEMG) signals during a pinch-type grip. To that end, gripping force and sEMG signals are recorded from 10 healthy subjects. Results show that for this task, recurrent networks outperform non-recurrent ones, such as a fully connected multilayer perceptron (MLP) network. Gated recurrent unit (GRU) and long short-term memory (LSTM) networks can predict the gripping force with R-squared values of 0.994 and 0.992, respectively, at a rate of over 1300 predictions per second. The main advantage of such frameworks is that the gripping force can be predicted directly from preprocessed sEMG signals without any form of feature extraction, and that future force values can be adequately predicted over larger prediction horizons. The methods presented in this study can be used in the myoelectric control of prosthetic hands or robotic grippers. Comment: 9 pages, accepted for publication in the Journal of the Brazilian Society of Mechanical Sciences and Engineering.
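    A minimal sketch of such a recurrent regressor operating directly on preprocessed sEMG windows (hyperparameters and the multi-step output head are assumptions made for illustration):

```python
import torch
import torch.nn as nn

class GripForceGRU(nn.Module):
    """Map a window of preprocessed sEMG channels directly to grip force,
    with no hand-crafted feature extraction (sketch)."""

    def __init__(self, num_channels, hidden=64, horizon=1):
        super().__init__()
        self.gru = nn.GRU(num_channels, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, horizon)  # horizon > 1 predicts future force values

    def forward(self, semg):
        # semg: (batch, time, num_channels) preprocessed signal window
        h, _ = self.gru(semg)
        return self.out(h[:, -1])  # force estimate(s) from the last hidden state
```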

    RoboJam: A Musical Mixture Density Network for Collaborative Touchscreen Interaction

    RoboJam is a machine-learning system for generating music that assists users of a touchscreen music app by performing responses to their short improvisations. This system uses a recurrent artificial neural network to generate sequences of touchscreen interactions and absolute timings, rather than high-level musical notes. To accomplish this, RoboJam's network uses a mixture density layer to predict appropriate touch interaction locations in space and time. In this paper, we describe the design and implementation of RoboJam's network and how it has been integrated into a touchscreen music app. A preliminary evaluation analyses the system in terms of training, musical generation, and user interaction.
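    A minimal sketch of a mixture density output head over a continuous touch event (dimensions, mixture count, and parameterization are assumptions, not RoboJam's exact layer):

```python
import torch
import torch.nn as nn
import torch.distributions as D

class MixtureDensityHead(nn.Module):
    """Turn an RNN hidden state into a mixture of Gaussians over a
    continuous event, e.g. touch position (x, y) and time offset dt."""

    def __init__(self, hidden, out_dim=3, num_mixtures=5):
        super().__init__()
        self.num_mixtures, self.out_dim = num_mixtures, out_dim
        self.pi = nn.Linear(hidden, num_mixtures)                    # mixture weights
        self.mu = nn.Linear(hidden, num_mixtures * out_dim)          # component means
        self.log_sigma = nn.Linear(hidden, num_mixtures * out_dim)   # component scales

    def forward(self, h):
        b = h.shape[0]
        mix = D.Categorical(logits=self.pi(h))
        comp = D.Independent(
            D.Normal(
                self.mu(h).view(b, self.num_mixtures, self.out_dim),
                self.log_sigma(h).view(b, self.num_mixtures, self.out_dim).exp(),
            ),
            1,
        )
        return D.MixtureSameFamily(mix, comp)

# Training minimizes the negative log-likelihood of observed touch events;
# sampling from the returned distribution generates the next touch:
# nll = -dist.log_prob(target).mean()
```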

    Occlusion resistant learning of intuitive physics from videos

    To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and to predict the future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to cases where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables, enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, which models the physics in object space, and a compositional renderer, which models the way objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive physics benchmark. We apply our method to a second dataset with increasing levels of occlusion, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.
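    A toy sketch of the overall idea, rolling latent object states forward with a recurrent module and rendering them so that the loss is computed in mask space even under occlusion (module shapes and the dense renderer are placeholders standing in for the paper's recurrent interaction network and compositional renderer):

```python
import torch
import torch.nn as nn

class LatentPhysicsModel(nn.Module):
    """Roll latent object states forward with a recurrent dynamics module,
    then render them to segmentation-mask logits (toy sketch)."""

    def __init__(self, num_objects, state_dim=4, hidden=64, mask_hw=32):
        super().__init__()
        self.num_objects, self.state_dim, self.mask_hw = num_objects, state_dim, mask_hw
        self.dynamics = nn.GRUCell(num_objects * state_dim, hidden)
        self.to_states = nn.Linear(hidden, num_objects * state_dim)
        self.renderer = nn.Sequential(
            nn.Linear(num_objects * state_dim, 256), nn.ReLU(),
            nn.Linear(256, num_objects * mask_hw * mask_hw),
        )

    def forward(self, init_states, steps):
        # init_states: (batch, num_objects * state_dim) latent positions/velocities
        b = init_states.shape[0]
        h = torch.zeros(b, self.dynamics.hidden_size, device=init_states.device)
        states, masks = init_states, []
        for _ in range(steps):
            h = self.dynamics(states, h)
            states = self.to_states(h)
            masks.append(
                self.renderer(states).view(b, self.num_objects, self.mask_hw, self.mask_hw)
            )
        return torch.stack(masks, dim=1)  # (batch, steps, num_objects, H, W) mask logits
```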