5,073 research outputs found

    Estimating position & velocity in 3D space from monocular video sequences using a deep neural network

    Get PDF
    This work describes a regression model based on Convolutional Neural Networks (CNN) and Long-Short Term Memory (LSTM) networks for tracking objects from monocular video sequences. The target application being pursued is Vision-Based Sensor Substitution (VBSS). In particular, the tool-tip position and velocity in 3D space of a pair of surgical robotic instruments (SRI) are estimated for three surgical tasks, namely suturing, needle-passing and knot-tying. The CNN extracts features from individual video frames and the LSTM network processes these features over time and continuously outputs a 12-dimensional vector with the estimated position and velocity values. A series of analyses and experiments are carried out in the regression model to reveal the benefits and drawbacks of different design choices. First, the impact of the loss function is investigated by adequately weighing the Root Mean Squared Error (RMSE) and Gradient Difference Loss (GDL), using the VGG16 neural network for feature extraction. Second, this analysis is extended to a Residual Neural Network designed for feature extraction, which has fewer parameters than the VGG16 model, resulting in a reduction of ~96.44 % in the neural network size. Third, the impact of the number of time steps used to model the temporal information processed by the LSTM network is investigated. Finally, the capability of the regression model to generalize to the data related to "unseen" surgical tasks (unavailable in the training set) is evaluated. The aforesaid analyses are experimentally validated on the public dataset JIGSAWS. These analyses provide some guidelines for the design of a regression model in the context of VBSS, specifically when the objective is to estimate a set of 1D time series signals from video sequences.Peer ReviewedPostprint (author's final draft

    Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation

    Full text link
    Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics. We present a learning-based system where a robot takes as input a sequence of images of a human manipulating a rope from an initial to goal configuration, and outputs a sequence of actions that can reproduce the human demonstration, using only monocular images as input. To perform this task, the robot learns a pixel-level inverse dynamics model of rope manipulation directly from images in a self-supervised manner, using about 60K interactions with the rope collected autonomously by the robot. The human demonstration provides a high-level plan of what to do and the low-level inverse model is used to execute the plan. We show that by combining the high and low-level plans, the robot can successfully manipulate a rope into a variety of target shapes using only a sequence of human-provided images for direction.Comment: 8 pages, accepted to International Conference on Robotics and Automation (ICRA) 201

    Automated pick-up of suturing needles for robotic surgical assistance

    Get PDF
    Robot-assisted laparoscopic prostatectomy (RALP) is a treatment for prostate cancer that involves complete or nerve sparing removal prostate tissue that contains cancer. After removal the bladder neck is successively sutured directly with the urethra. The procedure is called urethrovesical anastomosis and is one of the most dexterity demanding tasks during RALP. Two suturing instruments and a pair of needles are used in combination to perform a running stitch during urethrovesical anastomosis. While robotic instruments provide enhanced dexterity to perform the anastomosis, it is still highly challenging and difficult to learn. In this paper, we presents a vision-guided needle grasping method for automatically grasping the needle that has been inserted into the patient prior to anastomosis. We aim to automatically grasp the suturing needle in a position that avoids hand-offs and immediately enables the start of suturing. The full grasping process can be broken down into: a needle detection algorithm; an approach phase where the surgical tool moves closer to the needle based on visual feedback; and a grasping phase through path planning based on observed surgical practice. Our experimental results show examples of successful autonomous grasping that has the potential to simplify and decrease the operational time in RALP by assisting a small component of urethrovesical anastomosis

    Data-driven robotic manipulation of cloth-like deformable objects : the present, challenges and future prospects

    Get PDF
    Manipulating cloth-like deformable objects (CDOs) is a long-standing problem in the robotics community. CDOs are flexible (non-rigid) objects that do not show a detectable level of compression strength while two points on the article are pushed towards each other and include objects such as ropes (1D), fabrics (2D) and bags (3D). In general, CDOs’ many degrees of freedom (DoF) introduce severe self-occlusion and complex state–action dynamics as significant obstacles to perception and manipulation systems. These challenges exacerbate existing issues of modern robotic control methods such as imitation learning (IL) and reinforcement learning (RL). This review focuses on the application details of data-driven control methods on four major task families in this domain: cloth shaping, knot tying/untying, dressing and bag manipulation. Furthermore, we identify specific inductive biases in these four domains that present challenges for more general IL and RL algorithms.Publisher PDFPeer reviewe

    Learning to tie the knot: The acquisition of functional object representations by physical and observational experience

    Get PDF
    Here we examined neural substrates for physically and observationally learning to construct novel objects, and characterized brain regions associated with each kind of learning using fMRI. Each participant was assigned a training partner, and for five consecutive days practiced tying one group of knots (“tied” condition) or watched their partner tie different knots (“watched” condition) while a third set of knots remained untrained. Functional MRI was obtained prior to and immediately following the week of training while participants performed a visual knot-matching task. After training, a portion of left superior parietal lobule demonstrated a training by scan session interaction. This means this parietal region responded selectively to knots that participants had physically learned to tie in the post-training scan session but not the pre-training scan session. A conjunction analysis on the post-training scan data showed right intraparietal sulcus and right dorsal premotor cortex to respond when viewing images of knots from the tied and watched conditions compared to knots that were untrained during the post-training scan session. This suggests that these brain areas track both physical and observational learning. Together, the data provide preliminary evidence of engagement of brain regions associated with hand-object interactions when viewing objects associated with physical experience, and with observational experience without concurrent physical practice
    • …
    corecore