Data-Driven Visual Tracking in Retinal Microsurgery
In the context of retinal microsurgery, visual tracking of instruments is a key component of robotic assistance. The difficulty of the task, and the major reason why most existing strategies fail on in-vivo image sequences, lies in the fact that complex and severe changes in instrument appearance are challenging to model. This paper introduces a novel approach that is both data-driven and complementary to existing tracking techniques. In particular, we show how to learn and integrate an accurate detector with a simple gradient-based tracker within a robust pipeline which runs at framerate. In addition, we present a fully annotated dataset of retinal instruments in in-vivo surgeries, which we use to quantitatively validate our approach. We also demonstrate an application of our method in a laparoscopy image sequence.
Real-time 3D Tracking of Articulated Tools for Robotic Surgery
In robotic surgery, tool tracking is important for providing safe tool-tissue
interaction and facilitating surgical skills assessment. Despite recent
advances in tool tracking, existing approaches are faced with major
difficulties in real-time tracking of articulated tools. Most algorithms are
tailored for offline processing with pre-recorded videos. In this paper, we
propose a real-time 3D tracking method for articulated tools in robotic
surgery. The proposed method is based on the CAD model of the tools as well as
robot kinematics to generate online part-based templates for efficient 2D
matching and 3D pose estimation. A robust verification approach is incorporated
to reject outliers in 2D detections, which is then followed by fusing inliers
with robot kinematic readings for 3D pose estimation of the tool. The proposed
method has been validated with phantom data, as well as ex vivo and in vivo
experiments. The results derived clearly demonstrate the performance advantage
of the proposed method when compared to the state-of-the-art.
Comment: This paper was presented at the MICCAI 2016 conference, and a DOI was
linked to the publisher's version.
Simultaneous recognition and pose estimation of instruments in minimally invasive surgery
Detection of surgical instruments plays a key role in ensuring patient safety in minimally invasive surgery. In this paper, we present a novel method for 2D vision-based recognition and pose estimation of surgical instruments that generalizes to different surgical applications. At its core, we propose a novel scene model in order to simultaneously recognize multiple instruments as well as their parts. We use a Convolutional Neural Network architecture to embody our model and show that the cross-entropy loss is well suited to optimize its parameters, which can be trained in an end-to-end fashion. An additional advantage of our approach is that instrument detection at test time is achieved while avoiding the need for scale-dependent sliding window evaluation. This allows our approach to be relatively parameter-free at test time and shows good performance for both instrument detection and tracking. We show that our approach surpasses state-of-the-art results on in-vivo retinal microsurgery image data, as well as ex-vivo laparoscopic sequences.
CaDIS: Cataract dataset for surgical RGB-image segmentation
Video feedback provides a wealth of information about surgical procedures and is the main sensory cue for surgeons. Scene understanding is crucial to computer-assisted interventions (CAI) and to post-operative analysis of the surgical procedure. A fundamental building block of such capabilities is the identification and localization of surgical instruments and anatomical structures through semantic segmentation. Deep learning has advanced semantic segmentation techniques in recent years but is inherently reliant on the availability of labelled datasets for model training. This paper introduces a dataset for semantic segmentation of cataract surgery videos complementing the publicly available CATARACTS challenge dataset. In addition, we benchmark the performance of several state-of-the-art deep learning models for semantic segmentation on the presented dataset. The dataset is publicly available at https://cataracts-semantic-segmentation2020.grand-challenge.org/
Articulated Multi-Instrument 2D Pose Estimation Using Fully Convolutional Networks
Instrument detection, pose estimation and tracking in surgical videos are important vision components for computer-assisted interventions. While significant advances have been made in recent years, articulation detection is still a major challenge. In this paper, we propose a deep neural network for articulated multi-instrument 2D pose estimation, which is trained on detailed annotations of endoscopic and microscopic datasets. Our model is formed by a fully convolutional detection-regression network. Joints and associations between joint pairs in our instrument model are located by the detection subnetwork and are subsequently refined through a regression subnetwork. Based on the output from the model, the poses of the instruments are inferred using maximum bipartite graph matching. Our estimation framework is powered by deep learning techniques without any direct kinematic information from a robot. Our framework is tested on single-instrument RMIT data, and also on multi-instrument EndoVis and in vivo data with promising results. In addition, the dataset annotations are publicly released along with our code and model.
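The matching step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the term is read as maximum-weight bipartite matching between two sets of joint candidates; the score matrix, the function name `max_weight_matching` and the brute-force search are all hypothetical choices made for illustration and are not taken from the paper.

```python
from itertools import permutations


def max_weight_matching(scores):
    """Return the best one-to-one assignment of rows to columns.

    scores[i][j] is a hypothetical association score between joint
    candidate i of one type and joint candidate j of another type.
    Assumes len(scores) <= len(scores[0]). The search is brute force,
    so it only suits a handful of candidates per frame.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    best_total, best_pairs = float("-inf"), []
    # Try every way of giving each row its own distinct column.
    for cols in permutations(range(n_cols), n_rows):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs, best_total


# Made-up scores: two shaft candidates (rows), two tip candidates (columns).
association = [
    [0.9, 0.2],
    [0.1, 0.8],
]
pairs, total = max_weight_matching(association)
```

Here each shaft candidate is paired with the tip candidate that maximizes the summed score. A polynomial-time solver such as the Hungarian algorithm would replace the brute-force loop for larger candidate sets.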
CholecTrack20: A Dataset for Multi-Class Multiple Tool Tracking in Laparoscopic Surgery
Tool tracking in surgical videos is vital in computer-assisted intervention
for tasks like surgeon skill assessment, safety zone estimation, and
human-machine collaboration during minimally invasive procedures. The lack of
large-scale datasets hampers Artificial Intelligence implementation in this
domain. Current datasets exhibit overly generic tracking formalization, often
lacking surgical context: a deficiency that becomes evident when tools move out
of the camera's scope, resulting in rigid trajectories that hinder realistic
surgical representation. This paper addresses the need for a more precise and
adaptable tracking formalization tailored to the intricacies of endoscopic
procedures by introducing CholecTrack20, an extensive dataset meticulously
annotated for multi-class multi-tool tracking across three perspectives
representing the various ways of considering the temporal duration of a tool
trajectory: (1) intraoperative, (2) intracorporeal, and (3) visibility within
the camera's scope. The dataset comprises 20 laparoscopic videos with over
35,000 frames and 65,000 annotated tool instances with details on spatial
location, category, identity, operator, phase, and surgical visual conditions.
This detailed dataset caters to the evolving assistive requirements within a
procedure.
Comment: Surgical tool tracking dataset paper, 15 pages, 9 figures, 4 tables
Online tracking and retargeting with applications to optical biopsy in gastrointestinal endoscopic examinations
With recent advances in biophotonics, techniques such as narrow band imaging, confocal laser endomicroscopy, fluorescence spectroscopy, and optical coherence tomography can be combined with normal white-light endoscopes to provide in vivo microscopic tissue characterisation, potentially avoiding the need for offline histological analysis. Despite the advantages of these techniques in providing online optical biopsy in situ, it is challenging for gastroenterologists to retarget the optical biopsy sites during endoscopic examinations. This is because optical biopsy does not leave any mark on the tissue. Furthermore, typical endoscopic cameras only have a limited field-of-view and the biopsy sites often enter or exit the camera view as the endoscope moves. In this paper, a framework for online tracking and retargeting is proposed based on the concept of tracking-by-detection. An online detection cascade is proposed where a random binary descriptor using Haar-like features is included as a random forest classifier. For robust retargeting, we have also proposed a RANSAC-based location verification component that incorporates shape context. The proposed detection cascade can be readily integrated with other temporal trackers. Detailed performance evaluation on in vivo gastrointestinal video sequences demonstrates the performance advantage of the proposed method over the current state-of-the-art.
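The idea behind RANSAC-based verification can be illustrated with a minimal sketch. It assumes a translation-only motion model and made-up point matches, and it omits the shape context term the abstract mentions. The function name `ransac_translation` and its parameters are hypothetical and do not come from the paper.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=50, seed=0):
    """Estimate a 2D translation from point matches, ignoring outliers.

    matches is a list of ((x, y), (x2, y2)) pairs. The translation-only
    model is a simplifying assumption; one match is a minimal sample.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Hypothesize a translation from one randomly chosen match.
        (x, y), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x, y2 - y
        # Collect the matches that agree with the hypothesis.
        inliers = [
            m for m in matches
            if abs(m[1][0] - m[0][0] - dx) <= threshold
            and abs(m[1][1] - m[0][1] - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the largest consensus set by averaging its displacements.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), best_inliers


# Made-up matches: four agree on a shift of (5, 3), two are outliers.
matches = [
    ((10, 10), (15, 13)),
    ((20, 5), (25, 8)),
    ((30, 40), (35, 43)),
    ((12, 30), (17, 33)),
    ((50, 50), (10, 90)),
    ((5, 60), (70, 2)),
]
translation, inliers = ransac_translation(matches)
```

A candidate location would be accepted only if the consensus set is large enough; that acceptance threshold is left out here because the abstract does not specify one.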