Self-Supervised Siamese Learning on Stereo Image Pairs for Depth Estimation in Robotic Surgery
Robotic surgery has become a powerful tool for performing minimally invasive
procedures, providing advantages in dexterity, precision, and 3D vision, over
traditional surgery. One popular robotic system is the da Vinci surgical
platform, which allows preoperative information to be incorporated into live
procedures using Augmented Reality (AR). Scene depth estimation is a
prerequisite for AR, as accurate registration requires 3D correspondences
between preoperative and intraoperative organ models. In the past decade, there
has been much progress on depth estimation for surgical scenes, such as using
monocular or binocular laparoscopes [1,2]. More recently, advances in deep
learning have enabled depth estimation via Convolutional Neural Networks (CNNs)
[3], but training requires a large image dataset with ground truth depths.
Inspired by [4], we propose a deep learning framework for surgical scene depth
estimation using self-supervision for scalable data acquisition. Our framework
consists of an autoencoder for depth prediction, and a differentiable spatial
transformer for training the autoencoder on stereo image pairs without ground
truth depths. Validation was conducted on stereo videos collected in robotic
partial nephrectomy.
Comment: A two-page short report to be presented at the Hamlyn Symposium on Medical Robotics 2017. An extension of this work is in progress.
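The training signal described above comes from reconstructing one view of the stereo pair from the other: the differentiable spatial transformer samples the right image at positions shifted by the predicted disparity, and the photometric difference to the real left image supervises the depth network, with no ground-truth depths needed. A minimal NumPy sketch of that reconstruction loss (the function names and the linear-interpolation sampler are illustrative, not the paper's implementation):

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by horizontally sampling the right view
    at x - d(x, y): the core of the differentiable spatial transformer.
    Uses linear interpolation along each row."""
    h, w = right.shape
    xs = np.tile(np.arange(w, dtype=np.float64), (h, 1))
    src = np.clip(xs - disparity, 0, w - 1)          # sampling coordinates
    x0 = np.floor(src).astype(int)
    x1 = np.clip(x0 + 1, 0, w - 1)
    frac = src - x0
    rows = np.arange(h)[:, None]
    return (1 - frac) * right[rows, x0] + frac * right[rows, x1]

def photometric_loss(left, right, disparity):
    """Mean absolute error between the left image and its reconstruction,
    the self-supervised signal for the depth autoencoder."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))
```

When the predicted disparity matches the true horizontal shift between the views, the reconstruction matches the left image and the loss vanishes, which is what drives the depth prediction toward correct values.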
Self-supervised generative adversarial network for depth estimation in laparoscopic images
Dense depth estimation and 3D reconstruction of a surgical scene are crucial steps in computer-assisted surgery. Recent work has shown that depth estimation from a stereo image pair can be solved with convolutional neural networks. However, most recent depth estimation models were trained on datasets with per-pixel ground truth. Such data is especially rare for laparoscopic imaging, making it hard to apply supervised depth estimation to real surgical applications. To overcome this limitation, we propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks. It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training. Multi-scale outputs from the generator help to solve the local minima caused by the photometric reprojection loss, while the adversarial learning improves the framework's generation quality. Extensive experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin, and reduces the gap between supervised and unsupervised depth estimation in laparoscopic images.
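The generator's objective combines the multi-scale photometric reprojection losses with an adversarial term scored by the discriminator. The sketch below assumes a least-squares GAN formulation and an arbitrary weighting; both are illustrative choices in the spirit of the abstract, not necessarily SADepth's exact losses:

```python
import numpy as np

def lsgan_generator_loss(d_fake):
    """Least-squares adversarial term: push the discriminator's scores on
    generated (reprojected) images toward the 'real' label 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

def generator_objective(photo_losses, d_fake, adv_weight=0.01):
    """Hypothetical combined objective: average the photometric
    reprojection losses computed at multiple output scales, then add a
    weighted adversarial term from the discriminator."""
    return float(np.mean(photo_losses)) + adv_weight * lsgan_generator_loss(d_fake)
```

Averaging over scales gives the coarse outputs a say in the gradient, which is how multi-scale supervision helps the network escape the local minima of a single-scale photometric loss.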
Self-supervised monocular depth estimation with 3-D displacement module for laparoscopic images
We present a novel self-supervised training framework with a 3D displacement (3DD) module for accurately estimating per-pixel depth maps from single laparoscopic images. Recently, several self-supervised learning based monocular depth estimation models have achieved good results on the KITTI dataset under the hypothesis that the camera is dynamic and the objects are stationary; however, this hypothesis is often reversed in the surgical setting (the laparoscope is stationary while the surgical instruments and tissues are dynamic). Therefore, a 3DD module is proposed to establish the relation between frames instead of ego-motion estimation. In the 3DD module, a convolutional neural network (CNN) analyses source and target frames to predict the 3D displacement of a 3D point cloud from a target frame to a source frame in the coordinates of the camera. Since it is difficult to constrain the depth displacement from two 2D images, a novel depth consistency module is proposed to maintain depth consistency between displacement-updated depth and model-estimated depth to constrain 3D displacement effectively. Our proposed method achieves remarkable performance for monocular depth estimation on the Hamlyn surgical dataset and acquired ground truth depth maps, outperforming the Monodepth, Monodepth2 and PackNet models.
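The depth consistency idea can be made concrete: the z-component of the predicted 3D displacement updates the target-frame depth, and the result is compared against the depth the model estimates directly from the source frame. A hypothetical minimal version (names and the L1 penalty are illustrative, not the paper's exact formulation):

```python
import numpy as np

def displace_depth(depth_target, displacement_z):
    """Apply the 3DD module's predicted per-pixel z-displacement to the
    target-frame depth, yielding a displacement-updated source depth."""
    return depth_target + displacement_z

def depth_consistency_loss(depth_source_pred, depth_target, displacement_z):
    """Penalise disagreement between the displacement-updated depth and
    the depth estimated directly from the source frame; minimising this
    constrains the otherwise under-determined 3D displacement."""
    updated = displace_depth(depth_target, displacement_z)
    return float(np.mean(np.abs(depth_source_pred - updated)))
```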
H-Net: unsupervised attention-based stereo depth estimation leveraging epipolar geometry
Depth estimation from a stereo image pair has become one of the most explored applications in computer vision, with most previous methods relying on fully supervised learning settings. However, due to the difficulty in acquiring accurate and scalable ground truth data, the training of fully supervised methods is challenging. As an alternative, self-supervised methods are becoming more popular to mitigate this challenge. In this paper, we introduce the H-Net, a deep-learning framework for unsupervised stereo depth estimation that leverages epipolar geometry to refine stereo matching. For the first time, a Siamese autoencoder architecture is used for depth estimation, which allows mutual information between rectified stereo images to be extracted. To enforce the epipolar constraint, a mutual epipolar attention mechanism has been designed which gives more emphasis to correspondences of features that lie on the same epipolar line while learning mutual information between the input stereo pair. Stereo correspondences are further enhanced by incorporating semantic information into the proposed attention mechanism. More specifically, the optimal transport algorithm is used to suppress attention and eliminate outliers in areas not visible in both cameras. Extensive experiments on KITTI2015 and Cityscapes show that the proposed modules are able to improve the performance of unsupervised stereo depth estimation methods while closing the gap with the fully supervised approaches.
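For a rectified stereo pair, corresponding points lie on the same image row, so the epipolar constraint can be enforced by letting each query attend only to keys on its own row. A small NumPy sketch of such row-restricted cross attention (a simplification of the paper's mutual epipolar attention, without the semantic cues or optimal-transport outlier suppression):

```python
import numpy as np

def epipolar_attention(q, k, v):
    """Attention between rectified stereo feature maps of shape (h, w, c)
    where each query may only attend to keys on its own row, i.e. the
    shared epipolar line of a rectified pair."""
    h, w, c = q.shape
    out = np.empty_like(v, dtype=np.float64)
    for y in range(h):                        # one epipolar line at a time
        scores = q[y] @ k[y].T / np.sqrt(c)   # (w, w) within-row similarities
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        out[y] = weights @ v[y]               # softmax-weighted values
    return out
```

Because attention weights are computed only within a row, features from other epipolar lines cannot contaminate a match, which is exactly the emphasis the mechanism is designed to give.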
Anytime Stereo Image Depth Estimation on Mobile Devices
Many applications of stereo depth estimation in robotics require the
generation of accurate disparity maps in real time under significant
computational constraints. Current state-of-the-art algorithms force a choice
between either generating accurate mappings at a slow pace, or quickly
generating inaccurate ones, and additionally these methods typically require
far too many parameters to be usable on power- or memory-constrained devices.
Motivated by these shortcomings, we propose a novel approach for disparity
prediction in the anytime setting. In contrast to prior work, our end-to-end
learned approach can trade off computation and accuracy at inference time.
Depth estimation is performed in stages, during which the model can be queried
at any time to output its current best estimate. Our final model can process
1242×375 resolution images within a range of 10-35 FPS on an NVIDIA
Jetson TX2 module with only marginal increases in error -- using two orders of
magnitude fewer parameters than the most competitive baseline. The source code
is available at https://github.com/mileyan/AnyNet
Comment: Accepted by ICRA 2019.
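The anytime contract amounts to a simple control loop: run refinement stages coarse to fine and return whatever estimate is complete when the compute budget expires. A schematic sketch (the stage contents and budget handling are assumptions for illustration, not AnyNet's actual scheduler):

```python
import time

def anytime_disparity(stages, budget_s):
    """Run coarse-to-fine refinement stages until the time budget expires,
    always returning the latest completed estimate (the anytime property).
    `stages` is a list of callables, each refining the previous estimate."""
    estimate = None
    start = time.monotonic()
    for stage in stages:
        estimate = stage(estimate)            # each stage improves the map
        if time.monotonic() - start >= budget_s:
            break                             # budget spent: stop early
    return estimate
```

This is what lets the same trained network trade accuracy for speed at inference time: a tight budget yields the coarse stage's disparity, a generous one the fully refined map.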
Simultaneous Depth Estimation and Surgical Tool Segmentation in Laparoscopic Images
Surgical instrument segmentation and depth estimation are crucial steps to improve autonomy in robotic surgery. Most recent works treat these problems separately, making the deployment challenging. In this paper, we propose a unified framework for depth estimation and surgical tool segmentation in laparoscopic images. The network has an encoder-decoder architecture and comprises two branches for simultaneously performing depth estimation and segmentation. To train the network end to end, we propose a new multi-task loss function that effectively learns to estimate depth in an unsupervised manner, while requiring only semi-ground truth for surgical tool segmentation. We conducted extensive experiments on different datasets to validate these findings. The results showed that the end-to-end network successfully improved the state-of-the-art for both tasks while reducing the complexity during their deployment.
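Such a multi-task objective can be sketched as a weighted sum of an unsupervised photometric depth term and a cross-entropy segmentation term evaluated only where semi-ground truth labels exist. The weights, shapes, and masking scheme below are illustrative assumptions, not the paper's published loss:

```python
import numpy as np

def seg_cross_entropy(probs, labels, valid_mask):
    """Cross-entropy for tool segmentation over per-pixel class
    probabilities (h, w, n_classes), evaluated only where the
    semi-ground truth mask marks a pixel as labelled."""
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    ce = -np.log(np.clip(picked, 1e-8, 1.0))
    return float(ce[valid_mask].mean())

def multitask_loss(photometric, probs, labels, valid_mask, w_seg=1.0):
    """Single end-to-end objective: unsupervised photometric depth term
    plus a weighted segmentation term on the labelled pixels."""
    return photometric + w_seg * seg_cross_entropy(probs, labels, valid_mask)
```

Restricting the cross-entropy to the valid mask is what makes partial ("semi-ground truth") annotation usable: unlabelled pixels simply contribute no segmentation gradient.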
Self-supervised Depth Estimation to Regularise Semantic Segmentation in Knee Arthroscopy
Intra-operative automatic semantic segmentation of knee joint structures can
assist surgeons during knee arthroscopy in terms of situational awareness.
However, due to poor imaging conditions (e.g., low texture, overexposure,
etc.), automatic semantic segmentation is a challenging scenario, which
justifies the scarce literature on this topic. In this paper, we propose a
novel self-supervised monocular depth estimation to regularise the training of
the semantic segmentation in knee arthroscopy. To further regularise the depth
estimation, we propose the use of clean training images captured by the stereo
arthroscope of routine objects (presenting none of the poor imaging conditions
and with rich texture information) to pre-train the model. We fine-tune such
model to produce both the semantic segmentation and self-supervised monocular
depth using stereo arthroscopic images taken from inside the knee. Using a data
set containing 3868 arthroscopic images captured during cadaveric knee
arthroscopy with semantic segmentation annotations, 2000 stereo image pairs of
cadaveric knee arthroscopy, and 2150 stereo image pairs of routine objects, we
show that our semantic segmentation regularised by self-supervised depth
estimation produces a more accurate segmentation than a state-of-the-art
semantic segmentation approach modeled exclusively with semantic segmentation
annotation.
Comment: 10 pages, 6 figures.
Detecting the Sensing Area of A Laparoscopic Probe in Minimally Invasive Cancer Surgery
In surgical oncology, it is challenging for surgeons to identify lymph nodes
and completely resect cancer even with pre-operative imaging systems like PET
and CT, because of the lack of reliable intraoperative visualization tools.
Endoscopic radio-guided cancer detection and resection has recently been
evaluated whereby a novel tethered laparoscopic gamma detector is used to
localize a preoperatively injected radiotracer. This can both enhance the
endoscopic imaging and complement preoperative nuclear imaging data. However,
gamma activity visualization is challenging to present to the operator because
the probe is non-imaging and it does not visibly indicate the activity
origination on the tissue surface. Initial failed attempts used segmentation or
geometric methods, but led to the discovery that it could be resolved by
leveraging high-dimensional image features and probe position information. To
demonstrate the effectiveness of this solution, we designed and implemented a
simple regression network that successfully addressed the problem. To further
validate the proposed solution, we acquired and publicly released two datasets
captured using a custom-designed, portable stereo laparoscope system. Through
intensive experimentation, we demonstrated that our method can successfully and
effectively detect the sensing area, establishing a new performance benchmark.
Code and data are available at
https://github.com/br0202/Sensing_area_detection.git
Comment: Accepted by MICCAI 2023.
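The core idea, regressing the sensing-area location from high-dimensional image features concatenated with probe position, can be sketched as a tiny MLP. The layer sizes, the 3-vector probe-position encoding, and the architecture are assumptions for illustration only, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_regressor(feat_dim, hidden=32):
    """Hypothetical two-layer regressor from [image features, probe
    position] to a 2D sensing-area location (u, v) on the tissue surface."""
    w1 = rng.normal(scale=0.1, size=(feat_dim + 3, hidden))
    w2 = rng.normal(scale=0.1, size=(hidden, 2))

    def forward(features, probe_pos):
        x = np.concatenate([features, probe_pos])  # fuse the two cues
        return np.maximum(x @ w1, 0.0) @ w2        # ReLU MLP -> (u, v)

    return forward
```

The point of the sketch is the fusion step: because the gamma probe is non-imaging, neither the image features nor the probe pose alone pin down where the activity originates, but a regressor over their concatenation can.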
Enhancing endoscopic navigation and polyp detection using artificial intelligence
Colorectal cancer (CRC) is one of the most common and deadly forms of cancer. It has a very high mortality rate if the disease advances to late stages; however, early diagnosis and treatment can be curative, so timely detection is essential to effective disease management. Colonoscopy is considered the gold standard for CRC screening and early therapeutic treatment. The effectiveness of colonoscopy is highly dependent on the operator's skill, as a high level of hand-eye coordination is required to control the endoscope and fully examine the colon wall. Because of this, detection rates can vary between gastroenterologists, and technologies have been proposed to assist disease detection and standardise detection rates. This thesis focuses on developing artificial intelligence algorithms to assist gastroenterologists during colonoscopy, with the potential to ensure a baseline standard of quality in CRC screening. To achieve such assistance, the technical contributions develop deep learning methods and architectures for automated endoscopic image analysis, addressing both the detection of lesions in the endoscopic image and the 3D mapping of the endoluminal environment. The proposed detection models run in real time and assist visualisation of different polyp types. Meanwhile, the 3D reconstruction and mapping models developed are the basis for ensuring that the entire colon has been examined appropriately, and support quantitative measurement of polyp sizes from the image during a procedure. Results and validation studies presented within the thesis demonstrate how the developed algorithms perform on both general scenes and on clinical data. The feasibility of clinical translation is demonstrated for all of the models on endoscopic data from human participants during CRC screening examinations.