Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction
Estimating precise metric depth and scene reconstruction from monocular
endoscopy is a fundamental task for surgical navigation in robotic surgery.
However, traditional stereo matching relies on binocular images to perceive
depth and is therefore difficult to transfer to soft-robotics-based surgical
systems, which use monocular endoscopy. In this paper, we
present a novel framework that combines robot kinematics and monocular
endoscope images with deep unsupervised learning into a single network for
metric depth estimation and then achieve 3D reconstruction of complex anatomy.
Specifically, we first obtain the relative depth maps of surgical scenes by
leveraging a brightness-aware monocular depth estimation method. Then, the
corresponding endoscope poses are computed based on non-linear optimization of
geometric and photometric reprojection residuals. Afterwards, we develop a
Depth-driven Sliding Optimization (DDSO) algorithm to extract the scaling
coefficient from kinematics and calculated poses offline. By coupling the
metric scale and relative depth data, we form a robust ensemble that represents
the metric and consistent depth. Next, we treat the ensemble as supervisory
labels to train a metric depth estimation network for surgeries (i.e.,
MetricDepthS-Net) that distills the embeddings from the robot kinematics,
endoscopic videos, and poses. With accurate metric depth estimation, we utilize
a dense visual reconstruction method to recover the 3D structure of the whole
surgical site. We have extensively evaluated the proposed framework on the
public SCARED dataset and achieved performance comparable to stereo-based
depth estimation methods. Our results demonstrate the feasibility of the
proposed approach for recovering metric depth and 3D structure from monocular
inputs.
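
As an illustration of the scale-recovery step, the sketch below is hypothetical code, not the authors' DDSO implementation: it recovers a metric scaling coefficient by least-squares alignment of up-to-scale visual-odometry step lengths against metric kinematics step lengths over a sliding window, then aggregates robustly with a median.

```python
# Hypothetical sketch of sliding-window scale recovery (not the authors' DDSO):
# align up-to-scale visual-odometry translations with metric kinematics ones.
import numpy as np

def sliding_window_scale(t_vo, t_kin, window=10):
    """t_vo, t_kin: (N, 3) per-frame relative translations (visual / kinematic).
    Returns a robust scale factor mapping VO translations to metric units."""
    assert len(t_vo) >= window
    d_vo = np.linalg.norm(t_vo, axis=1)    # up-to-scale step lengths
    d_kin = np.linalg.norm(t_kin, axis=1)  # metric step lengths from kinematics
    scales = []
    for i in range(len(d_vo) - window + 1):
        # closed-form least-squares scale s minimizing ||d_kin - s * d_vo||^2
        num = np.dot(d_kin[i:i + window], d_vo[i:i + window])
        den = np.dot(d_vo[i:i + window], d_vo[i:i + window]) + 1e-12
        scales.append(num / den)
    return np.median(scales)               # median rejects outlier windows
```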
LightDepth: Single-View Depth Self-Supervision from Illumination Decline
Single-view depth estimation can be remarkably effective if there is enough
ground-truth depth data for supervised training. However, there are scenarios,
especially in medicine, as in the case of endoscopies, where such data cannot
be obtained. In such cases, multi-view self-supervision and synthetic-to-real
transfer serve as alternative approaches, albeit with a considerable
performance reduction compared to the supervised case. Instead, we propose a
single-view self-supervised method that achieves a performance similar to the
supervised case. In some medical devices, such as endoscopes, the camera and
light sources are co-located at a small distance from the target surfaces.
Thus, we can exploit that, for any given albedo and surface orientation, pixel
brightness is inversely proportional to the square of the distance to the
surface, providing a strong single-view self-supervisory signal. In our
experiments, our self-supervised models deliver accuracies comparable to those
of fully supervised ones, while being applicable without depth ground-truth
data.
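
To make the self-supervisory signal concrete, the following minimal sketch simplifies the model by assuming constant albedo and ignoring surface orientation (both of which the paper accounts for): it fits a per-image gain in closed form and penalizes deviation of the observed brightness from the inverse-square illumination prediction.

```python
# Minimal sketch of inverse-square-law self-supervision (simplified:
# constant albedo, no surface-orientation term).
import torch

def illumination_loss(pred_depth, image_gray, eps=1e-6):
    """pred_depth, image_gray: (B, 1, H, W). With a light co-located with the
    camera, brightness ~ gain / depth^2, so we render r = 1 / d^2 and fit g."""
    render = 1.0 / (pred_depth ** 2 + eps)
    # closed-form per-image gain g minimizing ||image - g * render||^2
    g = (image_gray * render).sum(dim=(2, 3), keepdim=True) / \
        (render ** 2).sum(dim=(2, 3), keepdim=True).clamp_min(eps)
    return torch.mean(torch.abs(image_gray - g * render))
```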
Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy
Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates, which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy.
More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network, trained on our new synthetic data, outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches, which are often unobtainable in surgical scenes. Together, the methods introduced in this thesis work towards a complete, data-driven 3D reconstruction pipeline for endoscopy.
On the Uncertain Single-View Depths in Colonoscopies
Estimating depth information from endoscopic images is a prerequisite for a
wide set of AI-assisted technologies, such as accurate localization and
measurement of tumors, or identification of non-inspected areas. As the domain
specificity of colonoscopies -- deformable low-texture environments with
fluids, poor lighting conditions, and abrupt sensor motions -- poses challenges
for multi-view 3D reconstruction, single-view depth learning stands out as a
promising line of research. Depth learning can be extended in a Bayesian
setting, which enables continual learning, improves decision making and can be
used to compute confidence intervals or quantify uncertainty for in-body
measurements. In this paper, we explore for the first time Bayesian deep
networks for single-view depth estimation in colonoscopies. Our specific
contribution is two-fold: 1) an exhaustive analysis of scalable Bayesian
networks for depth learning in different datasets, highlighting challenges and
conclusions regarding synthetic-to-real domain changes and supervised vs.
self-supervised methods; and 2) a novel teacher-student approach to deep depth
learning that takes into account the teacher's uncertainty.
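
One common way to realize such uncertainty-aware distillation is sketched below; this is a hedged illustration, and the paper's exact teacher-student loss may differ. The idea is to down-weight the student's regression loss wherever the Bayesian teacher (e.g., via MC-dropout samples) is uncertain.

```python
# Hedged sketch of uncertainty-weighted distillation: pixels with high teacher
# variance contribute less to the student's depth-regression loss.
import torch

def distill_loss(student_depth, teacher_mean, teacher_var, eps=1e-6):
    """All tensors (B, 1, H, W); teacher_mean/teacher_var are the predictive
    mean and variance of a Bayesian teacher (e.g., from MC-dropout samples)."""
    w = 1.0 / (teacher_var + eps)  # inverse-variance per-pixel weights
    return torch.mean(w * (student_depth - teacher_mean) ** 2)
```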
Self-supervised monocular depth estimation with 3-D displacement module for laparoscopic images
We present a novel self-supervised training framework with a 3D displacement (3DD) module for accurately estimating per-pixel depth maps from single laparoscopic images. Recently, several self-supervised learning-based monocular depth estimation models have achieved good results on the KITTI dataset under the hypothesis that the camera is dynamic and the objects are stationary; however, this hypothesis is often reversed in the surgical setting (the laparoscope is stationary, while the surgical instruments and tissues are dynamic). Therefore, a 3DD module is proposed to establish the relation between frames instead of estimating ego-motion. In the 3DD module, a convolutional neural network (CNN) analyses source and target frames to predict the 3D displacement of a 3D point cloud from the target frame to the source frame in camera coordinates. Since it is difficult to constrain the depth displacement from two 2D images, a novel depth consistency module is proposed to maintain consistency between the displacement-updated depth and the model-estimated depth, constraining the 3D displacement effectively. Our proposed method achieves remarkable performance for monocular depth estimation on the Hamlyn surgical dataset and acquired ground-truth depth maps, outperforming the monodepth, monodepth2, and packnet models.
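
The depth consistency idea can be illustrated with the hypothetical sketch below (names and the warping details are assumptions, not the authors' code): a point cloud lifted from the target-frame depth is moved by the predicted 3D displacement, and its resulting z-coordinates should agree with the depth estimated on the source frame at the corresponding locations.

```python
# Hypothetical sketch of a depth-consistency penalty between
# displacement-updated depth and directly estimated depth.
import torch

def depth_consistency_loss(depth_target, flow3d, depth_source_warped):
    """depth_target: (B, 1, H, W) depth of the target frame;
    flow3d: (B, 3, H, W) predicted per-pixel 3D displacement in camera coords;
    depth_source_warped: (B, 1, H, W) source-frame depth sampled at the
    reprojected pixel locations (the warping step is omitted for brevity)."""
    displaced_z = depth_target + flow3d[:, 2:3]  # z after applying displacement
    return torch.mean(torch.abs(displaced_z - depth_source_warped))
```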