A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video
Generating accurate 3D reconstructions from endoscopic video is a promising
avenue for longitudinal radiation-free analysis of sinus anatomy and surgical
outcomes. Several methods for monocular reconstruction have been proposed,
yielding visually pleasant 3D anatomical structures by retrieving relative
camera poses with structure-from-motion-type algorithms and fusion of monocular
depth estimates. However, due to the complex properties of the underlying
algorithms and endoscopic scenes, the reconstruction pipeline may perform
poorly or fail unexpectedly. Further, acquiring medical data poses additional
challenges, presenting difficulties in quantitatively benchmarking these
models, understanding failure cases, and identifying critical components that
contribute to their precision. In this work, we perform a quantitative analysis
of a self-supervised approach for sinus reconstruction using endoscopic
sequences paired with optical tracking and high-resolution computed tomography
acquired from nine ex-vivo specimens. Our results show that the generated
reconstructions are in high agreement with the anatomy, yielding an average
point-to-mesh error of 0.91 mm between reconstructions and CT segmentations.
However, in a point-to-point matching scenario, relevant for endoscope tracking
and navigation, we found average target registration errors of 6.58 mm. We
identified that pose and depth estimation inaccuracies contribute equally to
this error and that locally consistent sequences with shorter trajectories
generate more accurate reconstructions. These results suggest that achieving
global consistency between relative camera poses and estimated depths with the
anatomy is essential. In doing so, we can ensure proper synergy between all
components of the pipeline for improved reconstructions that will facilitate
clinical application of this innovative technology.
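
The gap between the two error figures reported in this abstract can be illustrated with a toy example. The sketch below is not the authors' evaluation code: the function names are hypothetical, the flat sampled grid stands in for a CT surface, and a nearest-sample distance is used as a rough proxy for point-to-mesh error. It shows how a reconstruction that slides along the surface can stay close to the anatomy while its corresponding points end up far apart.

```python
import math

def target_registration_error(estimated, reference):
    """Mean Euclidean distance between corresponding 3D points (same order)."""
    if len(estimated) != len(reference):
        raise ValueError("point lists must have the same length")
    return sum(math.dist(p, q) for p, q in zip(estimated, reference)) / len(estimated)

def nearest_surface_error(estimated, surface_samples):
    """Mean distance from each point to its nearest surface sample.

    Assumption: a dense point sampling stands in for a real mesh.
    """
    total = 0.0
    for p in estimated:
        total += min(math.dist(p, s) for s in surface_samples)
    return total / len(estimated)

# Toy surface: a flat patch at z = 0, sampled on a 1 mm grid (hypothetical data).
surface = [(float(x), float(y), 0.0) for x in range(11) for y in range(11)]

# Reference landmarks lying on the surface.
reference = [(1.0, 1.0, 0.0), (2.0, 3.0, 0.0), (4.0, 4.0, 0.0)]

# Reconstructed landmarks: 0.5 mm off the surface, but shifted 5 mm along it.
estimated = [(x + 5.0, y, 0.5) for x, y, _ in reference]

surface_err = nearest_surface_error(estimated, surface)
tre = target_registration_error(estimated, reference)
print(f"surface error: {surface_err:.2f} mm, TRE: {tre:.2f} mm")
```

In this toy case the surface-distance proxy reports 0.5 mm while the correspondence-based error is about 5 mm, which is the kind of discrepancy that matters for tracking and navigation.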
Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction
Estimating precise metric depth and scene reconstruction from monocular
endoscopy is a fundamental task for surgical navigation in robotic surgery.
However, traditional stereo matching adopts binocular images to perceive the
depth information, which is difficult to transfer to the soft robotics-based
surgical systems due to the use of monocular endoscopy. In this paper, we
present a novel framework that combines robot kinematics and monocular
endoscope images with deep unsupervised learning into a single network for
metric depth estimation and then achieves 3D reconstruction of complex anatomy.
Specifically, we first obtain the relative depth maps of surgical scenes by
leveraging a brightness-aware monocular depth estimation method. Then, the
corresponding endoscope poses are computed based on non-linear optimization of
geometric and photometric reprojection residuals. Afterwards, we develop a
Depth-driven Sliding Optimization (DDSO) algorithm to extract the scaling
coefficient from kinematics and calculated poses offline. By coupling the
metric scale and relative depth data, we form a robust ensemble that represents
the metric and consistent depth. Next, we treat the ensemble as supervisory
labels to train a metric depth estimation network for surgeries (i.e.,
MetricDepthS-Net) that distills the embeddings from the robot kinematics,
endoscopic videos, and poses. With accurate metric depth estimation, we utilize
a dense visual reconstruction method to recover the 3D structure of the whole
surgical site. We have extensively evaluated the proposed framework on the public
SCARED dataset and achieved performance comparable to stereo-based depth estimation
methods. Our results demonstrate the feasibility of the proposed approach to
recover the metric depth and 3D structure with monocular inputs.
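
The idea of extracting a scaling coefficient from kinematics and calculated poses can be sketched in a few lines. This is not the DDSO algorithm, whose details are not given in the abstract. It is a simpler median-ratio estimator, and the function names and data are hypothetical. It assumes the kinematic positions are in millimetres and the visually estimated positions are correct only up to an unknown scale.

```python
import math
from statistics import median

def estimate_scale(kinematic_positions, visual_positions, min_step=1e-6):
    """Median ratio of metric to up-to-scale step lengths between consecutive frames."""
    if len(kinematic_positions) != len(visual_positions):
        raise ValueError("trajectories must have the same length")
    ratios = []
    for i in range(1, len(kinematic_positions)):
        metric_step = math.dist(kinematic_positions[i], kinematic_positions[i - 1])
        visual_step = math.dist(visual_positions[i], visual_positions[i - 1])
        if visual_step > min_step:  # skip frames with almost no visual motion
            ratios.append(metric_step / visual_step)
    if not ratios:
        raise ValueError("no usable camera motion to estimate scale from")
    return median(ratios)

def to_metric_depth(relative_depth, scale):
    """Multiply every value of a relative depth map (list of rows) by the scale."""
    return [[d * scale for d in row] for row in relative_depth]

# Hypothetical trajectories: kinematics in mm, visual poses up to scale.
kinematic = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 4.0, 0.0), (2.0, 4.0, 6.0)]
visual = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.5, 1.0, 0.0), (0.5, 1.0, 1.5)]

scale = estimate_scale(kinematic, visual)
metric_depth = to_metric_depth([[1.0, 2.5], [0.5, 3.0]], scale)
print(scale, metric_depth)
```

The median is used here only because it tolerates a few outlier frames. The scaled depth maps would then play the role of the supervisory labels described in the abstract.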
A comprehensive survey on recent deep learning-based methods applied to surgical data
Minimally invasive surgery is highly operator-dependent, with lengthy
procedural times causing fatigue to the surgeon and risks to patients such as injury
to organs, infection, bleeding, and complications of anesthesia. To mitigate
such risks, there is a need for real-time systems that can provide
intra-operative guidance to surgeons. For example, an automated system for tool
localization, tool (or tissue) tracking, and depth estimation can enable a
clear understanding of surgical scenes preventing miscalculations during
surgical procedures. In this work, we present a systematic review of recent
machine learning-based approaches including surgical tool localization,
segmentation, tracking, and 3D scene perception. Furthermore, we provide a
detailed overview of publicly available benchmark datasets widely used for
surgical navigation tasks. While recent deep learning architectures have shown
promising results, there are still several open research problems such as a
lack of annotated datasets, the presence of artifacts in surgical scenes, and
non-textured surfaces that hinder 3D reconstruction of the anatomical
structures. Based on our comprehensive review, we present a discussion on
current gaps and needed steps to improve the adaptation of technology in
surgery.
Comment: This paper is to be submitted to the International Journal of Computer Vision
Tracking and Mapping in Medical Computer Vision: A Review
As computer vision algorithms are becoming more capable, their applications
in clinical systems will become more pervasive. These applications include
diagnostics such as colonoscopy and bronchoscopy, guiding biopsies and
minimally invasive interventions and surgery, automating instrument motion and
providing image guidance using pre-operative scans. Many of these applications
depend on the specific visual nature of medical scenes and require designing
and applying algorithms to perform in this environment.
In this review, we provide an update to the field of camera-based tracking
and scene mapping in surgery and diagnostics in medical computer vision. We
begin by describing our review process, which results in a final list of 515
papers that we cover. We then give a high-level summary of the state of the art
and provide relevant background for those who need tracking and mapping for
their clinical applications. We then review datasets provided in the field and
the clinical needs therein. Then, we delve in depth into the algorithmic side,
and summarize recent developments, which should be especially useful for
algorithm designers and to those looking to understand the capability of
off-the-shelf methods. We focus on algorithms for deformable environments while
also reviewing the essential building blocks in rigid tracking and mapping
since there is a large amount of crossover in methods. Finally, we discuss the
current state of the tracking and mapping methods along with needs for future
algorithms, needs for quantification, and the viability of clinical
applications in the field. We conclude that new methods need to be designed or
combined to support clinical applications in deformable environments, and more
focus needs to be put into collecting datasets for training and evaluation.
Comment: 31 pages, 17 figures
Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy
Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates, which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy.
More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches that are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy.
- …