Synthesis of Stereoscopic Views from Monocular Endoscopic Videos
Abstract Recent studies have shown that 3D imaging provides some unique advantages over traditional 2D imaging for minimally invasive surgery. However, most existing endoscopes still use single-lens cameras, and the use of dual-lens 3D imaging techniques is still limited. This paper proposes an approach to enabling 3D imaging from a single-lens endoscope by automatically synthesizing stereoscopic views from monocular images captured by the endoscope. We first formulate the problem by introducing the notion of normalized disparity, based on which we show that affine reconstruction is sufficient for stereoscopic view synthesis. With this formulation, and exploiting other domain-specific constraints, we then propose a robust structure-from-motion algorithm for a sparse set of feature points and a fast, linear interpolation algorithm for creating a dense disparity field for synthesizing stereoscopic views from the original monocular video. Both synthetic images and real endoscopic videos are used to evaluate the proposed method. The results demonstrate the feasibility and effectiveness of the proposed method.
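As a rough illustration of the view-synthesis idea (a toy sketch, not the paper's algorithm, and assuming a per-pixel disparity map is already available), a right-eye view can be produced by forward-warping each pixel of the monocular frame horizontally by its disparity:

```python
import numpy as np

def synthesize_right_view(left, disparity):
    """Shift each pixel horizontally by its disparity to form a right-eye view.

    left: (H, W) grayscale image; disparity: (H, W) per-pixel shift in pixels.
    This is a naive forward warp; real systems must also fill disocclusions.
    """
    h, w = left.shape
    right = np.zeros_like(left)
    for y in range(h):
        for x in range(w):
            xr = x - int(round(disparity[y, x]))  # right-eye pixel lands left of x
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
    return right

left = np.arange(16.0).reshape(4, 4)
disp = np.ones((4, 4))                 # uniform 1-pixel disparity
right = synthesize_right_view(left, disp)
```

A real pipeline must additionally fill the disocclusion holes this naive warp leaves at the image border and behind depth edges.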
Unsupervised Odometry and Depth Learning for Endoscopic Capsule Robots
In the last decade, many medical companies and research groups have tried to convert passive capsule endoscopes, an emerging and minimally invasive diagnostic technology, into actively steerable endoscopic capsule robots that will provide more intuitive disease detection, targeted drug delivery and biopsy-like operations in the gastrointestinal (GI) tract. In this study, we introduce a fully unsupervised, real-time odometry and depth learner for monocular endoscopic capsule robots. We establish supervision by warping view sequences and using re-projection error minimization as the loss function, which we adopt in our multi-view pose estimation and single-view depth estimation networks. Detailed quantitative and qualitative analyses of the proposed framework, performed on non-rigidly deformable ex-vivo porcine stomach datasets, prove the effectiveness of the method in terms of motion estimation and depth recovery.
Comment: submitted to IROS 201
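The warping-based supervision can be sketched as follows (a minimal NumPy version assuming a pinhole camera model; the hypothetical `reprojection_loss` stands in for the paper's differentiable training loss):

```python
import numpy as np

def reprojection_loss(target, source, depth, K, T):
    """Warp `source` into the target frame and return the mean photometric error.

    target, source: (H, W) images; depth: (H, W) target-view depth;
    K: (3, 3) intrinsics; T: (4, 4) target-to-source camera pose.
    Nearest-neighbour sampling; real pipelines use differentiable
    bilinear sampling so gradients flow through the warp.
    """
    h, w = target.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(float)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)    # back-project
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    src = K @ (T @ cam_h)[:3]                              # into source frame
    u = np.round(src[0] / src[2]).astype(int)
    v = np.round(src[1] / src[2]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped = np.zeros(h * w)
    warped[valid] = source[v[valid], u[valid]]
    err = np.abs(target.reshape(-1) - warped)[valid]
    return err.mean() if err.size else 0.0

K = np.eye(3)
T = np.eye(4)                          # identity pose: views coincide
img = np.arange(16.0).reshape(4, 4)
loss = reprojection_loss(img, img, np.ones((4, 4)), K, T)
```

With the identity pose and matching images the loss is zero; during training, the network's depth and pose predictions are adjusted jointly to drive this photometric error down.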
Appearance Modelling and Reconstruction for Navigation in Minimally Invasive Surgery
Minimally invasive surgery is playing an increasingly important role for patient
care. Whilst its direct patient benefit in terms of reduced trauma,
improved recovery and shortened hospitalisation has been well established,
there is a sustained need for improved training of the existing procedures
and the development of new smart instruments to tackle the issues of visualisation,
ergonomic control, and haptic and tactile feedback. For endoscopic
intervention, the small field of view in the presence of a complex anatomy
can easily introduce disorientation to the operator as the tortuous access
pathway is not always easy to predict and control with standard endoscopes.
Effective training through simulation devices, based on either virtual reality
or mixed-reality simulators, can help to improve the spatial awareness,
consistency and safety of these procedures.
This thesis examines the use of endoscopic videos for both simulation
and navigation purposes. More specifically, it addresses the challenging
problem of how to build high-fidelity subject-specific simulation environments
for improved training and skills assessment. Issues related to mesh
parameterisation and texture blending are investigated. With the maturity
of computer vision in terms of both 3D shape reconstruction and localisation
and mapping, vision-based techniques have enjoyed significant interest
in recent years for surgical navigation. The thesis also tackles the problem
of how to use vision-based techniques for providing a detailed 3D map and
dynamically expanded field of view to improve spatial awareness and avoid
operator disorientation. The key advantage of this approach is that it does
not require additional hardware, and thus introduces minimal interference
to the existing surgical workflow. The derived 3D map can be effectively
integrated with pre-operative data, allowing both global and local 3D navigation
by taking into account tissue structural and appearance changes.
Both simulation and laboratory-based experiments are conducted throughout
this research to assess the practical value of the proposed method.
Ultrasound-Augmented Laparoscopy
Laparoscopic surgery is perhaps the most common minimally invasive procedure for many diseases in the abdomen. Since the laparoscopic camera provides only the surface view of the internal organs, in many procedures surgeons use laparoscopic ultrasound (LUS) to visualize deep-seated surgical targets. Conventionally, the 2D LUS image is shown on a display spatially separate from the one that displays the laparoscopic video. Reasoning about the geometry of hidden targets therefore requires mentally solving the spatial alignment and resolving the modality differences, which is cognitively very challenging. Moreover, the mental representation of hidden targets in space acquired through such cognitive mediation may be error-prone and cause incorrect actions to be performed.
To remedy this, advanced visualization strategies are required in which the US information is visualized in the context of the laparoscopic video. To this end, efficient computational methods are required to accurately align the US image coordinate system with the camera-centred coordinate system, and to render the registered image information in the context of the camera such that surgeons perceive the geometry of hidden targets accurately. In this thesis, such a visualization pipeline is described. A novel method to register US images with a camera-centric coordinate system is detailed, with an experimental investigation into its accuracy bounds. An improved method to blend US information with the surface view is also presented, with an experimental investigation into the accuracy of perception of target locations in space.
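As a minimal sketch of what such a registration provides (hypothetical names; the thesis's actual method and calibration chain are more involved), once a rigid transform from the US image frame to the camera frame is known, each US pixel can be placed in the camera-centric 3D coordinate system:

```python
import numpy as np

def us_pixel_to_camera(px, py, mm_per_pixel, T_cam_us):
    """Map a 2D LUS pixel into the camera-centric 3D frame.

    mm_per_pixel: US image scale; T_cam_us: (4, 4) rigid transform from the
    US image frame to the camera frame (the quantity registration estimates).
    The US image is taken to lie in the z = 0 plane of its own frame.
    """
    p_us = np.array([px * mm_per_pixel, py * mm_per_pixel, 0.0, 1.0])
    return (T_cam_us @ p_us)[:3]

# With an identity transform the pixel stays in the image plane:
p = us_pixel_to_camera(2, 3, 1.0, np.eye(4))
```

The augmented overlay then projects these 3D points through the laparoscope's intrinsics back into the video frame.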
On-the-fly dense 3D surface reconstruction for geometry-aware augmented reality.
Augmented Reality (AR) is an emerging technology that makes seamless connections between virtual space and the real world by superimposing computer-generated information onto the real-world environment. AR can provide additional information in a more intuitive and natural way than any other information-delivery method that humans have ever invented. Camera tracking is the enabling technology for AR and has been well studied for the last few decades. Apart from the tracking problem, sensing and perception of the surrounding environment are also very important and challenging problems. Although there are existing hardware solutions such as Microsoft Kinect and HoloLens that can sense and build the environmental structure, they are either too bulky or too expensive for AR. In this thesis, the challenging real-time dense 3D surface reconstruction technologies are studied and reformulated for the reinvention of basic position-aware AR towards geometry-aware AR, with an outlook towards context-aware AR. We initially propose to reconstruct the dense environmental surface using the sparse points from Simultaneous Localisation and Mapping (SLAM), but this approach is prone to failure in challenging Minimally Invasive Surgery (MIS) scenes, such as in the presence of deformation and surgical smoke. We subsequently adopt stereo vision with SLAM for more accurate and robust results. With the success of deep learning in recent years, we present learning-based single-image reconstruction and achieve state-of-the-art results. Moreover, we propose context-aware AR, one step further from purely geometry-aware AR towards high-level conceptual interaction modelling in complex AR environments for an enhanced user experience. Finally, a learning-based smoke removal method is proposed to ensure accurate and robust reconstruction under extreme conditions such as the presence of surgical smoke.
Dense Vision in Image-guided Surgery
Image-guided surgery needs an efficient and effective camera tracking system in order to perform augmented reality for overlaying preoperative models or labelling cancerous tissues on the 2D video images of the surgical scene. Tracking in endoscopic/laparoscopic scenes, however, is an extremely difficult task, primarily due to tissue deformation, instrument invasion into the surgical scene and the presence of specular highlights. State-of-the-art feature-based SLAM systems such as PTAM fail in tracking such scenes since the number of good features to track is very limited. Smoke and instrument motion cause feature-based tracking to fail immediately.
The work of this thesis provides a systematic approach to this problem using dense vision. We initially attempted to register a 3D preoperative model with multiple 2D endoscopic/laparoscopic images using a dense method, but this approach did not perform well. We subsequently proposed stereo reconstruction to directly obtain the 3D structure of the scene. By using the dense reconstructed model together with robust estimation, we demonstrate that dense stereo tracking can be remarkably robust even within extremely challenging endoscopic/laparoscopic scenes.
Several validation experiments have been conducted in this thesis. The proposed stereo reconstruction algorithm has proved to be the state-of-the-art method on several publicly available ground-truth datasets. Furthermore, the proposed robust dense stereo tracking algorithm has proved highly accurate in a synthetic environment (< 0.1 mm RMSE) and qualitatively extremely robust when applied to real scenes from robot-assisted laparoscopic prostatectomy (RALP). This is an important step toward achieving accurate image-guided laparoscopic surgery.
The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes
Estimating camera motion in deformable scenes poses a complex and open
research challenge. Most existing non-rigid structure-from-motion techniques
assume that static scene parts are observed alongside the deforming ones in
order to establish an anchoring reference. However, this assumption does not
hold true in relevant application cases such as endoscopy. Deformable
odometry and SLAM pipelines, which tackle the most challenging scenario of
exploratory trajectories, suffer from a lack of robustness and proper
quantitative evaluation methodologies. To tackle this issue with a common
benchmark, we introduce the Drunkard's Dataset, a challenging collection of
synthetic data targeting visual navigation and reconstruction in deformable
environments. This dataset is the first large set of exploratory camera
trajectories with ground truth inside 3D scenes where every surface exhibits
non-rigid deformations over time. Simulations in realistic 3D buildings let us
obtain a vast amount of data and ground truth labels, including camera poses,
RGB images and depth, optical flow and normal maps at high resolution and
quality. We further present a novel deformable odometry method, dubbed the
Drunkard's Odometry, which decomposes optical flow estimates into rigid-body
camera motion and non-rigid scene deformations. In order to validate our data,
our work contains an evaluation of several baselines as well as a novel
tracking error metric which does not require ground truth data. Dataset and
code: https://davidrecasens.github.io/TheDrunkard'sOdometry
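The decomposition at the heart of the Drunkard's Odometry can be sketched as follows (a simplified NumPy version with hypothetical names; the actual method estimates these quantities with learned networks):

```python
import numpy as np

def rigid_flow(depth, K, T):
    """Optical flow induced purely by camera motion T over a rigid scene.

    depth: (H, W) depth in the first frame; K: (3, 3) intrinsics;
    T: (4, 4) camera motion from frame 1 to frame 2.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(float)
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)   # back-project
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    proj = K @ (T @ cam_h)[:3]                            # re-project after motion
    u, v = proj[0] / proj[2], proj[1] / proj[2]
    return np.stack([u - pix[0], v - pix[1]]).reshape(2, h, w)

def deformation_flow(total_flow, depth, K, T):
    """Residual non-rigid flow once the camera-motion component is removed."""
    return total_flow - rigid_flow(depth, K, T)

depth = np.ones((4, 4))
flow = np.full((2, 4, 4), 0.5)        # observed total optical flow
res = deformation_flow(flow, depth, np.eye(3), np.eye(4))
```

With an identity camera motion the rigid component vanishes, so all observed flow is attributed to scene deformation; a non-trivial pose reassigns part of the flow to camera motion.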
Medical SLAM in an autonomous robotic system
One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite for the registration of multi-modal patient-specific data, for enhancing the surgeon's navigation capabilities by observing beyond exposed tissue surfaces, and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This thesis addresses the ambitious goal of achieving surgical autonomy through the study of the anatomical environment, initially examining the technology needed to analyze the scene: vision sensors. A novel endoscope for autonomous surgical task execution is presented in the first part of this thesis, which combines a standard stereo camera with a depth sensor. This solution introduces several key advantages, such as the possibility of reconstructing 3D structure at a greater distance than traditional endoscopes. Then the problem of hand-eye calibration is tackled, which unites the vision system and the robot in a single reference system, increasing the accuracy of the surgical work plan. In the second part of the thesis, the problem of 3D reconstruction and the algorithms currently in use are addressed. In MIS, simultaneous localization and mapping (SLAM) can be used to localize the pose of the endoscopic camera and build a 3D model of the tissue surface. Another key element for MIS is to have real-time knowledge of the pose of surgical tools with respect to the surgical camera and underlying anatomy. Starting from the ORB-SLAM algorithm, we have modified the architecture to make it usable in an anatomical environment by adding the registration of pre-operative information about the intervention to the map obtained from the SLAM.
Once it has been proven that the SLAM algorithm is usable in an anatomical environment, it is improved by adding semantic segmentation to distinguish dynamic features from static ones. All the results in this thesis are validated on training setups, which mimic some of the challenges of real surgery, and on setups that simulate the human body, within the Autonomous Robotic Surgery (ARS) and Smart Autonomous Robotic Assistant Surgeon (SARAS) projects.
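The dynamic-feature filtering described above can be sketched as follows (a hypothetical helper, not the thesis's code): given a semantic segmentation mask marking instruments and other moving regions, keypoints falling on those regions are simply discarded before SLAM tracking:

```python
import numpy as np

def filter_static_features(keypoints, dynamic_mask):
    """Keep only keypoints that lie on static scene regions.

    keypoints: (N, 2) array of (x, y) pixel coordinates;
    dynamic_mask: (H, W) bool, True where segmentation marks moving content.
    """
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    return keypoints[~dynamic_mask[ys, xs]]

mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = True                       # one "instrument" pixel
kps = np.array([[2, 1], [0, 0]])        # the first keypoint lies on it
static = filter_static_features(kps, mask)
```

Only features surviving this filter feed the pose estimation, so moving instruments no longer corrupt the map.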
Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy
Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates, which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy.
More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and develops domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network, trained on our new synthetic data, outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches, which are often unobtainable in surgical scenes. Together, the methods introduced in this thesis work towards a complete, data-driven 3D reconstruction pipeline for endoscopy.