
    Tracking and Mapping in Medical Computer Vision: A Review

    As computer vision algorithms are becoming more capable, their applications in clinical systems will become more pervasive. These applications include diagnostics such as colonoscopy and bronchoscopy, guiding biopsies and minimally invasive interventions and surgery, automating instrument motion, and providing image guidance using pre-operative scans. Many of these applications depend on the specific visual nature of medical scenes and require designing and applying algorithms to perform in this environment. In this review, we provide an update to the field of camera-based tracking and scene mapping in surgery and diagnostics in medical computer vision. We begin by describing our review process, which results in a final list of 515 papers that we cover. We then give a high-level summary of the state of the art and provide relevant background for those who need tracking and mapping for their clinical applications. We then review datasets provided in the field and the clinical needs therein. Next, we delve in depth into the algorithmic side and summarize recent developments, which should be especially useful for algorithm designers and for those looking to understand the capability of off-the-shelf methods. We focus on algorithms for deformable environments while also reviewing the essential building blocks of rigid tracking and mapping, since there is a large amount of crossover in methods. Finally, we discuss the current state of tracking and mapping methods along with needs for future algorithms, needs for quantification, and the viability of clinical applications in the field. We conclude that new methods need to be designed or combined to support clinical applications in deformable environments, and that more focus needs to be put into collecting datasets for training and evaluation.

    FetReg2021: A Challenge on Placental Vessel Segmentation and Registration in Fetoscopy

    Fetoscopy laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS). The procedure involves photocoagulation of pathological anastomoses to regulate blood exchange among the twins. The procedure is particularly challenging due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility, and variability in illumination. These challenges may lead to increased surgery time and incomplete ablation. Computer-assisted intervention (CAI) can provide surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking. Research in this domain has been hampered by the lack of high-quality data to design, develop and test CAI algorithms. Through the Fetoscopic Placental Vessel Segmentation and Registration (FetReg2021) challenge, organized as part of the MICCAI 2021 Endoscopic Vision challenge, we released the first large-scale, multi-centre TTTS dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms. For this challenge, we released a dataset of 2060 images, pixel-annotated for vessel, tool, fetus and background classes, from 18 in-vivo TTTS fetoscopy procedures and 18 short video clips. Seven teams participated in this challenge, and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fetoscopic procedures and 6 short clips. The challenge provided an opportunity for creating generalized solutions for fetoscopic scene understanding and mosaicking. In this paper, we present the findings of the FetReg2021 challenge alongside a detailed literature review of CAI in TTTS fetoscopy. Through this challenge, its analysis and the release of multi-centre fetoscopic data, we provide a benchmark for future research in this field.
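
    The dataset is pixel-annotated for four classes (vessel, tool, fetus and background), and challenge entries were scored on an unseen test set. As a minimal sketch of how such segmentations could be evaluated, the code below computes a per-class Dice coefficient for integer-labelled masks; the class-index mapping and the handling of classes absent from both masks are assumptions, not the official FetReg protocol.

```python
import numpy as np

# Hypothetical class indices; the actual FetReg label mapping may differ.
CLASSES = {0: "background", 1: "vessel", 2: "tool", 3: "fetus"}

def per_class_dice(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Dice coefficient per class for integer-labelled segmentation masks."""
    scores = {}
    for idx, name in CLASSES.items():
        p = (pred == idx)
        g = (gt == idx)
        denom = p.sum() + g.sum()
        # Convention assumed here: a class absent from both masks scores 1.0.
        scores[name] = 1.0 if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom
    return scores

# Example: score one predicted mask against its annotation (random data).
pred = np.random.randint(0, 4, size=(448, 448))
gt = np.random.randint(0, 4, size=(448, 448))
print(per_class_dice(pred, gt))
```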

    Robust fetoscopic mosaicking from deep learned flow fields

    PURPOSE: Fetoscopic laser photocoagulation is a minimally invasive procedure to treat twin-to-twin transfusion syndrome during pregnancy by stopping irregular blood flow in the placenta. Building an image mosaic of the placenta and its network of vessels could assist surgeons in navigating the challenging fetoscopic environment during the procedure. METHODOLOGY: We propose a fetoscopic mosaicking approach that combines deep learning-based optical flow with robust estimation for filtering out inconsistent motions that occur due to floating particles and specularities. While the current state of the art for fetoscopic mosaicking relies on clearly visible vessels for registration, our approach overcomes this limitation by considering the motion of all consistent pixels within consecutive frames. We also overcome the challenges in applying off-the-shelf optical flow to fetoscopic mosaicking through the use of robust estimation and local refinement. RESULTS: We compare our proposed method against the state-of-the-art vessel-based and optical flow-based image registration methods, and against robust estimation alternatives. We also compare our proposed pipeline using different optical flow and robust estimation alternatives. CONCLUSIONS: Through analysis of our results, we show that our method outperforms both the vessel-based state of the art and the Lucas-Kanade (LK) baseline, notably when vessels are either poorly visible or too thin to be reliably identified. Our approach is thus able to build consistent placental vessel mosaics in challenging cases where currently available alternatives fail.
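
    The published pipeline combines a deep-learned flow network with robust estimation and local refinement; as a rough stand-in for that registration step, the sketch below pairs OpenCV's classical Farnebäck flow with a RANSAC-fitted similarity transform between consecutive frames and chains the per-frame transforms into a common reference frame. The grid step, RANSAC threshold and choice of Farnebäck flow are illustrative assumptions, not the authors' method.

```python
import cv2
import numpy as np

def frame_to_frame_transform(prev_gray, gray, step=16, ransac_thresh=3.0):
    # Dense flow between consecutive frames (classical Farneback as a stand-in
    # for a learned flow network).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray.shape
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    src = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    # Displace the sampled grid by the predicted flow at those pixels.
    dst = src + flow[ys.ravel(), xs.ravel()]
    # RANSAC rejects flow vectors inconsistent with the dominant motion
    # (floating particles, specular highlights).
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC,
                                       ransacReprojThreshold=ransac_thresh)
    return M  # 2x3 transform mapping previous-frame coordinates to the current frame

def chain_to_mosaic(transforms):
    # Compose frame-to-frame transforms and invert them so every frame can be
    # warped into the coordinates of the first (reference) frame.
    acc, to_reference = np.eye(3), []
    for M in transforms:
        acc = np.vstack([M, [0.0, 0.0, 1.0]]) @ acc
        to_reference.append(np.linalg.inv(acc)[:2])
    return to_reference
```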

    Appearance Modelling and Reconstruction for Navigation in Minimally Invasive Surgery

    Minimally invasive surgery is playing an increasingly important role in patient care. Whilst its direct patient benefit in terms of reduced trauma, improved recovery and shortened hospitalisation has been well established, there is a sustained need for improved training of the existing procedures and the development of new smart instruments to tackle the issues of visualisation, ergonomic control, and haptic and tactile feedback. For endoscopic intervention, the small field of view in the presence of a complex anatomy can easily introduce disorientation to the operator, as the tortuous access pathway is not always easy to predict and control with standard endoscopes. Effective training through simulation devices, based on either virtual reality or mixed-reality simulators, can help to improve the spatial awareness, consistency and safety of these procedures. This thesis examines the use of endoscopic videos for both simulation and navigation purposes. More specifically, it addresses the challenging problem of how to build high-fidelity, subject-specific simulation environments for improved training and skills assessment. Issues related to mesh parameterisation and texture blending are investigated. With the maturity of computer vision in terms of both 3D shape reconstruction and localisation and mapping, vision-based techniques have enjoyed significant interest in recent years for surgical navigation. The thesis also tackles the problem of how to use vision-based techniques to provide a detailed 3D map and a dynamically expanded field of view to improve spatial awareness and avoid operator disorientation. The key advantage of this approach is that it does not require additional hardware, and thus introduces minimal interference to the existing surgical workflow. The derived 3D map can be effectively integrated with pre-operative data, allowing both global and local 3D navigation by taking into account tissue structural and appearance changes. Both simulation and laboratory-based experiments are conducted throughout this research to assess the practical value of the proposed methods.

    Elasticity mapping for breast cancer diagnosis using tactile imaging and auxiliary sensor fusion

    Tactile Imaging (TI) is a technology utilising capacitive pressure sensors to image elasticity distributions within soft tissues such as the breast for cancer screening. TI aims to solve critical problems in the cancer screening pathway, particularly: low sensitivity of manual palpation, patient discomfort during X-ray mammography, and the poor quality of breast cancer referral forms between primary and secondary care facilities. TI is effective in identifying ‘non-palpable’, early-stage tumours, with basic differential ability that reduced unnecessary biopsies by 21% in repeated clinical studies. TI has its limitations, particularly: the measured hardness of a lesion is relative to the background hardness, and lesion location estimates are subjective and prone to operator error. TI can achieve more than simple visualisation of lesions and can act as an accurate differentiator and material analysis tool with further metric development and acknowledgement of error sensitivities when transferring from phantom to clinical trials. This thesis explores and develops two methods, specifically inertial measurement and IR vein imaging, for determining the breast background elasticity, and registering tactile maps for lesion localisation, based on fusion of tactile and auxiliary sensors. These sensors enhance the capabilities of TI, with background tissue elasticity determined with MAE < 4% over tissues in the range 9 kPa – 90 kPa and probe trajectory across the breast measured with an error ratio < 0.3%, independent of applied load, validated on silicone phantoms. A basic TI error model is also proposed, maintaining tactile sensor stability and accuracy with 1% settling times < 1.5s over a range of realistic operating conditions. These developments are designed to be easily implemented into commercial systems, through appropriate design, to maximise impact, providing a stable platform for accurate tissue measurements. This will allow clinical TI to further reduce benign referral rates in a cost-effective manner, by elasticity differentiation and lesion classification in future works.
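
    As a small worked example of the reported background-elasticity accuracy, the sketch below computes the mean absolute error of elasticity estimates relative to reference phantom values, assuming the quoted MAE < 4% is expressed as a percentage of the reference elasticity; the measurement values are hypothetical.

```python
import numpy as np

def mean_absolute_percentage_error(estimated_kpa, reference_kpa):
    """MAE of background elasticity estimates, expressed relative to the
    reference value (percent), assuming that is how the < 4% figure over the
    9-90 kPa phantom range is defined."""
    estimated = np.asarray(estimated_kpa, dtype=float)
    reference = np.asarray(reference_kpa, dtype=float)
    return float(np.mean(np.abs(estimated - reference) / reference)) * 100.0

# Hypothetical phantom measurements (kPa): estimates vs. reference elasticity.
print(mean_absolute_percentage_error([9.3, 28.5, 88.0], [9.0, 30.0, 90.0]))
```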

    Medical SLAM in an autonomous robotic system

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite for the registration of multi-modal patient-specific data, both for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This thesis addresses the ambitious goal of achieving surgical autonomy through the study of the anatomical environment, starting with the technology needed to analyse the scene: vision sensors. A novel endoscope for autonomous surgical task execution, which combines a standard stereo camera with a depth sensor, is presented in the first part of this thesis. This solution introduces several key advantages, such as the possibility of reconstructing 3D structure at a greater distance than traditional endoscopes. The problem of hand-eye calibration is then tackled, uniting the vision system and the robot in a single reference frame and increasing the accuracy of the surgical work plan. The second part of the thesis addresses the problem of 3D reconstruction and the algorithms currently in use. In MIS, simultaneous localization and mapping (SLAM) can be used to localize the pose of the endoscopic camera and build a 3D model of the tissue surface. Another key element for MIS is to have real-time knowledge of the pose of surgical tools with respect to the surgical camera and the underlying anatomy. Starting from the ORB-SLAM algorithm, we modified the architecture to make it usable in an anatomical environment by registering pre-operative information of the intervention to the map obtained from SLAM. Once the SLAM algorithm was shown to be usable in an anatomical environment, it was further improved by adding semantic segmentation to distinguish dynamic features from static ones. All the results in this thesis are validated on training setups that mimic some of the challenges of real surgery, and on setups that simulate the human body, within the Autonomous Robotic Surgery (ARS) and Smart Autonomous Robotic Assistant Surgeon (SARAS) projects.
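
    A minimal sketch of the dynamic-feature filtering idea, assuming a per-pixel segmentation mask with a hypothetical label for dynamic content such as instruments, is to restrict ORB keypoint detection to regions the mask marks as static; this illustrates the concept only, not the modified ORB-SLAM architecture described in the thesis.

```python
import cv2
import numpy as np

# Hypothetical label for dynamic content (e.g. instruments) in the segmentation mask.
DYNAMIC_LABEL = 1

def static_orb_features(gray, seg_mask, n_features=1000):
    """Detect ORB keypoints only on regions the segmentation marks as static,
    so moving instruments do not contribute features to tracking or mapping."""
    orb = cv2.ORB_create(nfeatures=n_features)
    static_mask = (seg_mask != DYNAMIC_LABEL).astype(np.uint8) * 255
    keypoints, descriptors = orb.detectAndCompute(gray, static_mask)
    return keypoints, descriptors

# Example with a synthetic frame and mask.
gray = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
seg = np.zeros((480, 640), dtype=np.uint8)
seg[200:300, 250:400] = DYNAMIC_LABEL  # pretend an instrument occupies this region
kps, desc = static_orb_features(gray, seg)
print(len(kps))
```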

    Human robot interaction in a crowded environment

    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person, and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
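
    The thesis fuses multiple visual cues in a Bayesian framework to decide whether a person intends to interact with the robot. As a much simpler illustration of that fusion step, the sketch below combines independent, hypothetical cue likelihoods in a naive-Bayes fashion into a posterior interaction probability; the cue names and numbers are assumptions, and the actual system adapts its Bayesian network via contextual feedback.

```python
import numpy as np

# Hypothetical cue likelihoods P(cue | interacting) and P(cue | not interacting);
# in the thesis these would come from trained detectors and a Bayesian network.
CUE_LIKELIHOODS = {
    "facing_robot":   (0.85, 0.30),
    "hand_gesture":   (0.70, 0.05),
    "moving_towards": (0.60, 0.20),
}

def interaction_posterior(observed_cues, prior=0.2):
    """Naive-Bayes fusion of independent visual cues into P(intends to interact)."""
    p_yes, p_no = prior, 1.0 - prior
    for cue, present in observed_cues.items():
        l_yes, l_no = CUE_LIKELIHOODS[cue]
        p_yes *= l_yes if present else (1.0 - l_yes)
        p_no *= l_no if present else (1.0 - l_no)
    return p_yes / (p_yes + p_no)

# One person gestures while facing the robot; another only walks past.
print(interaction_posterior({"facing_robot": True, "hand_gesture": True, "moving_towards": False}))
print(interaction_posterior({"facing_robot": False, "hand_gesture": False, "moving_towards": True}))
```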

    Performance of image guided navigation in laparoscopic liver surgery – A systematic review

    Background: Compared to open surgery, minimally invasive liver resection has improved short-term outcomes. It is, however, technically more challenging. Navigated image guidance systems (IGS) are being developed to overcome these challenges. The aim of this systematic review is to provide an overview of their current capabilities and limitations. Methods: Medline, Embase and Cochrane databases were searched using free-text terms and corresponding controlled vocabulary. Titles and abstracts of retrieved articles were screened against the inclusion criteria. Due to the heterogeneity of the retrieved data, it was not possible to conduct a meta-analysis. Therefore, results are presented in tabulated and narrative format. Results: Out of 2015 articles, 17 pre-clinical and 33 clinical papers met the inclusion criteria. Data from 24 articles that reported on accuracy indicate that, in recent years, navigation accuracy has been in the range of 8–15 mm. Due to discrepancies in evaluation methods it is difficult to compare accuracy metrics between different systems. Surgeon feedback suggests that current state-of-the-art IGS may be useful as a supplementary navigation tool, especially for small liver lesions that are difficult to locate. They are, however, not able to reliably localise all relevant anatomical structures. Only one article investigated the impact of IGS on clinical outcomes. Conclusions: Further improvements in navigation accuracy are needed to enable reliable visualisation of tumour margins with the precision required for oncological resections. To enhance comparability between different IGS, it is crucial to find a consensus on the assessment of navigation accuracy as a minimum reporting standard.

    Deep Retinal Optical Flow: From Synthetic Dataset Generation to Framework Creation and Evaluation

    Sustained delivery of regenerative retinal therapies by robotic systems requires intra-operative tracking of the retinal fundus. This thesis presents a supervised convolutional neural network to densely predict optical flow of the retinal fundus, using semantic segmentation as an auxiliary task. Retinal flow information missing due to occlusion by surgical tools or other effects is implicitly inpainted, allowing for the robust tracking of surgical targets. As manual annotation of optical flow is infeasible, a flexible algorithm for the generation of large synthetic training datasets on the basis of given intra-operative retinal images and tool templates is developed. The compositing of synthetic images is approached as a layer-wise operation implementing a number of transforms at every level, which can be extended as required, mimicking the various phenomena visible in real data. Optical flow ground truth is calculated from the motion transforms with the help of oflib, an open-source optical flow library available from the Python Package Index. It enables the user to manipulate, evaluate, and combine flow fields. The PyTorch version of oflib is fully differentiable and therefore suitable for use in deep learning methods requiring back-propagation. The optical flow estimation from the network trained on synthetic data is evaluated using three performance metrics obtained from tracking a grid and sparsely annotated ground-truth points. The evaluation benchmark consists of a series of challenging real intra-operative clips obtained from an extensive internally acquired dataset encompassing representative surgical cases. The deep learning approach clearly outperforms variational baseline methods and is shown to generalise well to real data covering scenarios routinely observed during vitreoretinal procedures. This indicates that complex synthetic training datasets can be used to specifically guide optical flow estimation, laying the foundation for a robust system which can assist with intra-operative tracking of moving surgical targets even when occluded.
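
    The three performance metrics derived from tracking a grid and sparsely annotated ground-truth points are not specified in this abstract, so the sketch below shows one plausible choice: the average endpoint error of a dense flow field evaluated only at sparsely annotated points. The array shapes and example values are assumptions for illustration.

```python
import numpy as np

def average_endpoint_error(pred_flow, gt_points, gt_displacements):
    """Average endpoint error of a dense HxWx2 flow field, evaluated only at
    sparsely annotated ground-truth points (x, y) with known displacements (dx, dy)."""
    xs = gt_points[:, 0].astype(int)
    ys = gt_points[:, 1].astype(int)
    pred = pred_flow[ys, xs]  # predicted (dx, dy) at the annotated pixels
    return float(np.mean(np.linalg.norm(pred - gt_displacements, axis=1)))

# Hypothetical example: a dense flow field and three annotated points.
flow = np.zeros((512, 512, 2), dtype=np.float32)
flow[...] = (2.0, -1.0)  # constant motion for illustration
points = np.array([[100, 200], [300, 50], [400, 400]])
displacements = np.array([[2.0, -1.0], [1.5, -1.0], [2.0, 0.0]])
print(average_endpoint_error(flow, points, displacements))
```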