Deep Reinforcement Learning in Surgical Robotics: Enhancing the Automation Level
Surgical robotics is a rapidly evolving field that is transforming the
landscape of surgeries. Surgical robots have been shown to enhance precision,
minimize invasiveness, and alleviate surgeon fatigue. One promising area of
research in surgical robotics is the use of reinforcement learning to enhance
the automation level. Reinforcement learning is a type of machine learning that
involves training an agent to make decisions based on rewards and punishments.
This literature review aims to comprehensively analyze existing research on
reinforcement learning in surgical robotics. The review identified various
applications of reinforcement learning in surgical robotics, including
pre-operative, intra-body, and percutaneous procedures, listed the typical
studies, and compared their methodologies and results. The findings show that
reinforcement learning has great potential to improve the autonomy of surgical
robots. Reinforcement learning can teach robots to perform complex surgical
tasks, such as suturing and tissue manipulation. It can also improve the
accuracy and precision of surgical robots, making them more effective at
performing surgeries.
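The reward-and-punishment training loop described in the abstract can be pictured with a minimal tabular Q-learning agent. This is a generic RL illustration on a made-up toy task, not any surgical-robotics method from the survey; all states, rewards, and hyperparameters below are hypothetical.

```python
import random

# Minimal tabular Q-learning on a toy 1-D "reach the target" task.
# States 0..4; the agent moves left/right and is rewarded at state 4.
# Purely illustrative of reward-driven training of an agent.
N_STATES, ACTIONS = 5, (-1, +1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else -0.01  # small punishment per move
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
for _ in range(200):                               # training episodes
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right toward the reward.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

The same reward-shaping idea scales, in the surveyed work, to rewarding task progress (e.g., needle placement accuracy) instead of a toy target state.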
Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Medical students and junior surgeons often rely on senior surgeons and
specialists to answer their questions when learning surgery. However, experts
are often busy with clinical and academic work, and have little time to give
guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question
Answering (VQA) systems can only provide simple answers without the location of
the answers. In addition, vision-language (ViL) embedding remains a less
explored research direction in these tasks. Therefore, a surgical Visual
Question Localized-Answering (VQLA) system would be helpful for medical
students and junior surgeons to learn and understand from recorded surgical
videos. We propose an end-to-end Transformer with Co-Attention gaTed
Vision-Language (CAT-ViL) for VQLA in surgical scenarios, which does not
require feature extraction through detection models. The CAT-ViL embedding
module is designed to fuse heterogeneous features from visual and textual
sources. The fused embedding then feeds a standard Data-Efficient Image
Transformer (DeiT) module, before the parallel classifier and detector for
joint prediction. We conduct the experimental validation on public surgical
videos from MICCAI EndoVis Challenge 2017 and 2018. The experimental results
highlight the superior performance and robustness of our proposed model
compared to the state-of-the-art approaches. Ablation studies further prove the
outstanding performance of all the proposed components. The proposed method
provides a promising solution for surgical scene understanding, and opens up a
primary step in the Artificial Intelligence (AI)-based VQLA system for surgical
training. Our code is publicly available.
Comment: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/CAT-Vi
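The gated fusion of heterogeneous visual and textual features can be sketched, in miniature, as a per-dimension gate that decides how much of each modality to keep. The vectors and gate weights below are made up for illustration; this is the generic gating idea, not the actual CAT-ViL embedding module.

```python
import math

# Toy sketch of gated fusion of a visual and a textual embedding:
# a gate value in (0, 1) per dimension mixes the two modalities.
# All numbers here are hypothetical.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(visual, textual, gate_logits):
    assert len(visual) == len(textual) == len(gate_logits)
    fused = []
    for v, t, g in zip(visual, textual, gate_logits):
        w = sigmoid(g)                     # gate in (0, 1)
        fused.append(w * v + (1.0 - w) * t)
    return fused

visual  = [0.2, -1.0, 0.5]
textual = [1.0,  0.4, -0.3]
gate    = [10.0, -10.0, 0.0]               # ~visual, ~textual, even mix
fused = gated_fuse(visual, textual, gate)
print([round(x, 3) for x in fused])
```

In a trained model the gate logits would themselves be produced by the co-attention between the two modalities rather than fixed by hand.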
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery
The visual-question localized-answering (VQLA) system can serve as a
knowledgeable assistant in surgical education. Besides providing text-based
answers, the VQLA system can highlight the region of interest for better
surgical scene understanding. However, deep neural networks (DNNs) suffer from
catastrophic forgetting when learning new knowledge. Specifically, when DNNs
learn on incremental classes or tasks, their performance on old tasks drops
dramatically. Furthermore, due to medical data privacy and licensing issues, it
is often difficult to access old data when updating continual learning (CL)
models. Therefore, we develop a non-exemplar continual surgical VQLA framework,
to explore and balance the rigidity-plasticity trade-off of DNNs in a
sequential learning paradigm. We revisit the distillation loss in CL tasks, and
propose rigidity-plasticity-aware distillation (RP-Dist) and self-calibrated
heterogeneous distillation (SH-Dist) to preserve the old knowledge. The weight
aligning (WA) technique is also integrated to adjust the weight bias between
old and new tasks. We further establish a CL framework on three public surgical
datasets, in surgical settings with overlapping classes between old and new
surgical VQLA tasks. With extensive experiments, we demonstrate that our
proposed method reconciles learning and forgetting on continual surgical VQLA
better than conventional CL methods. Our
code is publicly accessible.
Comment: To appear in MICCAI 2023. Code availability: https://github.com/longbai1006/CS-VQL
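Distillation losses of the kind revisited here typically compare temperature-softened output distributions of the old (teacher) and new (student) models. A minimal sketch of plain knowledge distillation follows; the logits are invented, and this is the standard soft-target KL loss, not the proposed RP-Dist or SH-Dist variants.

```python
import math

# Generic soft-target distillation: KL divergence between the old
# model's and the new model's temperature-softened output
# distributions. T > 1 smooths the logits so information in the
# non-target classes is preserved. Logits below are hypothetical.
def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, T=2.0):
    p = softmax(teacher_logits, T)     # old-model targets
    q = softmax(student_logits, T)     # new-model predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
print(distill_kl(teacher, teacher))               # identical outputs -> 0.0
print(distill_kl(teacher, [0.1, 1.0, 2.0]) > 0)   # mismatch -> positive loss
```

The paper's contribution lies in how this basic loss is reweighted and calibrated across old and new tasks, which the sketch does not attempt to reproduce.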
Sim-to-Real Segmentation in Robot-assisted Transoral Tracheal Intubation
Robotic-assisted tracheal intubation requires the robot to distinguish
anatomical features as an experienced physician would, using deep-learning
techniques. However, real datasets of oropharyngeal organs are limited due to
patient privacy issues, making it challenging to train deep-learning models for
accurate image segmentation. We hereby consider generating a new data modality
through a virtual environment to assist the training process. Specifically,
this work introduces a virtual dataset generated by the Simulation Open
Framework Architecture (SOFA) framework to overcome the limited availability of
actual endoscopic images. We also propose a domain adaptive Sim-to-Real method
for oropharyngeal organ image segmentation, which employs an image blending
strategy called IoU-Ranking Blend (IRB) and style-transfer techniques to
address discrepancies between datasets. Experimental results demonstrate the
superior performance of the proposed approach with domain adaptive models,
improving segmentation accuracy and training stability. In the practical
application, the trained segmentation model holds great promise for
robot-assisted intubation surgery and intelligent surgical navigation.
Comment: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms)
Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs
Video-assisted transoral tracheal intubation (TI) necessitates using an
endoscope that helps the physician insert a tracheal tube into the glottis
instead of the esophagus. The growing trend of robotic-assisted TI would
require a medical robot to distinguish anatomical features like an experienced
physician, a capability that can be imitated using supervised deep-learning
techniques. However, real datasets of oropharyngeal organs are often
inaccessible due to limited open-source data and patient privacy. In this work,
we propose a domain adaptive Sim-to-Real framework called IoU-Ranking
Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The
framework includes an image blending strategy called IoU-Ranking Blend (IRB)
and style-transfer method ArtFlow. Here, IRB alleviates the problem of poor
segmentation performance caused by significant domain differences between datasets;
while ArtFlow is introduced to reduce the discrepancies between datasets
further. A virtual oropharynx image dataset generated by the SOFA framework is
used as the learning subject for semantic segmentation to deal with the limited
availability of actual endoscopic images. We adapted IRB-AF with the
state-of-the-art domain adaptive segmentation models. The results demonstrate
the superior performance of our approach in further improving the segmentation
accuracy and training stability.
Comment: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Releas
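The "IoU-Ranking" part of IRB can be illustrated with a plain mask-IoU computation: candidate blends are scored by how well their masks overlap a reference, then ranked. This is a simplified sketch with made-up masks; the actual IRB selection logic in the paper may differ in detail.

```python
# Simplified illustration of ranking candidate blends by mask IoU.
# Each mask is a set of foreground pixel coordinates; candidates with
# higher overlap against a reference mask rank first.
def mask_iou(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def rank_by_iou(reference, candidates):
    return sorted(candidates,
                  key=lambda item: mask_iou(reference, item[1]),
                  reverse=True)

reference = [(0, 0), (0, 1), (1, 0), (1, 1)]
candidates = [
    ("blend_a", [(0, 0), (0, 1)]),             # IoU = 2/4
    ("blend_b", [(0, 0), (0, 1), (1, 0)]),     # IoU = 3/4
    ("blend_c", [(5, 5)]),                     # IoU = 0
]
ranked = [name for name, _ in rank_by_iou(reference, candidates)]
print(ranked)
```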
Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery
Deep Neural Network (DNN)-based semantic segmentation of robotic instruments and tissues can enhance the precision of surgical activities in robot-assisted surgery. However, unlike biological learning, DNNs cannot learn incremental tasks over time and exhibit catastrophic forgetting: a sharp decline in performance on previously learned tasks after learning a new one. Specifically, when data are scarce, the model shows a rapid drop in performance on previously learned instruments after learning new data with new instruments. The problem worsens when privacy concerns prevent releasing the old instruments' dataset for the old model, and when data for new or updated instrument versions are unavailable to the continual learning model. We therefore develop a privacy-preserving synthetic continual semantic segmentation framework by blending and harmonizing (i) open-source old-instrument foregrounds onto synthesized backgrounds, without revealing real patient data in public, and (ii) new-instrument foregrounds onto extensively augmented real backgrounds. To boost balanced logit distillation from the old model to the continual learning model, we design overlapping class-aware temperature normalization (CAT) by controlling model learning utility. We also introduce multi-scale shifted-feature distillation (SD) to maintain long- and short-range spatial relationships among semantic objects, where conventional short-range spatial features with limited information reduce the power of feature distillation. We demonstrate the effectiveness of our framework on the EndoVis 2017 and 2018 instrument segmentation datasets in a generalized continual learning setting. Code is available at https://github.com/XuMengyaAmy/Synthetic_CAT_SD
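The blending step of such synthetic-data pipelines can be pictured as basic alpha compositing of an instrument foreground onto a background image under a mask. The toy grayscale grids below are hypothetical, and the paper's harmonization is far more involved; this only shows the elementary composite `out = alpha * fg + (1 - alpha) * bg` per masked pixel.

```python
# Toy alpha compositing of an instrument "foreground" onto a
# background "image" (tiny grayscale grids). Purely illustrative;
# the real framework adds harmonization on top of the composite.
def composite(bg, fg, mask, alpha=1.0):
    out = [row[:] for row in bg]                  # copy background
    for i, row in enumerate(mask):
        for j, m in enumerate(row):
            if m:                                 # instrument pixel
                out[i][j] = alpha * fg[i][j] + (1 - alpha) * bg[i][j]
    return out

bg   = [[10, 10], [10, 10]]
fg   = [[200, 200], [200, 200]]
mask = [[1, 0], [0, 1]]                           # diagonal instrument
print(composite(bg, fg, mask))                    # fg pasted on the diagonal
```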
Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis
Despite their impressive performance in various surgical scene understanding
tasks, deep learning-based methods are frequently hindered from deploying to
real-world surgical applications for various reasons. In particular, data
collection, annotation, and domain shift between sites and patients are the
most common obstacles. In this work, we mitigate data-related issues by
efficiently leveraging minimal source images to generate synthetic surgical
instrument segmentation datasets and achieve outstanding generalization
performance on unseen real domains. Specifically, in our framework, only one
background tissue image and at most three images of each foreground instrument
are taken as the seed images. These source images are extensively transformed
and employed to build up the foreground and background image pools, from which
randomly sampled tissue and instrument images are composed with multiple
blending techniques to generate new surgical scene images. Besides, we
introduce hybrid training-time augmentations to diversify the training data
further. Extensive evaluation on three real-world datasets, i.e., Endo2017,
Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical
instruments datasets generation and segmentation framework can achieve
encouraging performance compared with training with real data. Notably, on the
RoboTool dataset, where a larger domain gap exists, our framework demonstrates
superior generalization by a considerable margin. We expect these results to
attract research attention to improving model generalization through data
synthesis.
Comment: First two authors contributed equally. Accepted by IROS202
The Quantitative Diagnosis Method of Rubbing Rotor System
The dynamics of the rubbing rotor system are analyzed by applying the harmonic balance method. The relationship between the harmonic components in the response of the rubbing rotor system and the dynamic stiffness matrix of the fault-free rotor system is revealed, based on which a new model-based method for rubbing identification is presented. With this method, the fault location and rubbing forces of a single-rubbing rotor system can be identified from vibration data of only two nodes, and the rubbing locations and forces of a double-rubbing rotor system from vibration data of three nodes. Numerical simulations and experiments on a rotor test rig verify the efficiency of the present method.
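The harmonic-balance relation the abstract alludes to can be sketched in generic rotordynamics notation (the symbols here are standard textbook quantities, not necessarily the paper's): expand the steady-state response in harmonics of the rotation speed, and each harmonic of the unknown rub force then follows from the fault-free dynamic stiffness acting on the measured response.

```latex
x(t) = \sum_{k} X_k \, e^{\mathrm{i} k \omega t}, \qquad
Z_k(\omega) = K - (k\omega)^2 M + \mathrm{i}\, k\omega\, C, \qquad
F_k^{\mathrm{rub}} = Z_k(\omega)\, X_k - F_k^{\mathrm{ext}}
```

Here $M$, $C$, $K$ are the fault-free mass, damping, and stiffness matrices, $Z_k$ the dynamic stiffness at harmonic $k$, and $F_k^{\mathrm{ext}}$ the known excitation (e.g., unbalance); nonzero residual entries of $F_k^{\mathrm{rub}}$ indicate the rubbing location. This is a generic sketch of the technique, not the paper's exact formulation.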