Deep Reinforcement Learning in Surgical Robotics: Enhancing the Automation Level
Surgical robotics is a rapidly evolving field that is transforming the
landscape of surgeries. Surgical robots have been shown to enhance precision,
minimize invasiveness, and alleviate surgeon fatigue. One promising area of
research in surgical robotics is the use of reinforcement learning to enhance
the automation level. Reinforcement learning is a type of machine learning that
involves training an agent to make decisions based on rewards and punishments.
This literature review aims to comprehensively analyze existing research on
reinforcement learning in surgical robotics. The review identified various
applications of reinforcement learning in surgical robotics, including
pre-operative, intra-body, and percutaneous procedures, listed representative
studies, and compared their methodologies and results. The findings show that
reinforcement learning has great potential to improve the autonomy of surgical
robots. Reinforcement learning can teach robots to perform complex surgical
tasks, such as suturing and tissue manipulation. It can also improve the
accuracy and precision of surgical robots, making them more effective at
performing surgeries.
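For context, the sketch below illustrates the reward-driven update loop that the abstract describes, using tabular Q-learning on a toy environment. The state/action sizes, reward values, and `step` function are illustrative assumptions only and bear no relation to the surgical tasks surveyed in the review.

```python
import numpy as np

# Toy tabular Q-learning loop: the agent learns from rewards and penalties.
# Environment, sizes, and rewards below are hypothetical placeholders.
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = (state + action) % n_states
    reward = 1.0 if next_state == n_states - 1 else -0.01
    return next_state, reward, next_state == n_states - 1

rng = np.random.default_rng(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise pick the best-known action.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Temporal-difference update toward reward + discounted future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
```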
Sensor Fusion of Leap Motion Controller and Flex Sensors using Kalman Filter for Human Finger Tracking
In daily life, we use our hands in various ways for most of our day-to-day activities. Tracking the position, orientation, and articulation of human hands has a variety of applications, including gesture recognition, robotics, medicine and health care, design and manufacturing, and art and entertainment. However, it is an equally complex and challenging task due to several factors, such as the high-dimensional data produced by hand motion, the high speed of operation, and self-occlusion. This paper puts forth a novel method for tracking the fingertips of the human hand using two distinct sensors and combining their data with a sensor fusion technique.
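As a point of reference, here is a minimal sketch of the kind of Kalman-filter fusion named in the title, assuming a one-dimensional constant-velocity state and two noisy position readings standing in for the Leap Motion and flex-sensor measurements. The noise parameters and measurement model are made-up illustrative values, not the calibration used in the paper.

```python
import numpy as np

dt = 0.01
F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition: [position, velocity]
H = np.array([[1.0, 0.0], [1.0, 0.0]])   # both sensors observe position
Q = np.eye(2) * 1e-4                     # process noise (assumed)
R = np.diag([0.5, 2.0])                  # per-sensor measurement noise (assumed)

x = np.zeros(2)                          # state estimate
P = np.eye(2)                            # estimate covariance

def kalman_step(x, P, z):
    # Predict forward one time step.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the stacked measurement z = [optical_pos, flex_pos].
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = kalman_step(x, P, np.array([0.52, 0.47]))
print(x[0])   # fused fingertip-position estimate
```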
Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery
The visual-question localized-answering (VQLA) system can serve as a
knowledgeable assistant in surgical education. In addition to providing
text-based answers, the VQLA system can highlight the region of interest for better
surgical scene understanding. However, deep neural networks (DNNs) suffer from
catastrophic forgetting when learning new knowledge. Specifically, when DNNs
learn on incremental classes or tasks, their performance on old tasks drops
dramatically. Furthermore, due to medical data privacy and licensing issues, it
is often difficult to access old data when updating continual learning (CL)
models. Therefore, we develop a non-exemplar continual surgical VQLA framework,
to explore and balance the rigidity-plasticity trade-off of DNNs in a
sequential learning paradigm. We revisit the distillation loss in CL tasks, and
propose rigidity-plasticity-aware distillation (RP-Dist) and self-calibrated
heterogeneous distillation (SH-Dist) to preserve the old knowledge. The weight
aligning (WA) technique is also integrated to adjust the weight bias between
old and new tasks. We further establish a CL framework on three public surgical
datasets, in surgical settings with overlapping classes between old and new
surgical VQLA tasks. With extensive experiments, we
demonstrate that our proposed method balances learning and forgetting on
continual surgical VQLA better than conventional CL methods. Our
code is publicly accessible. Comment: To appear in MICCAI 2023. Code availability:
https://github.com/longbai1006/CS-VQL
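For background, here is a minimal sketch of the standard temperature-scaled distillation loss that distillation-based CL methods such as RP-Dist and SH-Dist build on. This is the generic formulation only; the paper's proposed variants are defined in the linked repository.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soften both distributions with temperature T and match them via KL divergence."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Typical usage in continual learning: keep a frozen copy of the old-task model
# as the teacher and add the distillation term to the new-task loss, e.g.
#   teacher_logits = old_model(x).detach()
#   loss = task_loss + lambda_distill * distillation_loss(new_model(x), teacher_logits)
```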
Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Medical students and junior surgeons often rely on senior surgeons and
specialists to answer their questions when learning surgery. However, experts
are often busy with clinical and academic work, and have little time to give
guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question
Answering (VQA) systems can only provide simple answers without indicating the
location of the answers. In addition, vision-language (ViL) embedding remains
under-explored in these kinds of tasks. Therefore, a surgical Visual
Question Localized-Answering (VQLA) system would be helpful for medical
students and junior surgeons to learn and understand from recorded surgical
videos. We propose an end-to-end Transformer with Co-Attention gaTed
Vision-Language (CAT-ViL) for VQLA in surgical scenarios, which does not
require feature extraction through detection models. The CAT-ViL embedding
module is designed to fuse heterogeneous features from visual and textual
sources. The fused embedding is fed into a standard Data-Efficient Image
Transformer (DeiT) module before the parallel classifier and detector for
joint prediction. We conduct the experimental validation on public surgical
videos from MICCAI EndoVis Challenge 2017 and 2018. The experimental results
highlight the superior performance and robustness of our proposed model
compared to the state-of-the-art approaches. Ablation studies further confirm
the contribution of each proposed component. The proposed method provides a
promising solution for surgical scene understanding and a first step toward
Artificial Intelligence (AI)-based VQLA systems for surgical training. Our code
is publicly available. Comment: To appear in MICCAI 2023. Code availability:
https://github.com/longbai1006/CAT-Vi
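For illustration, here is a minimal sketch of a gated vision-language fusion layer of the general kind described above, assuming pre-extracted visual and text token embeddings of equal width. This is a generic cross-attention-plus-gate mechanism, not the exact CAT-ViL module from the paper.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse text context into visual tokens through a learned sigmoid gate."""
    def __init__(self, dim=768):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, vis_tokens, txt_tokens):
        # vis_tokens: (B, Nv, D), txt_tokens: (B, Nt, D)
        # Cross-attend text onto vision with scaled dot-product attention.
        attn = torch.softmax(
            vis_tokens @ txt_tokens.transpose(1, 2) / vis_tokens.shape[-1] ** 0.5, dim=-1
        )
        txt_context = attn @ txt_tokens                       # (B, Nv, D)
        # The gate decides how much textual context enters each visual token.
        g = torch.sigmoid(self.gate(torch.cat([vis_tokens, txt_context], dim=-1)))
        return g * txt_context + (1 - g) * vis_tokens         # fused (B, Nv, D)

fused = GatedFusion()(torch.randn(2, 49, 768), torch.randn(2, 20, 768))
```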
Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings
As the significance of simulation in medical care and intervention continues
to grow, it is anticipated that a simplified and low-cost platform can be set
up to execute personalized diagnoses and treatments. 3D Slicer can not only
perform medical image analysis and visualization but can also provide surgical
navigation and surgical planning functions. In this paper, we chose 3D Slicer
as our base platform and used monocular cameras as sensors. We then used the
neural radiance fields (NeRF) algorithm to reconstruct a 3D model of the human
head. We compared the accuracy of the NeRF algorithm in generating 3D human
head scenes and used the Marching Cubes algorithm to generate the corresponding
3D mesh models. The individual's head pose,
obtained through single-camera vision, is transmitted in real-time to the scene
created within 3D Slicer. The demonstrations presented in this paper include
real-time synchronization of transformations between the human head model in
the 3D Slicer scene and the detected head posture. Additionally, we tested a
scene in which a tool, marked with an ArUco marker tracked by a single camera,
synchronously points to the real-time transformation of the head posture. These
demos indicate that our methodology can provide a feasible real-time simulation
platform for nasopharyngeal swab collection or intubation. Comment: Accepted by ICBIR 202
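For context, here is a minimal sketch of the mesh-extraction step: marching cubes applied to a sampled density volume via scikit-image's `measure.marching_cubes`. The synthetic sphere volume is only a stand-in for a density grid queried from a trained NeRF; the actual pipeline in the paper runs inside 3D Slicer.

```python
import numpy as np
from skimage import measure

# Stand-in density volume: a sphere, playing the role of a NeRF density field
# sampled on a regular lattice.
grid = np.linspace(-1, 1, 64)
X, Y, Z = np.meshgrid(grid, grid, grid, indexing="ij")
density = 1.0 - np.sqrt(X**2 + Y**2 + Z**2)

# Extract the iso-surface at a chosen density threshold.
verts, faces, normals, values = measure.marching_cubes(density, level=0.0)
print(verts.shape, faces.shape)   # triangle mesh ready for export to a viewer such as 3D Slicer
```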
Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs
Video-assisted transoral tracheal intubation (TI) necessitates using an
endoscope that helps the physician insert a tracheal tube into the glottis
instead of the esophagus. The growing trend of robotic-assisted TI would
require a medical robot to distinguish anatomical features like an experienced
physician, a capability that can be imitated using supervised deep-learning
techniques. However, real datasets of oropharyngeal organs are often
inaccessible due to limited open-source data and patient privacy. In this work,
we propose a domain adaptive Sim-to-Real framework called IoU-Ranking
Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The
framework includes an image blending strategy called IoU-Ranking Blend (IRB)
and the style-transfer method ArtFlow. Here, IRB alleviates the problem of poor
segmentation performance caused by significant domain differences between
datasets, while ArtFlow is introduced to further reduce the discrepancies
between datasets. A virtual oropharynx image dataset generated by the SOFA framework is
used as the learning subject for semantic segmentation to deal with the limited
availability of actual endoscopic images. We applied IRB-AF to
state-of-the-art domain-adaptive segmentation models. The results demonstrate
the superior performance of our approach in further improving the segmentation
accuracy and training stability. Comment: The manuscript is accepted by Medical & Biological Engineering &
Computing. Code and dataset:
https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Releas
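For reference, here is a minimal per-class IoU computation of the kind the IoU-Ranking Blend strategy ranks on. The blending procedure itself is defined in the released code linked above; the label maps below are random placeholders.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """pred, target: integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union if union > 0 else np.nan)   # NaN for absent classes
    return np.array(ious)

rng = np.random.default_rng(0)
print(per_class_iou(rng.integers(0, 3, (64, 64)), rng.integers(0, 3, (64, 64)), 3))
```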
