32 research outputs found

    Robotic Surgery Remote Mentoring via AR with 3D Scene Streaming and Hand Interaction

    Full text link
    With the growing popularity of robotic surgery, education becomes increasingly important and urgently needed for the sake of patient safety. However, experienced surgeons have limited accessibility due to their busy clinical schedule or working in a distant city, thus can hardly provide sufficient education resources for novices. Remote mentoring, as an effective way, can help solve this problem, but traditional methods are limited to plain text, audio, or 2D video, which are not intuitive nor vivid. Augmented reality (AR), a thriving technique being widely used for various education scenarios, is promising to offer new possibilities of visual experience and interactive teaching. In this paper, we propose a novel AR-based robotic surgery remote mentoring system with efficient 3D scene visualization and natural 3D hand interaction. Using a head-mounted display (i.e., HoloLens), the mentor can remotely monitor the procedure streamed from the trainee's operation side. The mentor can also provide feedback directly with hand gestures, which is in-turn transmitted to the trainee and viewed in surgical console as guidance. We comprehensively validate the system on both real surgery stereo videos and ex-vivo scenarios of common robotic training tasks (i.e., peg-transfer and suturing). Promising results are demonstrated regarding the fidelity of streamed scene visualization, the accuracy of feedback with hand interaction, and the low-latency of each component in the entire remote mentoring system. This work showcases the feasibility of leveraging AR technology for reliable, flexible and low-cost solutions to robotic surgical education, and holds great potential for clinical applications

    Trans-SVNet: Accurate Phase Recognition from Surgical Videos via Hybrid Embedding Aggregation Transformer

    Full text link
    Real-time surgical phase recognition is a fundamental task in modern operating rooms. Previous works tackle this task relying on architectures arranged in spatio-temporal order, however, the supportive benefits of intermediate spatial features are not considered. In this paper, we introduce, for the first time in surgical workflow analysis, Transformer to reconsider the ignored complementary effects of spatial and temporal features for accurate surgical phase recognition. Our hybrid embedding aggregation Transformer fuses cleverly designed spatial and temporal embeddings by allowing for active queries based on spatial information from temporal embedding sequences. More importantly, our framework processes the hybrid embeddings in parallel to achieve a high inference speed. Our method is thoroughly validated on two large surgical video datasets, i.e., Cholec80 and M2CAI16 Challenge datasets, and outperforms the state-of-the-art approaches at a processing speed of 91 fps.Comment: MICCAI202

    Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries

    Full text link
    Accurate segmentation of surgical instrument tip is an important task for enabling downstream applications in robotic surgery, such as surgical skill assessment, tool-tissue interaction and deformation modeling, as well as surgical autonomy. However, this task is very challenging due to the small sizes of surgical instrument tips, and significant variance of surgical scenes across different procedures. Although much effort has been made on visual-based methods, existing segmentation models still suffer from low robustness thus not usable in practice. Fortunately, kinematics data from the robotic system can provide reliable prior for instrument location, which is consistent regardless of different surgery types. To make use of such multi-modal information, we propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures. Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics. Next, a cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation. We have conducted experiments on a private paired visual-kinematics dataset including multiple procedures, i.e., prostatectomy, total mesorectal excision, fundoplication and distal gastrectomy on cadaver, and distal gastrectomy on porcine. The leave-one-procedure-out cross validation demonstrated that our proposed multi-modal segmentation method significantly outperformed current image-based state-of-the-art approaches, exceeding averagely 11.2% on Dice.Comment: Accepted to IROS 202

    Temporal Memory Relation Network for Workflow Recognition from Surgical Video

    Full text link
    Automatic surgical workflow recognition is a key component for developing context-aware computer-assisted systems in the operating theatre. Previous works either jointly modeled the spatial features with short fixed-range temporal information, or separately learned visual and long temporal cues. In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features. We establish a long-range memory bank to serve as a memory cell storing the rich supportive information. Through our designed temporal variation layer, the supportive cues are further enhanced by multi-scale temporal-only convolutions. To effectively incorporate the two types of cues without disturbing the joint learning of spatio-temporal features, we introduce a non-local bank operator to attentively relate the past to the present. In this regard, our TMRNet enables the current feature to view the long-range temporal dependency, as well as tolerate complex temporal extents. We have extensively validated our approach on two benchmark surgical video datasets, M2CAI challenge dataset and Cholec80 dataset. Experimental results demonstrate the outstanding performance of our method, consistently exceeding the state-of-the-art methods by a large margin (e.g., 67.0% v.s. 78.9% Jaccard on Cholec80 dataset).Comment: Accepted at IEEE Transactions on Medical Imaging (IEEE TMI); Code is available at https://github.com/YuemingJin/TMRNe

    Edge states at nematic domain walls in FeSe films

    Full text link
    Quantum spin Hall (QSH) effect is an intriguing phenomenon arising from the helical edge states in two-dimensional topological insulators. We use molecular beam epitaxy (MBE) to prepare FeSe films with atomically sharp nematic domain boundaries, where tensile strains, nematicity suppression and topological band inversion are simultaneously achieved. Using scanning tunneling microscopy (STM), we observe edge states at the Fermi level that spatially distribute as two distinct strips in the vicinity of the domain boundaries. At the endpoint of the boundaries, a bound state at the Fermi level is further observed. The topological origin of the edge states is supported by density functional theory calculations. Our findings not only demonstrate a candidate for QSH states, but also provide a new pathway to realize topological superconductivity in a single-component film

    Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction

    Full text link
    Estimating precise metric depth and scene reconstruction from monocular endoscopy is a fundamental task for surgical navigation in robotic surgery. However, traditional stereo matching adopts binocular images to perceive the depth information, which is difficult to transfer to the soft robotics-based surgical systems due to the use of monocular endoscopy. In this paper, we present a novel framework that combines robot kinematics and monocular endoscope images with deep unsupervised learning into a single network for metric depth estimation and then achieve 3D reconstruction of complex anatomy. Specifically, we first obtain the relative depth maps of surgical scenes by leveraging a brightness-aware monocular depth estimation method. Then, the corresponding endoscope poses are computed based on non-linear optimization of geometric and photometric reprojection residuals. Afterwards, we develop a Depth-driven Sliding Optimization (DDSO) algorithm to extract the scaling coefficient from kinematics and calculated poses offline. By coupling the metric scale and relative depth data, we form a robust ensemble that represents the metric and consistent depth. Next, we treat the ensemble as supervisory labels to train a metric depth estimation network for surgeries (i.e., MetricDepthS-Net) that distills the embeddings from the robot kinematics, endoscopic videos, and poses. With accurate metric depth estimation, we utilize a dense visual reconstruction method to recover the 3D structure of the whole surgical site. We have extensively evaluated the proposed framework on public SCARED and achieved comparable performance with stereo-based depth estimation methods. Our results demonstrate the feasibility of the proposed approach to recover the metric depth and 3D structure with monocular inputs

    Stereo Dense Scene Reconstruction and Accurate Localization for Learning-Based Navigation of Laparoscope in Minimally Invasive Surgery

    Full text link
    Objective: The computation of anatomical information and laparoscope position is a fundamental block of surgical navigation in Minimally Invasive Surgery (MIS). Recovering a dense 3D structure of surgical scene using visual cues remains a challenge, and the online laparoscopic tracking primarily relies on external sensors, which increases system complexity. Methods: Here, we propose a learning-driven framework, in which an image-guided laparoscopic localization with 3D reconstructions of complex anatomical structures is obtained. To reconstruct the 3D structure of the whole surgical environment, we first fine-tune a learning-based stereoscopic depth perception method, which is robust to the texture-less and variant soft tissues, for depth estimation. Then, we develop a dense visual reconstruction algorithm to represent the scene by surfels, estimate the laparoscope poses and fuse the depth maps into a unified reference coordinate for tissue reconstruction. To estimate poses of new laparoscope views, we achieve a coarse-to-fine localization method, which incorporates our reconstructed 3D model. Results: We evaluate the reconstruction method and the localization module on three datasets, namely, the stereo correspondence and reconstruction of endoscopic data (SCARED), the ex-vivo phantom and tissue data collected with Universal Robot (UR) and Karl Storz Laparoscope, and the in-vivo DaVinci robotic surgery dataset, where the reconstructed 3D structures have rich details of surface texture with an accuracy error under 1.71 mm and the localization module can accurately track the laparoscope with only images as input. Conclusions: Experimental results demonstrate the superior performance of the proposed method in 3D anatomy reconstruction and laparoscopic localization. Significance: The proposed framework can be potentially extended to the current surgical navigation system

    Electronic states and magnetic response of MnBi2Te4 by scanning tunneling microscopy and spectroscopy

    Full text link
    Exotic quantum phenomena have been demonstrated in recently discovered intrinsic magnetic topological insulator MnBi2Te4. At its two-dimensional limit, quantum anomalous Hall (QAH) effect and axion insulator state are observed in odd and even layers of MnBi2Te4, respectively. The measured band structures exhibit intriguing and complex properties. Here we employ low-temperature scanning tunneling microscopy to study its surface states and magnetic response. The quasiparticle interference patterns indicate that the electronic structures on the topmost layer of MnBi2Te4 is different from that of the expected out-of-plane A-type antiferromagnetic phase. The topological surface states may be embedded in deeper layers beneath the topmost surface. Such novel electronic structure presumably related to the modification of crystalline structure during sample cleaving and re-orientation of magnetic moment of Mn atoms near the surface. Mn dopants substituted at the Bi site on the second atomic layer are observed. The ratio of Mn/Bi substitutions is 5%. The electronic structures are fluctuating at atomic scale on the surface, which can affect the magnetism of MnBi2Te4. Our findings shed new lights on the magnetic property of MnBi2Te4 and thus the design of magnetic topological insulators

    The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite Imagery

    Get PDF
    The scientific outcomes of the 2022 Landslide4Sense (L4S) competition organized by the Institute of Advanced Research in Artificial Intelligence (IARAI) are presented here. The objective of the competition is to automatically detect landslides based on large-scale multiple sources of satellite imagery collected globally. The 2022 L4S aims to foster interdisciplinary research on recent developments in deep learning (DL) models for the semantic segmentation task using satellite imagery. In the past few years, DL-based models have achieved performance that meets expectations on image interpretation, due to the development of convolutional neural networks (CNNs). The main objective of this article is to present the details and the best-performing algorithms featured in this competition. The winning solutions are elaborated with state-of-the-art models like the Swin Transformer, SegFormer, and U-Net. Advanced machine learning techniques and strategies such as hard example mining, self-training, and mix-up data augmentation are also considered. Moreover, we describe the L4S benchmark data set in order to facilitate further comparisons, and report the results of the accuracy assessment online. The data is accessible on Future Development Leaderboard for future evaluation at https://www.iarai.ac.at/landslide4sense/challenge/ , and researchers are invited to submit more prediction results, evaluate the accuracy of their methods, compare them with those of other users, and, ideally, improve the landslide detection results reported in this article

    Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark

    Get PDF
    Purpose: Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems. These systems could increase the safety of the operation through context-sensitive warnings and semi-autonomous robotic assistance or improve training of surgeons via data-driven feedback. In surgical workflow analysis up to 91% average precision has been reported for phase recognition on an open data single-center video dataset. In this work we investigated the generalizability of phase recognition algorithms in a multicenter setting including more difficult recognition tasks such as surgical action and surgical skill. Methods: To achieve this goal, a dataset with 33 laparoscopic cholecystectomy videos from three surgical centers with a total operation time of 22 h was created. Labels included framewise annotation of seven surgical phases with 250 phase transitions, 5514 occurences of four surgical actions, 6980 occurences of 21 surgical instruments from seven instrument categories and 495 skill classifications in five skill dimensions. The dataset was used in the 2019 international Endoscopic Vision challenge, sub-challenge for surgical workflow and skill analysis. Here, 12 research teams trained and submitted their machine learning algorithms for recognition of phase, action, instrument and/or skill assessment. Results: F1-scores were achieved for phase recognition between 23.9% and 67.7% (n = 9 teams), for instrument presence detection between 38.5% and 63.8% (n = 8 teams), but for action recognition only between 21.8% and 23.3% (n = 5 teams). The average absolute error for skill assessment was 0.78 (n = 1 team). Conclusion: Surgical workflow and skill analysis are promising technologies to support the surgical team, but there is still room for improvement, as shown by our comparison of machine learning algorithms. This novel HeiChole benchmark can be used for comparable evaluation and validation of future work. In future studies, it is of utmost importance to create more open, high-quality datasets in order to allow the development of artificial intelligence and cognitive robotics in surgery
    corecore