Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery
Understanding and anticipating intraoperative events and actions is critical
for intraoperative assistance and decision-making during minimally invasive
surgery. Automated prediction of events, actions, and their consequences has
been addressed through various computational approaches, with the objective of
augmenting surgeons' perception and decision-making capabilities.
We propose a predictive neural network that is capable of understanding and
predicting critical interactive aspects of surgical workflow from
intra-abdominal video, while flexibly leveraging surgical knowledge graphs. The
approach incorporates a hypergraph-transformer (HGT) structure that encodes
expert knowledge into the network design and predicts the hidden embedding of
the graph. We verify our approach on established surgical datasets and
applications, including the detection and prediction of action triplets, and
the achievement of the Critical View of Safety (CVS). Moreover, we address
specific, safety-related tasks, such as predicting the clipping of the cystic
duct or artery without prior achievement of the CVS. Our results demonstrate
the superiority of our approach compared to unstructured alternatives.
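The abstract does not detail the HGT layer itself. As a rough illustration of how a hypergraph can encode surgical knowledge, the sketch below implements a standard HGNN-style hypergraph convolution (not the authors' hypergraph-transformer), where each hyperedge joins the instrument, verb, and target of one action triplet. The function name `hypergraph_conv`, the node names, and the toy incidence matrix are hypothetical; a transformer-style layer would typically learn attention weights in place of the fixed degree normalization used here.

```python
import numpy as np


def hypergraph_conv(x, incidence, theta, edge_weights=None):
    """One HGNN-style hypergraph convolution (illustrative, not the HGT layer):

        X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta

    x:         (n_nodes, d_in) node features
    incidence: (n_nodes, n_edges) binary matrix H; H[v, e] = 1 if node v is in hyperedge e
    theta:     (d_in, d_out) projection matrix (learned in a real model)
    """
    h = np.asarray(incidence, dtype=float)
    n_edges = h.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = h @ w          # weighted node degrees
    de = h.sum(axis=0)  # hyperedge sizes
    # Guard against isolated nodes or empty hyperedges (degree 0 -> weight 0).
    dv_inv_sqrt = np.where(dv > 0, 1.0 / np.sqrt(np.maximum(dv, 1e-12)), 0.0)
    de_inv = np.where(de > 0, 1.0 / np.maximum(de, 1e-12), 0.0)
    # Normalized node-to-node propagation through shared hyperedges.
    p = (dv_inv_sqrt[:, None] * h) @ np.diag(w * de_inv) @ (h.T * dv_inv_sqrt[None, :])
    return p @ np.asarray(x, dtype=float) @ np.asarray(theta, dtype=float)


# Hypothetical toy knowledge graph: each action triplet is one hyperedge.
# Nodes: 0 grasper, 1 retract, 2 gallbladder, 3 clipper, 4 clip, 5 cystic_duct
H = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
features = np.eye(6)  # one-hot node features
out = hypergraph_conv(features, H, np.eye(6))
print(out.round(3))
```

With one-hot features, each node's output mixes only the nodes that share a triplet with it, which is how the graph structure constrains what the network can combine.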
Image Compositing for Segmentation of Surgical Tools Without Manual Annotations
Manually producing pixel-accurate image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, which makes extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced in the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, only minimal labeling effort is required: selecting frames without surgical instruments from videos of surgical interventions freely available online. We compare different methods to blend instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying simple post-processing, we are able to match the results of the same network trained on a publicly available, manually labeled real dataset.
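A minimal sketch of the chroma-key compositing idea, assuming RGB images stored as NumPy float arrays in [0, 255]: a soft alpha matte is estimated from how strongly the green channel dominates, the instrument is alpha-blended over an instrument-free background, and the thresholded matte serves as the segmentation label. The thresholds, the function names, and the random choice between hard and soft blending (a simple stand-in for exploiting several blending options) are assumptions; the paper's actual blending methods and augmentation are not reproduced here.

```python
import numpy as np


def chroma_key_alpha(fg, green_margin=40.0, softness=20.0):
    """Estimate a foreground alpha matte from an RGB image shot over a green screen.

    A pixel counts as green-screen background when its green channel exceeds
    max(red, blue) by more than `green_margin`; alpha ramps linearly from 1 to 0
    over the `softness` range just below that margin.
    """
    r, g, b = fg[..., 0], fg[..., 1], fg[..., 2]
    greenness = g - np.maximum(r, b)
    return np.clip((green_margin - greenness) / softness, 0.0, 1.0)


def composite(fg, bg, alpha):
    """Standard 'over' alpha blending: out = alpha * fg + (1 - alpha) * bg."""
    a = alpha[..., None]
    return a * fg + (1.0 - a) * bg


def make_training_pair(fg, bg, rng):
    """Hypothetical sample generator: returns (image, binary instrument mask)."""
    alpha = chroma_key_alpha(fg)
    mask = alpha > 0.5  # binary segmentation label
    if rng.random() < 0.5:
        alpha = mask.astype(float)  # hard cut-out instead of soft blending
    return composite(fg, bg, alpha), mask


# Toy 1x2 images: a green-screen pixel and a grey "instrument" pixel.
fg = np.array([[[0, 255, 0], [128, 128, 128]]], dtype=float)
bg = np.zeros((1, 2, 3))
bg[..., 0] = 200.0  # reddish "tissue" background
image, mask = make_training_pair(fg, bg, np.random.default_rng(0))
print(mask)
```

Because the label comes directly from the matte, every composited image has a pixel-accurate mask at no annotation cost, which is the core appeal of the approach.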
SAF-IS: a Spatial Annotation Free Framework for Instance Segmentation of Surgical Tools
Instance segmentation of surgical instruments is a long-standing research
problem, crucial for the development of many applications for computer-assisted
surgery. This problem is commonly tackled via fully-supervised training of deep
learning models, requiring expensive pixel-level annotations to train. In this
work, we develop a framework for instance segmentation not relying on spatial
annotations for training. Instead, our solution only requires binary tool
masks, obtainable using recent unsupervised approaches, and binary tool
presence labels, freely obtainable in robot-assisted surgery. Based on the
binary mask information, our solution learns to extract individual tool
instances from single frames, and to encode each instance into a compact vector
representation, capturing its semantic features. Such representations guide the
automatic selection of a very small number of instances (only 8 in our experiments),
displayed to a human operator for tool-type labelling. The gathered information
is finally used to match each training instance with a binary tool presence
label, providing an effective supervision signal to train a tool instance
classifier. We validate our framework on the EndoVis 2017 and 2018 segmentation
datasets. We provide results using binary masks obtained either by manual
annotation or as predictions of an unsupervised binary segmentation model. The
latter solution yields an instance segmentation approach completely free from
spatial annotations, outperforming several state-of-the-art fully-supervised
segmentation approaches.
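The abstract describes the final step, matching each training instance with a binary tool-presence label, only at a high level. The sketch below shows one hypothetical way to do it: each instance receives the tool type of its most similar operator-labelled prototype, considering only tool types that the frame's presence labels mark as present. The function name, the cosine-similarity rule, and the toy vectors are assumptions for illustration, not the SAF-IS procedure; embeddings are assumed to be non-zero.

```python
import numpy as np


def assign_instance_labels(embeddings, proto_embeddings, proto_labels, present_classes):
    """Pseudo-label each tool instance found in one frame.

    embeddings:       (n_inst, d) vectors of the instances in the frame
    proto_embeddings: (n_proto, d) vectors of the few operator-labelled instances
    proto_labels:     (n_proto,) integer tool-type labels of those prototypes
    present_classes:  set of tool types whose presence label is 1 in this frame
    Returns one tool type per instance, or -1 if no present class has a prototype.
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=1, keepdims=True)

    proto_labels = np.asarray(proto_labels)
    sims = normalize(embeddings) @ normalize(proto_embeddings).T  # cosine similarity
    allowed = np.isin(proto_labels, list(present_classes))
    if not allowed.any():
        return np.full(len(embeddings), -1)
    sims[:, ~allowed] = -np.inf  # only tools present in the frame are candidates
    return proto_labels[np.argmax(sims, axis=1)]


# Toy example: prototype 2 is closest to instance 0 but its tool is absent.
inst = np.array([[1.0, 0.0], [0.0, 1.0]])
protos = np.array([[1.0, 0.1], [0.1, 1.0], [1.0, 0.0]])
labels = np.array([0, 1, 2])
pseudo = assign_instance_labels(inst, protos, labels, present_classes={0, 1})
print(pseudo)
```

The presence labels act as a cheap constraint here: they rule out tool types that cannot appear in the frame, so a handful of labelled prototypes can be enough to resolve the remaining ambiguity.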
Combining Differential Kinematics and Optical Flow for Automatic Labeling of Continuum Robots in Minimally Invasive Surgery
The segmentation of continuum robots in medical images can be of interest for analyzing surgical procedures or for controlling them. However, the automatic segmentation of continuous and flexible shapes is not an easy task. On the one hand, conventional approaches are not adapted to the specificities of these instruments, such as imprecise kinematic models; on the other hand, deep-learning techniques have shown interesting capabilities but need many manually labeled images. In this article, we propose a novel approach for segmenting continuum robots in endoscopic images that requires neither a prior on the instrument's visual appearance nor manual annotation of images. The method combines the kinematic and differential kinematic models of the robot with an analysis of the optical flow in the images. A cost function aggregating information from the acquired image, the optical flow, and the robot encoders is optimized using particle swarm optimization, providing estimates of the pose parameters of the continuum instrument and a mask defining the instrument in the image. In addition, temporal consistency is assessed to improve the stochastic optimization and reject outliers. The proposed approach has been tested on the robotic instruments of a flexible endoscopy platform, both in benchtop acquisitions and on an in vivo video. The results show that the technique correctly segments the instruments without a prior on their appearance, including in challenging conditions. The obtained segmentation can be used for several applications, for instance to provide automatic labels for machine learning techniques.
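Particle swarm optimization is a generic derivative-free optimizer, so its core loop can be shown independently of the paper's cost function. The sketch below is a textbook global-best PSO over box-bounded parameters. The name `pso_minimize`, its default coefficients, and the quadratic `toy_cost` (a hypothetical stand-in for the image, optical-flow, and encoder terms, which the abstract does not specify) are all assumptions.

```python
import numpy as np


def pso_minimize(cost, lower, upper, n_particles=30, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Global-best particle swarm optimization within the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                  # personal bests
    best_cost = np.array([float(cost(p)) for p in pos])
    g = best_pos[np.argmin(best_cost)].copy()              # global best
    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + pull toward personal and global bests.
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (g - pos))
        pos = np.clip(pos + vel, lower, upper)
        costs = np.array([float(cost(p)) for p in pos])
        improved = costs < best_cost
        best_pos[improved] = pos[improved]
        best_cost[improved] = costs[improved]
        g = best_pos[np.argmin(best_cost)].copy()
    return g, float(best_cost.min())


def toy_cost(params):
    """Hypothetical stand-in for the paper's image/flow/encoder cost."""
    target = np.array([0.3, -0.2])  # e.g. two made-up pose parameters
    return float(np.sum((params - target) ** 2))


best_params, best_value = pso_minimize(toy_cost, lower=[-1, -1], upper=[1, 1])
print(best_params, best_value)
```

PSO suits this setting because the cost mixes image-based terms with no convenient gradient. For video, seeding part of the swarm near the previous frame's estimate is one possible way (an assumption, not taken from the abstract) to exploit temporal consistency.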