Generating large labeled data sets for laparoscopic image processing tasks using unpaired image-to-image translation
In the medical domain, the lack of large training data sets and benchmarks is
often a limiting factor for training deep neural networks. In contrast to
expensive manual labeling, computer simulations can generate large and fully
labeled data sets with a minimum of manual effort. However, models that are
trained on simulated data usually do not translate well to real scenarios. To
bridge the domain gap between simulated and real laparoscopic images, we
exploit recent advances in unpaired image-to-image translation. We extend an
image-to-image translation method to generate a diverse multitude of
realistic-looking synthetic images based on images from a simple
laparoscopy simulation. By incorporating means to ensure that the image content
is preserved during the translation process, we ensure that the labels given
for the simulated images remain valid for their realistic-looking
translations. This way, we are able to generate a large, fully labeled
synthetic data set of laparoscopic images with realistic appearance. We show
that this data set can be used to train models for the task of liver
segmentation of laparoscopic images. We achieve average Dice scores of up to
0.89 in some patients without manually labeling a single laparoscopic image and
show that using our synthetic data to pre-train models can greatly improve
their performance. The synthetic data set will be made publicly available,
fully labeled with segmentation maps, depth maps, normal maps, and positions of
tools and camera (http://opencas.dkfz.de/image2image). Comment: Accepted at MICCAI 201
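The Dice score reported above is a standard overlap measure between a predicted and a ground-truth segmentation mask; a minimal numpy sketch (the `dice_score` helper and the toy masks are illustrative, not from the paper):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Soerensen-Dice overlap between two binary masks, in [0, 1]."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy 4x4 masks: the prediction recovers 3 of the 4 ground-truth pixels.
gt = np.array([[0, 0, 0, 0],
               [0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]])
pr = np.array([[0, 0, 0, 0],
               [0, 1, 1, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 0]])
score = dice_score(pr, gt)  # 2*3 / (3 + 4) ~ 0.857
```

A score of 1.0 means perfect overlap; the epsilon term keeps the ratio defined when both masks are empty.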
Exploring Semantic Consistency in Unpaired Image Translation to Generate Data for Surgical Applications
In surgical computer vision applications, obtaining labeled training data is
challenging due to data-privacy concerns and the need for expert annotation.
Unpaired image-to-image translation techniques have been explored to
automatically generate large annotated datasets by translating synthetic images
to the realistic domain. However, preserving the structure and semantic
consistency between the input and translated images presents significant
challenges, mainly when there is a distributional mismatch in the semantic
characteristics of the domains. This study empirically investigates unpaired
image translation methods for generating suitable data in surgical
applications, explicitly focusing on semantic consistency. We extensively
evaluate various state-of-the-art image translation models on two challenging
surgical datasets and downstream semantic segmentation tasks. We find that a
simple combination of structural-similarity loss and contrastive learning
yields the most promising results. Quantitatively, we show that the data
generated with this approach yields higher semantic consistency and can be used
more effectively as training data.
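A structural-similarity (SSIM) loss term of the kind this study combines with contrastive learning can be sketched as follows; this single-window variant computed over whole images is a simplification of the usual windowed SSIM, and all names are illustrative:

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM for two images with values in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
same = global_ssim(img, img)     # identical images give SSIM = 1
noisy = global_ssim(img, np.clip(img + 0.2 * rng.random((32, 32)), 0, 1))
loss = 1.0 - noisy               # structural-similarity loss term to minimize
```

Minimizing `1 - SSIM` between the input and the translated image penalizes structural drift during translation, which is the role such a term plays alongside a contrastive objective.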
Simulation-to-Real domain adaptation with teacher-student learning for endoscopic instrument segmentation
Purpose: Segmentation of surgical instruments in endoscopic videos is
essential for automated surgical scene understanding and process modeling.
However, relying on fully supervised deep learning for this task is challenging
because manual annotation consumes the valuable time of clinical experts.
Methods: We introduce a teacher-student learning approach that learns jointly
from annotated simulation data and unlabeled real data to tackle the erroneous
learning problem of the current consistency-based unsupervised domain
adaptation framework.
Results: Empirical results on three datasets highlight the effectiveness of
the proposed framework over current approaches for the endoscopic instrument
segmentation task. Additionally, we provide analysis of major factors affecting
the performance on all datasets to highlight the strengths and failure modes of
our approach.
Conclusion: We show that our proposed approach can successfully exploit the
unlabeled real endoscopic video frames and improve generalization performance
over pure simulation-based training and the previous state-of-the-art. This
takes us one step closer to effective segmentation of surgical tools in the
annotation-scarce setting. Comment: Accepted at IPCAI202
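The abstract does not spell out the teacher update rule, but consistency-based frameworks of this kind commonly maintain the teacher as an exponential moving average (EMA) of the student weights, as in the mean-teacher method; a minimal sketch under that assumption:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Exponential-moving-average update of teacher weights from the student."""
    return {k: alpha * teacher[k] + (1 - alpha) * student[k] for k in teacher}

# Toy one-layer "models": the teacher drifts slowly toward the student.
student = {"w": np.array([1.0, 2.0])}
teacher = {"w": np.array([0.0, 0.0])}
for _ in range(3):
    teacher = ema_update(teacher, student, alpha=0.5)
# after 3 steps with alpha=0.5: w = (1 - 0.5**3) * student = 0.875 * [1, 2]
```

The slowly moving teacher then produces pseudo-labels on unlabeled real frames, while the student trains on both those pseudo-labels and the annotated simulation data.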
XCAT-GAN for Synthesizing 3D Consistent Labeled Cardiac MR Images on Anatomically Variable XCAT Phantoms
Generative adversarial networks (GANs) have provided promising data
enrichment solutions by synthesizing high-fidelity images. However, generating
large sets of labeled images with new anatomical variations remains unexplored.
We propose a novel method for synthesizing cardiac magnetic resonance (CMR)
images on a population of virtual subjects with a large anatomical variation,
introduced using the 4D eXtended Cardiac and Torso (XCAT) computerized human
phantom. We investigate two conditional image synthesis approaches grounded on
a semantically-consistent mask-guided image generation technique: 4-class and
8-class XCAT-GANs. The 4-class technique relies only on annotations of the
heart, while the 8-class technique employs a predicted multi-tissue label map
of the heart-surrounding organs and provides better guidance for our
conditional image synthesis. For both techniques, we train our conditional
XCAT-GAN with real images paired with their corresponding labels; at
inference time, we substitute the labels with the XCAT-derived ones.
Therefore, the trained network accurately transfers the tissue-specific
textures to the new label maps. By creating 33 virtual subjects of synthetic
CMR images at the end-diastolic and end-systolic phases, we evaluate the
usefulness of such data in the downstream cardiac cavity segmentation task
under different augmentation strategies. Results demonstrate that even with
only 20% of real images (40 volumes) seen during training, segmentation
performance is retained with the addition of synthetic CMR images. Moreover,
the improvement in utilizing synthetic images for augmenting the real data is
evident through the reduction of Hausdorff distance up to 28% and an increase
in the Dice score up to 5%, indicating a higher similarity to the ground truth
in all dimensions. Comment: Accepted for MICCAI 202
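The Hausdorff distance used in this evaluation measures the worst-case disagreement between two segmentation boundaries; a minimal numpy sketch over point sets (the toy contours are illustrative):

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets of shape (N, 2), (M, 2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise dists
    return max(d.min(axis=1).max(),   # farthest a-point from its nearest b-point
               d.min(axis=0).max())   # farthest b-point from its nearest a-point

contour_gt   = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
contour_pred = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 3.0]])
hd = hausdorff(contour_gt, contour_pred)  # worst mismatch: (1,3) vs (1,1) -> 2.0
```

Unlike the Dice score, which rewards bulk overlap, the Hausdorff distance is driven by the single worst boundary error, which is why the two metrics are reported together.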
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Recent advancements in surgical computer vision applications have been driven
by fully-supervised methods, primarily using only visual data. These methods
rely on manually annotated surgical videos to predict a fixed set of object
categories, limiting their generalizability to unseen surgical procedures and
downstream tasks. In this work, we put forward the idea that the surgical video
lectures available through open surgical e-learning platforms can provide
effective supervisory signals for multi-modal representation learning without
relying on manual annotations. We address the surgery-specific linguistic
challenges present in surgical video lectures by employing multiple
complementary automatic speech recognition systems to generate text
transcriptions. We then present a novel method, SurgVLP - Surgical Vision
Language Pre-training, for multi-modal representation learning. SurgVLP
constructs a new contrastive learning objective to align video clip embeddings
with the corresponding multiple text embeddings by bringing them together
within a joint latent space. To effectively show the representation capability
of the learned joint latent space, we introduce several vision-and-language
tasks for surgery, such as text-based video retrieval, temporal activity
grounding, and video captioning, as benchmarks for evaluation. We further
demonstrate that without using any labeled ground truth, our approach can be
employed for traditional vision-only surgical downstream tasks, such as
surgical tool, phase, and triplet recognition. The code will be made available
at https://github.com/CAMMA-public/SurgVL
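SurgVLP aligns each clip with multiple text embeddings; as a simplified stand-in for that objective, a standard symmetric InfoNCE loss over one video-text pair per clip can be sketched (the batch and all names are illustrative, not the paper's exact objective):

```python
import numpy as np

def info_nce(video, text, tau=0.07):
    """Symmetric InfoNCE over L2-normalized (B, D) embedding batches."""
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = v @ t.T / tau                       # cosine similarities / temperature
    # cross-entropy with matched pairs on the diagonal, in both directions
    ls_v2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ls_t2v = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return 0.5 * (-np.diag(ls_v2t).mean() - np.diag(ls_t2v).mean())

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))
aligned = info_nce(emb, emb)              # matched pairs -> low loss
shuffled = info_nce(emb, emb[::-1].copy())  # mismatched pairs -> higher loss
```

Pulling matched clip-text pairs together while pushing mismatched pairs apart is what shapes the joint latent space that the downstream retrieval and grounding benchmarks then probe.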
SSIS-Seg: Simulation-supervised image synthesis for surgical instrument segmentation
Surgical instrument segmentation can be used in a range of computer-assisted interventions and automation in surgical robotics. While deep learning architectures have rapidly advanced the robustness and performance of segmentation models, most are still reliant on supervision and large quantities of labelled data. In this paper, we present a novel method for surgical image generation that fuses robotic instrument simulation and recent domain adaptation techniques to synthesize artificial surgical images for training surgical instrument segmentation models. We integrate attention modules into well-established image generation pipelines and propose a novel cost function to support supervision from simulation frames in model training. We provide an extensive evaluation of our method in terms of segmentation performance, along with a validation study on image quality using evaluation metrics. Additionally, we release a novel segmentation dataset from real surgeries that will be shared for research purposes. Both binary and semantic segmentation have been considered, and we show the capability of our synthetic images to train segmentation models compared with the latest methods from the literature.
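The abstract does not specify which attention modules are integrated, so as a generic illustration, a squeeze-and-excitation style channel gate, one common choice for attention in image generation pipelines, can be sketched in numpy (all weights and names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel gating over a (C, H, W) feature map."""
    squeeze = feat.mean(axis=(1, 2))                       # global average pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # bottleneck MLP -> gates in (0, 1)
    return feat * excite[:, None, None]                    # reweight each channel

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1  # reduce 8 channels to a 2-channel bottleneck
w2 = rng.standard_normal((8, 2)) * 0.1  # expand back to per-channel gates
out = channel_attention(feat, w1, w2)
```

The gate learns to amplify channels that matter for the instrument regions and suppress the rest, which is the kind of inductive bias an attention module adds to a generator.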
A comprehensive survey on recent deep learning-based methods applied to surgical data
Minimally invasive surgery is highly operator-dependent, with lengthy
procedural times causing fatigue to the surgeon and risks to patients such as
injury to organs, infection, bleeding, and complications of anesthesia. To
mitigate such risks, it is desirable to develop real-time systems that can
provide intra-operative guidance to surgeons. For example, an automated system for tool
localization, tool (or tissue) tracking, and depth estimation can enable a
clear understanding of surgical scenes preventing miscalculations during
surgical procedures. In this work, we present a systematic review of recent
machine learning-based approaches including surgical tool localization,
segmentation, tracking, and 3D scene perception. Furthermore, we provide a
detailed overview of publicly available benchmark datasets widely used for
surgical navigation tasks. While recent deep learning architectures have shown
promising results, there are still several open research problems such as a
lack of annotated datasets, the presence of artifacts in surgical scenes, and
non-textured surfaces that hinder 3D reconstruction of the anatomical
structures. Based on our comprehensive review, we present a discussion on
current gaps and needed steps to improve the adaptation of technology in
surgery. Comment: This paper is to be submitted to International journal of computer
visio