A deep learning framework for quality assessment and restoration in video endoscopy
Endoscopy is a routine imaging technique used for both diagnosis and
minimally invasive surgical treatment. Artifacts such as motion blur, bubbles,
specular reflections, floating objects and pixel saturation impede the visual
interpretation and the automated analysis of endoscopy videos. Given the
widespread use of endoscopy in different clinical applications, we contend that
the robust and reliable identification of such artifacts and the automated
restoration of corrupted video frames is a fundamental medical imaging problem.
Existing state-of-the-art methods only deal with the detection and restoration
of selected artifacts. However, endoscopy videos typically contain numerous
artifacts, which motivates the development of a comprehensive solution.
We propose a fully automatic framework that can: 1) detect and classify six
different primary artifacts, 2) provide a quality score for each frame and 3)
restore mildly corrupted frames. To detect different artifacts, our framework
exploits a fast, multi-scale, single-stage convolutional neural network detector.
We introduce a quality metric to assess frame quality and predict image
restoration success. Generative adversarial networks with carefully chosen
regularization are finally used to restore corrupted frames.
Our detector yields the highest mean average precision (mAP at 5% threshold)
of 49.0 and the lowest computational time of 88 ms allowing for accurate
real-time processing. Our restoration models for blind deblurring, saturation
correction and inpainting demonstrate significant improvements over previous
methods. On a set of 10 test videos, we show that our approach preserves an
average of 68.7% of frames, which is 25% more than are retained from the raw
videos.
Comment: 14 page
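The abstract above describes scoring each frame from its detected artifacts and restoring only mildly corrupted frames. The following is a minimal sketch of that idea, not the authors' metric: the class names come from the artifacts listed in the abstract, while the weights, the area-based penalty, the thresholds, and the names `ASSUMED_WEIGHTS`, `Detection`, `frame_quality`, and `triage` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical penalty weights per artifact class (assumed for illustration;
# the abstract does not state the values the authors use).
ASSUMED_WEIGHTS = {
    "motion_blur": 1.0,
    "saturation": 0.6,
    "specularity": 0.4,
    "bubbles": 0.3,
    "floating_object": 0.3,
}


@dataclass
class Detection:
    label: str      # artifact class predicted by the detector
    width: float    # bounding-box width in pixels
    height: float   # bounding-box height in pixels


def frame_quality(detections, frame_width, frame_height, weights=ASSUMED_WEIGHTS):
    """Return a score in [0, 1]; 1.0 means no penalised artifact was detected."""
    frame_area = frame_width * frame_height
    penalty = 0.0
    for det in detections:
        # Larger boxes of more harmful classes cost more.
        area_fraction = (det.width * det.height) / frame_area
        penalty += weights.get(det.label, 0.0) * area_fraction
    # Overlapping boxes are counted twice, so clamp the result at zero.
    return max(0.0, 1.0 - penalty)


def triage(score, discard_below=0.3, keep_above=0.9):
    """Map a score to an action using assumed (not published) thresholds."""
    if score >= keep_above:
        return "keep"
    if score < discard_below:
        return "discard"
    return "restore"
```

For example, `triage(frame_quality([Detection("specularity", 50, 100)], 100, 100))` would route a frame half covered by specular reflections to restoration under these assumed weights and thresholds.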
Estimating Reflectance Layer from A Single Image: Integrating Reflectance Guidance and Shadow/Specular Aware Learning
Estimating the reflectance layer from a single image is a challenging task. It
becomes more challenging when the input image contains shadows or specular
highlights, which often lead to an inaccurate estimate of the reflectance layer.
Therefore, we propose a two-stage learning method, including reflectance
guidance and a Shadow/Specular-Aware (S-Aware) network to tackle the problem.
In the first stage, an initial reflectance layer free from shadows and
specularities is obtained with the constraint of novel losses that are guided
by prior-based shadow-free and specular-free images. To further ensure that the
reflectance layer is independent of shadows and specularities in the
second-stage refinement, we introduce an S-Aware network that distinguishes the
reflectance image from the input image. Our network employs a classifier to
categorize shadow/shadow-free, specular/specular-free classes, enabling the
activation features to function as attention maps that focus on shadow/specular
regions. Our quantitative and qualitative evaluations show that our method
outperforms state-of-the-art methods in estimating a reflectance layer
that is free from shadows and specularities.
Comment: Accepted to AAAI202
The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-Lambertian scenes where
the color of light arriving at the camera sensor encodes information about not
just the last object it collided with, but about multiple mediums -- colored
windows, dirty mirrors, smoke or rain. Layered video representations have the
potential of accurately modelling realistic scenes but have so far required
stringent assumptions on motion, lighting and shape. Here we propose a
learning-based approach for multi-layered video representation: we introduce
novel uncertainty-capturing 3D convolutional architectures and train them to
separate blended videos. We show that these models then generalize to single
videos, where they exhibit interesting abilities: color constancy, factoring
out shadows and separating reflections. We present quantitative and qualitative
results on real-world videos.
Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2019). This arXiv contains the CVPR Camera Ready version of
the paper (although we have included larger figures) as well as an appendix
detailing the model architectur
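The abstract above mentions training models to separate blended videos. The sketch below shows one way such training pairs could be built and scored, under stated assumptions: videos are represented as lists of frames, each frame a flat list of pixel intensities; the equal-weight blend, the L1 distance, and the order-agnostic error are illustrative choices, and the names `blend_videos`, `l1_distance`, and `separation_error` are hypothetical.

```python
def blend_videos(video_a, video_b, alpha=0.5):
    """Mix two equally sized videos; each video is a list of frames and each
    frame is a flat list of pixel intensities."""
    if len(video_a) != len(video_b):
        raise ValueError("videos must have the same number of frames")
    return [
        [alpha * pa + (1.0 - alpha) * pb for pa, pb in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(video_a, video_b)
    ]


def l1_distance(video_x, video_y):
    """Mean absolute difference between two videos of identical shape."""
    diffs = [
        abs(px - py)
        for frame_x, frame_y in zip(video_x, video_y)
        for px, py in zip(frame_x, frame_y)
    ]
    return sum(diffs) / len(diffs)


def separation_error(predicted_pair, target_pair):
    """Order-agnostic error: a model cannot know which layer should come
    first, so both assignments are tried and the better one is kept."""
    (p1, p2), (t1, t2) = predicted_pair, target_pair
    direct = l1_distance(p1, t1) + l1_distance(p2, t2)
    swapped = l1_distance(p1, t2) + l1_distance(p2, t1)
    return min(direct, swapped)
```

Taking the minimum over both assignments means a prediction that recovers the two source videos in swapped order is not penalised.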
De-smokeGCN: Generative Cooperative Networks for Joint Surgical Smoke Detection and Removal
Surgical smoke removal algorithms can improve the quality of intra-operative imaging and reduce hazards in image-guided surgery, a highly desirable post-process for many clinical applications. These algorithms also enable effective computer vision tasks for future robotic surgery. In this paper, we present a new unsupervised learning framework for high-quality pixel-wise smoke detection and removal. One of the well-recognized grand challenges in using convolutional neural networks (CNNs) for medical image processing is to obtain intra-operative medical imaging datasets for network training and validation, but these datasets are limited in both availability and quality. Our novel training framework does not require ground-truth image pairs. Instead, it learns purely from computer-generated simulation images. This approach opens up new avenues and bridges a substantial gap between conventional non-learning-based methods and those requiring prior knowledge gained from extensive training datasets. Inspired by the Generative Adversarial Network (GAN), we have developed a novel generative-collaborative learning scheme that decomposes the de-smoke process into two separate tasks: smoke detection and smoke removal. The detection network is used as prior knowledge, and also as a loss function to maximize its support for training of the smoke removal network. Quantitative and qualitative studies show that the proposed training framework outperforms state-of-the-art de-smoking approaches, including the latest GAN frameworks (such as PIX2PIX). Although the network is trained on synthetic images, experimental results demonstrate its effectiveness in detecting and removing surgical smoke on both simulated and real-world laparoscopic images.
A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence
Video streams are utilised to guide a wide range of minimally-invasive surgical
and diagnostic procedures, and many computer assisted techniques
have been developed to automatically analyse them. These approaches can provide
additional information to the surgeon such as lesion detection, instrument
navigation, or anatomy 3D shape modeling. However, the necessary image features
to recognise these patterns are not always reliably detected due to the
presence of irregular light patterns such as specular highlight reflections. In
this paper, we aim at removing specular highlights from endoscopic videos using
machine learning. We propose using a temporal generative adversarial network
(GAN) to inpaint the hidden anatomy under specularities, inferring its
appearance spatially and from neighbouring frames where they are not present in
the same location. This is achieved using in-vivo data of gastric endoscopy
(Hyper-Kvasir) in a fully unsupervised manner that relies on automatic
detection of specular highlights. System evaluations show significant
improvements over traditional methods through direct comparison, as well as over
other machine learning techniques through an ablation study that demonstrates
the importance of the network's temporal and transfer learning components. The
generalizability of our system to different surgical setups and procedures was
also evaluated qualitatively on in-vivo data of gastric endoscopy and ex-vivo
porcine data (SERV-CT, SCARED). We also assess the effect of our method in
computer vision tasks that underpin 3D reconstruction and camera motion
estimation, namely stereo disparity, optical flow, and sparse point feature
matching. These are evaluated quantitatively and qualitatively and results show
a positive effect of specular highlight inpainting on these tasks in a novel
comprehensive analysis
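The abstract above relies on automatic detection of specular highlights but does not describe the detector. The sketch below uses a common stand-in heuristic rather than the authors' method: a pixel is flagged when it is very bright and nearly colourless in HSV space. The function name `specular_mask` and both thresholds are assumptions chosen for illustration.

```python
import colorsys


def specular_mask(image, min_value=0.9, max_saturation=0.2):
    """Flag pixels that are both very bright and nearly colourless.

    `image` is a list of rows, each row a list of (r, g, b) tuples in [0, 1].
    The thresholds are assumed values, not taken from the paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # Convert to HSV so brightness and colourfulness can be tested separately.
            _, saturation, value = colorsys.rgb_to_hsv(r, g, b)
            mask_row.append(value >= min_value and saturation <= max_saturation)
        mask.append(mask_row)
    return mask
```

A mask like this could mark the regions that an inpainting model is asked to fill; in practice the thresholds would need tuning per endoscope and lighting setup.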
An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy
We present a comprehensive analysis of the submissions to the first edition of the Endoscopy Artefact Detection challenge (EAD). Using crowd-sourcing, this initiative is a step towards understanding the limitations of existing state-of-the-art computer vision methods applied to endoscopy and promoting the development of new approaches suitable for clinical translation. Endoscopy is a routine imaging technique for the detection, diagnosis and treatment of diseases in hollow organs: the esophagus, stomach, colon, uterus and bladder. However, the nature of these organs prevents imaged tissues from being free of imaging artefacts such as bubbles, pixel saturation, organ specularity and debris, all of which pose substantial challenges for any quantitative analysis. Consequently, the potential for improved clinical outcomes through quantitative assessment of abnormal mucosal surfaces observed in endoscopy videos is presently not realized accurately. The EAD challenge promotes awareness of and addresses this key bottleneck problem by investigating methods that can accurately classify, localize and segment artefacts in endoscopy frames as critical prerequisite tasks. Using a diverse, curated, multi-institutional, multi-modality, multi-organ dataset of video frames, the accuracy and performance of 23 algorithms were objectively ranked for artefact detection and segmentation. The ability of methods to generalize to unseen datasets was also evaluated. The best performing methods (top 15%) propose deep learning strategies to reconcile variabilities in artefact appearance with respect to size, modality, occurrence and organ type. However, no single method outperformed the others across all tasks. Detailed analyses reveal the shortcomings of current training strategies and highlight the need for developing new optimal metrics to accurately quantify the clinical applicability of methods.
Solving Computer Vision Challenges with Synthetic Data
Computer vision researchers spend a lot of time creating large datasets, yet there is still much information that is difficult to label. Detailed annotations like part segmentation and dense keypoints are expensive to produce. 3D information requires extra hardware to capture. Besides the labeling cost, an image dataset also does not allow an intelligent agent to interact with the world. As humans, we learn through interaction rather than from per-pixel labeled images. To fill the gaps in existing datasets, we propose to build virtual worlds using computer graphics and use the generated synthetic data to solve these challenges.
In this dissertation, I demonstrate cases where computer vision challenges can be solved with synthetic data. The first part describes our engineering effort in building a simulation pipeline. The second and third parts describe using synthetic data to train better models and to diagnose trained models. The major challenge in using synthetic data is the domain gap between real and synthetic data. In the model training part, I present two cases that differ in the characteristics of their domain gap, and propose a domain adaptation method for each. Synthetic data saves enormous labeling effort by providing detailed ground truth. In the model diagnosis part, I present how to control nuisance factors to analyze model robustness. Finally, I summarize future research directions that can benefit from synthetic data.