66 research outputs found
Efficient video indexing for monitoring disease activity and progression in the upper gastrointestinal tract
Endoscopy is a routine imaging technique used for both diagnosis and
minimally invasive surgical treatment. While the endoscopy video contains a
wealth of information, tools to capture this information for the purpose of
clinical reporting are rather poor. To date, endoscopists do not have any
access to tools that enable them to browse the video data in an efficient and
user-friendly manner. Fast and reliable video retrieval methods could, for
example, allow them to review data from previous exams and therefore improve
their ability to monitor disease progression. Deep learning provides new
avenues for compressing and indexing video in an extremely efficient manner. In
this study, we propose to use an autoencoder for efficient video compression
and fast retrieval of video images. To boost the accuracy of video image
retrieval and to address data variability like multi-modality and view-point
changes, we propose the integration of a Siamese network. We demonstrate that
our approach is competitive in retrieving images from three large-scale videos
of three different patients, matched against query samples from their previous
diagnosis. Quantitative validation shows that the combined approach yields an
overall improvement of 5% and 8% over classical and variational autoencoders,
respectively.
Comment: Accepted at IEEE International Symposium on Biomedical Imaging (ISBI), 201
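To illustrate the retrieval step, once an autoencoder (optionally refined with a Siamese network) has mapped frames to latent codes, matching frames can be ranked by cosine similarity against a query code. The `retrieve` helper below is a hypothetical sketch with illustrative names, not the paper's implementation:

```python
import numpy as np

def cosine_similarity(query, codes):
    """Cosine similarity between one query vector and each row of a code matrix."""
    query = query / np.linalg.norm(query)
    codes = codes / np.linalg.norm(codes, axis=1, keepdims=True)
    return codes @ query

def retrieve(query_code, frame_codes, top_k=3):
    """Return indices of the top_k frames whose latent codes best match the query."""
    sims = cosine_similarity(query_code, frame_codes)
    return np.argsort(-sims)[:top_k]
```

In practice the frame codes would be precomputed offline from the compressed video, so a query reduces to one matrix-vector product.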
Real-time polyp segmentation using U-net with IoU loss
Colorectal cancer is the third leading cause of cancer deaths worldwide, and colonoscopy is the standard procedure for its detection. While automated segmentation methods can help detect polyps and consequently improve their surgical removal, the clinical usability of these methods requires a trade-off between accuracy and speed. In this work, we build on the traditional U-Net architecture and compare different segmentation loss functions. Our results demonstrate that the IoU loss yields improved segmentation performance (nearly 3% improvement on Dice) for real-time polyp segmentation.
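A soft (differentiable) IoU loss of the kind compared in this work can be sketched as follows; the paper's exact formulation is not reproduced here, so this is a common variant, written with NumPy for brevity:

```python
import numpy as np

def soft_iou_loss(pred, target, eps=1e-6):
    """Soft IoU (Jaccard) loss for binary segmentation.
    pred: predicted probabilities in [0, 1]; target: binary ground-truth mask."""
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)
```

Unlike per-pixel cross-entropy, this loss directly optimises the overlap measure used at evaluation time, which is one motivation for preferring it on small structures such as polyps.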
SSL-CPCD: Self-supervised learning with composite pretext-class discrimination for improved generalisability in endoscopic image analysis
Data-driven methods have shown tremendous progress in medical image analysis.
In this context, deep learning-based supervised methods are widely popular.
However, they require a large amount of training data and face issues in
generalisability to unseen datasets that hinder clinical translation.
Endoscopic imaging data incorporate large inter- and intra-patient variability,
making it more challenging for these models to learn representative features for
downstream tasks. Thus, despite publicly available datasets and the data that
can be generated within hospitals, most supervised models still underperform.
While self-supervised learning has addressed this problem to some
extent in natural scene data, there is a considerable performance gap in the
medical image domain. In this paper, we propose to explore patch-level
instance-group discrimination and penalisation of inter-class variation using
additive angular margin within the cosine similarity metrics. Our novel
approach enables models to learn to cluster similar representative patches,
thereby improving their ability to provide better separation between different
classes. Our results demonstrate significant improvement on all metrics over
the state-of-the-art (SOTA) methods on the test set from the same and diverse
datasets. We evaluated our approach for classification, detection, and
segmentation. SSL-CPCD achieves 79.77% Top-1 accuracy for ulcerative colitis
classification, 88.62% mAP for polyp detection, and 82.32% Dice similarity
coefficient for segmentation, gains of nearly 4%, 2%, and 3%, respectively,
over the baseline architectures. We also demonstrate that our method
generalises better than all SOTA methods to unseen datasets, reporting nearly
7% improvement in our generalisability assessment.
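The additive angular margin described above can be sketched, ArcFace-style, on a single cosine similarity: the margin is added to the angle before rescaling, which tightens intra-class clusters. `margin_logit` and its parameter values are illustrative assumptions, not the paper's exact head:

```python
import numpy as np

def margin_logit(embedding, class_center, margin=0.3, scale=16.0):
    """Additive angular margin on a cosine similarity.
    Adding `margin` to the angle lowers the positive logit, forcing the model
    to pull same-class embeddings closer to their centre to compensate."""
    cos = np.dot(embedding, class_center) / (
        np.linalg.norm(embedding) * np.linalg.norm(class_center))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return scale * np.cos(theta + margin)
```

With `margin = 0` this reduces to an ordinary scaled cosine logit, so the margin term is the only change over plain instance-group discrimination.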
A deep learning framework for quality assessment and restoration in video endoscopy
Endoscopy is a routine imaging technique used for both diagnosis and
minimally invasive surgical treatment. Artifacts such as motion blur, bubbles,
specular reflections, floating objects and pixel saturation impede the visual
interpretation and the automated analysis of endoscopy videos. Given the
widespread use of endoscopy in different clinical applications, we contend that
the robust and reliable identification of such artifacts and the automated
restoration of corrupted video frames is a fundamental medical imaging problem.
Existing state-of-the-art methods only deal with the detection and restoration
of selected artifacts. However, endoscopy videos typically contain numerous
artifacts, which motivates the development of a comprehensive solution.
We propose a fully automatic framework that can: 1) detect and classify six
different primary artifacts, 2) provide a quality score for each frame and 3)
restore mildly corrupted frames. To detect different artifacts our framework
exploits a fast multi-scale, single-stage convolutional neural network detector.
We introduce a quality metric to assess frame quality and predict image
restoration success. Generative adversarial networks with carefully chosen
regularization are finally used to restore corrupted frames.
Our detector yields the highest mean average precision (mAP at 5% threshold)
of 49.0 and the lowest computational time of 88 ms, allowing for accurate
real-time processing. Our restoration models for blind deblurring, saturation
correction and inpainting demonstrate significant improvements over previous
methods. On a set of 10 test videos, we show that our approach preserves an
average of 68.7% of frames, 25% more than are retained from the raw videos.
Comment: 14 pages
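The paper's quality metric is not reproduced here; one plausible sketch aggregates per-artifact area fractions into a single frame score, with purely illustrative artifact names and weights:

```python
def frame_quality(artifact_fractions, weights=None):
    """Aggregate per-artifact area fractions into a single frame quality score.
    artifact_fractions: dict mapping artifact name -> fraction of frame area
    affected. The default weights are illustrative assumptions, not the
    paper's values."""
    default = {"blur": 1.0, "bubbles": 0.5, "specularity": 0.5,
               "saturation": 1.0, "contrast": 0.8, "misc": 0.3}
    weights = weights or default
    penalty = sum(weights.get(k, 0.5) * v for k, v in artifact_fractions.items())
    return max(0.0, 1.0 - penalty)
```

A threshold on such a score would then decide whether a frame is kept as-is, sent to restoration, or discarded.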
A comprehensive survey on recent deep learning-based methods applied to surgical data
Minimally invasive surgery is highly operator-dependent and involves lengthy
procedural times, causing fatigue to the surgeon and posing risks to patients
such as injury to organs, infection, bleeding, and complications of anesthesia.
To mitigate such risks, it is desirable to develop real-time systems that can provide
intra-operative guidance to surgeons. For example, an automated system for tool
localization, tool (or tissue) tracking, and depth estimation can enable a
clear understanding of surgical scenes preventing miscalculations during
surgical procedures. In this work, we present a systematic review of recent
machine learning-based approaches including surgical tool localization,
segmentation, tracking, and 3D scene perception. Furthermore, we provide a
detailed overview of publicly available benchmark datasets widely used for
surgical navigation tasks. While recent deep learning architectures have shown
promising results, there are still several open research problems such as a
lack of annotated datasets, the presence of artifacts in surgical scenes, and
non-textured surfaces that hinder 3D reconstruction of the anatomical
structures. Based on our comprehensive review, we present a discussion on
current gaps and needed steps to improve the adaptation of technology in
surgery.
Comment: This paper is to be submitted to the International Journal of Computer Vision
SUPRA: Superpixel Guided Loss for Improved Multi-modal Segmentation in Endoscopy
Domain shift is a well-known problem in the medical imaging community. In
particular, for endoscopic image analysis where the data can have different
modalities the performance of deep learning (DL) methods gets adversely
affected. In other words, methods developed on one modality cannot be used for
a different modality. However, in real clinical settings, endoscopists switch
between modalities for better mucosal visualisation. In this paper, we explore
the domain generalisation technique to enable DL methods to be used in such
scenarios. To this end, we propose to use superpixels generated with Simple
Linear Iterative Clustering (SLIC), which we refer to as "SUPRA" for SUPeRpixel
Augmented method. SUPRA first generates a preliminary segmentation mask making
use of our new loss "SLICLoss" that encourages both an accurate and
color-consistent segmentation. We demonstrate that SLICLoss, when combined with
binary cross-entropy (BCE) loss, can improve the model's generalisability with
data that presents significant domain shift. We validate this novel compound
loss on a vanilla U-Net using the EndoUDA dataset, which contains images for
Barrett's esophagus and polyps from two modalities. We show that our method
yields an improvement of nearly 20% in the target domain set compared to the
baseline.
Comment: This work has been accepted at the LatinX in Computer Vision Research Workshop at CVPR 202
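One way to read SLICLoss is as a penalty on predictions that disagree inside a colour-homogeneous superpixel. The function below is an illustrative stand-in (not the paper's exact loss), taking precomputed SLIC labels as given:

```python
import numpy as np

def slic_consistency_loss(pred, superpixels):
    """Penalise disagreement of the predicted mask within each superpixel.
    pred: HxW array of probabilities; superpixels: HxW integer label map
    (e.g. from SLIC). Since superpixels are colour-homogeneous, a prediction
    that varies inside one suggests the model is keying on modality-specific
    appearance rather than structure."""
    labels = np.unique(superpixels)
    loss = 0.0
    for lab in labels:
        region = pred[superpixels == lab]
        loss += np.mean((region - region.mean()) ** 2)  # within-superpixel variance
    return loss / len(labels)
```

In a training loop this term would be weighted and added to the BCE loss, matching the compound-loss idea described above.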
Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy
Colonoscopy screening is the gold standard procedure for assessing
abnormalities in the colon and rectum, such as ulcers and cancerous polyps.
Measuring the abnormal mucosal area and its 3D reconstruction can help quantify
the surveyed area and objectively evaluate disease burden. However, due to the
complex topology of these organs and variable physical conditions (for example,
lighting, large homogeneous textures, and image modality), estimating the
distance from the camera (i.e., depth) is highly challenging. Moreover, most colonoscopic
video acquisition is monocular, making the depth estimation a non-trivial
problem. While methods in computer vision for depth estimation have been
proposed and advanced on natural scene datasets, the efficacy of these
techniques has not been widely quantified on colonoscopy datasets. As the
colonic mucosa has several low-texture regions that are not well pronounced,
learning representations from an auxiliary task can improve salient feature
extraction, allowing estimation of accurate camera depths. In this work, we
propose to develop a novel multi-task learning (MTL) approach with a shared
encoder and two decoders, namely a surface normal decoder and a depth estimator
decoder. Our depth estimator incorporates attention mechanisms to enhance
global context awareness. We leverage the surface normal prediction to improve
geometric feature extraction. Also, we apply a cross-task consistency loss
among the two geometrically related tasks, surface normal and camera depth. We
demonstrate a 14.17% improvement in relative error and a 10.4% improvement in
accuracy over the most accurate state-of-the-art baseline, the BTS approach.
All experiments are conducted on the recently released C3VD dataset; thus, we
provide a first benchmark of state-of-the-art methods.
Comment: 19 pages
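The cross-task consistency idea can be sketched by deriving surface normals from the predicted depth via finite differences and comparing them with the normals the network predicts directly. The orthographic approximation and helper names below are assumptions, not the paper's formulation:

```python
import numpy as np

def normals_from_depth(depth):
    """Unit surface normals implied by a depth map, via finite differences
    (orthographic approximation)."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def cross_task_consistency_loss(pred_normals, pred_depth):
    """Mean (1 - cosine similarity) between the normals predicted by the
    normal decoder and those derived from the predicted depth."""
    derived = normals_from_depth(pred_depth)
    cos = np.sum(pred_normals * derived, axis=-1)
    return float(np.mean(1.0 - cos))
```

Because the two decoders share an encoder, this loss ties their outputs to a single consistent geometry rather than two independent guesses.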
Patch-level instance-group discrimination with pretext-invariant learning for colitis scoring
Inflammatory bowel disease (IBD), in particular ulcerative colitis (UC), is
graded by endoscopists and this assessment is the basis for risk stratification
and therapy monitoring. Presently, endoscopic characterisation is largely
operator-dependent, sometimes leading to undesirable clinical outcomes for
patients with IBD. We focus on the Mayo Endoscopic Scoring (MES) system, which
is widely used but requires the reliable identification of subtle changes in
mucosal inflammation. Most existing deep learning classification methods cannot
detect these fine-grained changes, which makes UC grading such a challenging
task. In this work, we introduce a novel patch-level instance-group
discrimination with pretext-invariant representation learning (PLD-PIRL) for
self-supervised learning (SSL). Our experiments demonstrate both improved
accuracy and robustness compared to the baseline supervised network and several
state-of-the-art SSL methods. Compared to the baseline (ResNet50) supervised
classification, our proposed PLD-PIRL obtains a top-1 accuracy improvement of
4.75% on hold-out test data and 6.64% on unseen-centre test data.
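The pretext-invariant objective can be sketched as a noise-contrastive loss that pulls an image embedding towards the embedding of its own transformed patches and away from other images. This simplified form is illustrative, not the paper's exact PLD-PIRL head:

```python
import numpy as np

def pirl_nce_loss(img_emb, patch_emb, negatives, temperature=0.1):
    """Noise-contrastive estimation loss for pretext-invariant learning.
    img_emb: embedding of the full image; patch_emb: embedding of its own
    (e.g. jigsaw-transformed) patches; negatives: embeddings of other images."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos(img_emb, patch_emb) / temperature)
    neg = sum(np.exp(cos(img_emb, n) / temperature) for n in negatives)
    return float(-np.log(pos / (pos + neg)))
```

The loss is near zero when the image and its transformed patches already agree, which is exactly the invariance the pretext task rewards.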