Contrastive Registration for Unsupervised Medical Image Segmentation
Medical image segmentation is a key task, as it serves as the first step
in several diagnostic processes and is thus indispensable in clinical use.
Whilst major success has been reported using supervised techniques, they assume
a large and well-representative labelled set. This is a strong assumption in
the medical domain, where annotations are expensive, time-consuming, and
prone to human bias. To address this problem, unsupervised techniques have
been proposed in the literature, yet segmentation remains an open problem due
to the difficulty of learning any transformation pattern. In this work, we
present a novel optimisation model framed in a new CNN-based contrastive
registration architecture for unsupervised medical image segmentation. The core
of our approach is to exploit image-level registration and feature-level
information from a contrastive learning mechanism to perform
registration-based segmentation.
Firstly, we propose an architecture to capture the image-to-image
transformation pattern via registration for unsupervised medical image
segmentation. Secondly, we embed a contrastive learning mechanism into the
registration architecture to enhance the discriminative capacity of the network
at the feature level. We show that our proposed technique mitigates the major
drawbacks of existing unsupervised techniques. We demonstrate, through
numerical and visual experiments, that our technique substantially outperforms
the current state-of-the-art unsupervised segmentation methods on two major
medical image datasets.
Comment: 11 pages, 3 figures
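As a rough illustration of the feature-level contrastive mechanism the abstract describes, here is a minimal numpy sketch of an InfoNCE-style loss. The paper's actual loss and feature extraction may differ; all names here are illustrative:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor feature vector.

    anchor, positive: 1-D feature vectors (assumed L2-normalised).
    negatives: 2-D array, one negative feature per row.
    """
    pos_sim = anchor @ positive / temperature
    neg_sim = negatives @ anchor / temperature
    logits = np.concatenate([[pos_sim], neg_sim])
    # Cross-entropy with the positive at index 0 (log-sum-exp for stability).
    return float(np.logaddexp.reduce(logits) - pos_sim)
```

Minimising this loss pulls the anchor towards its positive pair and pushes it away from negatives, which is the discriminative effect the contrastive branch is meant to add to the registration features.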
Parsing is All You Need for Accurate Gait Recognition in the Wild
Binary silhouettes and keypoint-based skeletons have dominated human gait
recognition studies for decades since they are easy to extract from video
frames. Despite their success in gait recognition for in-the-lab environments,
they usually fail in real-world scenarios due to their low information entropy
for gait representations. To achieve accurate gait recognition in the wild,
this paper presents a novel gait representation, named Gait Parsing Sequence
(GPS). GPSs are sequences of fine-grained human segmentation, i.e., human
parsing, extracted from video frames, so they have much higher information
entropy to encode the shapes and dynamics of fine-grained human parts during
walking. Moreover, to effectively explore the capability of the GPS
representation, we propose a novel human parsing-based gait recognition
framework, named ParsingGait. ParsingGait contains a Convolutional Neural
Network (CNN)-based backbone and two lightweight heads. The first head
extracts global semantic features from GPSs, while the other one learns mutual
information of part-level features through Graph Convolutional Networks to
model the detailed dynamics of human walking. Furthermore, due to the lack of
suitable datasets, we build the first parsing-based dataset for gait
recognition in the wild, named Gait3D-Parsing, by extending the large-scale and
challenging Gait3D dataset. Based on Gait3D-Parsing, we comprehensively
evaluate our method and existing gait recognition methods. The experimental
results show a significant improvement in accuracy brought by the GPS
representation and the superiority of ParsingGait. The code and dataset are
available at https://gait3d.github.io/gait3d-parsing-hp.
Comment: 16 pages, 14 figures, ACM MM 2023 accepted, project page:
https://gait3d.github.io/gait3d-parsing-hp
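The second head described above propagates part-level features along a body-part graph. A minimal numpy sketch of one graph-convolution layer with Kipf-Welling-style normalisation follows; the actual ParsingGait layer may differ, and all names are illustrative:

```python
import numpy as np

def gcn_layer(part_feats, adjacency, weights):
    """One graph-convolution layer over part-level gait features.

    part_feats: (num_parts, in_dim) features, one row per body part.
    adjacency:  (num_parts, num_parts) part-connectivity matrix.
    weights:    (in_dim, out_dim) learnable projection.
    """
    # Add self-loops and apply symmetric normalisation.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    # Propagate features along part connections, project, apply ReLU.
    return np.maximum(a_norm @ part_feats @ weights, 0.0)
```

Stacking such layers lets each part's feature mix with those of connected parts, which is how a GCN head can capture the joint dynamics of fine-grained parts during walking.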
H-DenseUNet for Kidney and Tumor Segmentation from CT Scans
Automatic kidney tumor segmentation from CT scans is an essential step for computer-aided diagnosis of cancer. In this paper, we present an improved H-DenseUNet for kidney and tumor segmentation. Specifically, we first train the DenseUNet and then fine-tune the network with its 3D counterpart. To further increase performance, we employ both cross-entropy and Dice loss. We evaluate our method on the 2019 MICCAI kidney and tumor segmentation challenge. We split the challenge's training dataset into a 200-case training set and a 10-case validation set. On the validation set, our method achieves a Dice score of 97.0% for kidney segmentation and 67.2% for tumor segmentation. The model was submitted to the challenge for final performance evaluation on the test dataset.
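The combined cross-entropy and Dice objective mentioned above can be sketched as follows. This is a generic numpy version for binary masks; the weights and exact formulation used in the paper may differ:

```python
import numpy as np

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for a binary mask; probs and target share a shape."""
    inter = (probs * target).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)

def ce_loss(probs, target, eps=1e-6):
    """Binary cross-entropy averaged over voxels."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def combined_loss(probs, target, w_dice=1.0, w_ce=1.0):
    """Weighted sum of cross-entropy and Dice losses."""
    return w_ce * ce_loss(probs, target) + w_dice * dice_loss(probs, target)
```

Cross-entropy gives well-behaved per-voxel gradients, while the Dice term directly targets overlap and is less sensitive to the foreground/background imbalance typical of tumor masks, which is why the two are often summed.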
Multimodal critical-scenarios search method for test of autonomous vehicles
Purpose – The purpose of this paper is to search for the critical scenarios of autonomous vehicles (AVs) quickly and comprehensively, which is essential for verification and validation (V&V).
Design/methodology/approach – The author adopted the index F1 to quantify the critical scenarios' coverage of the search space and proposed the improved particle swarm optimization (IPSO) to enhance exploration ability for higher coverage. Compared with particle swarm optimization (PSO), there were three improvements. In the initial phase, the Latin hypercube sampling method was introduced for a uniform distribution of particles. In the iteration phase, the neighborhood operator was adopted to explore more modes, with the particles divided into groups. In the convergence phase, a convergence judgment and a restart strategy were used to explore the search space while avoiding local convergence. Experiments on an artificial function and on critical-scenario search were carried out to verify the efficiency and the application effect of the method against the Monte Carlo (MC) method and PSO.
Findings – Results show that IPSO can search for multimodal critical scenarios comprehensively. With a stricter threshold and fewer samples in the critical-scenario search experiment, the coverage of IPSO is 14% higher than that of PSO and 40% higher than that of MC.
Originality/value – The critical scenarios' coverage of the search space is quantified for the first time by the index F1, and the proposed method has higher search efficiency and coverage for the critical-scenario search of AVs, which shows application potential for V&V.
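The Latin hypercube initialisation used in IPSO's initial phase can be sketched as below. This is a generic numpy implementation on the unit hypercube; the paper's exact sampler and scaling to the scenario parameter space may differ:

```python
import numpy as np

def latin_hypercube(num_particles, num_dims, rng=None):
    """Latin hypercube sample in [0, 1]^num_dims for particle initialisation.

    Each dimension is split into num_particles equal strata; every stratum
    is hit exactly once, giving a more uniform spread than pure random
    initialisation.
    """
    rng = np.random.default_rng(rng)
    samples = np.empty((num_particles, num_dims))
    for d in range(num_dims):
        # One random point inside each stratum, then shuffle strata per dim.
        strata = (np.arange(num_particles) + rng.random(num_particles)) / num_particles
        samples[:, d] = rng.permutation(strata)
    return samples
```

Because every stratum of every dimension is covered, no region of the search space starts empty, which supports the higher coverage the method aims for.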
Why Deep Surgical Models Fail?: Revisiting Surgical Action Triplet Recognition through the Lens of Robustness
Surgical action triplet recognition provides a better understanding of the
surgical scene. This task is of high relevance as it provides the surgeon
with context-aware support and safety. The current go-to strategy for improving
performance is the development of new network mechanisms. However, the
performance of current state-of-the-art techniques is substantially lower than
on other surgical tasks. Why is this happening? This is the question that we
address in this work. We present the first study to understand the failure of
existing deep learning models through the lens of robustness and explainability.
Firstly, we study current models under weak and strong
perturbations via an adversarial optimisation scheme. We then provide the
failure modes via feature-based explanations. Our study reveals that the key to
improving performance and increasing reliability lies in the core and spurious
attributes. Our work opens the door to more trustworthy and reliable
deep learning models in surgical science.
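Adversarial perturbations of the kind used in such robustness studies can be illustrated with a fast-gradient-sign step. Below is a toy numpy sketch for a logistic model, where the gradient is available in closed form; the study's actual adversarial optimisation scheme is more elaborate, and all names here are illustrative:

```python
import numpy as np

def fgsm_perturb(x, weight, bias, label, epsilon):
    """Fast-gradient-sign perturbation of input x for a logistic model.

    The model is p = sigmoid(w.x + b); the gradient of the binary
    cross-entropy w.r.t. x is (p - label) * w, so the attack steps in the
    sign of that gradient, bounded by epsilon (a weak perturbation for
    small epsilon, stronger as epsilon grows).
    """
    p = 1.0 / (1.0 + np.exp(-(weight @ x + bias)))
    grad_x = (p - label) * weight
    return x + epsilon * np.sign(grad_x)
```

Sweeping epsilon from small to large is one simple way to probe a model under weak and strong perturbations and observe where its predictions first break.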
Homeomorphic Image Registration via Conformal-Invariant Hyperelastic Regularisation
Deformable image registration is a fundamental task in medical image analysis
and plays a crucial role in a wide range of clinical applications. Recently,
deep learning-based approaches have been widely studied for deformable medical
image registration and achieved promising results. However, existing deep
learning image registration techniques do not theoretically guarantee
topology-preserving transformations. This is a key property to preserve
anatomical structures and achieve plausible transformations that can be used in
real clinical settings. We propose a novel framework for deformable image
registration. Firstly, we introduce a novel regulariser based on
conformal-invariant properties in a nonlinear elasticity setting. Our
regulariser enforces the deformation field to be smooth, invertible and
orientation-preserving. More importantly, we strictly guarantee topology
preservation, yielding a clinically meaningful registration. Secondly, we boost
the performance of our regulariser through coordinate MLPs, where one can view
the to-be-registered images as continuously differentiable entities. We
demonstrate, through numerical and visual experiments, that our framework
outperforms current techniques for image registration.
Comment: 11 pages, 3 figures
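A coordinate MLP of the kind mentioned above treats an image or deformation field as a continuous, differentiable function of position. Here is a minimal numpy forward pass with a sinusoidal positional encoding; the paper's architecture details may differ, and all names are illustrative:

```python
import numpy as np

def coordinate_mlp(coords, w1, b1, w2, b2, num_freqs=4):
    """Tiny coordinate MLP: maps 2-D coordinates to a displacement field.

    coords: (n, 2) pixel coordinates scaled to [-1, 1].
    A sinusoidal positional encoding lifts the coordinates before two
    dense layers, so the network is a smooth, differentiable function of
    position -- the property exploited for registration regularisation.
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    enc = np.concatenate([np.sin(coords[:, :, None] * freqs),
                          np.cos(coords[:, :, None] * freqs)], axis=-1)
    enc = enc.reshape(coords.shape[0], -1)   # (n, 2 * 2 * num_freqs)
    hidden = np.tanh(enc @ w1 + b1)          # smooth nonlinearity
    return hidden @ w2 + b2                  # (n, 2) displacement
```

Because the output is an analytic function of the coordinates, derivatives of the deformation field (needed by elasticity-based regularisers) are exact rather than approximated on a pixel grid.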
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
Shadows in videos are difficult to detect because of the large shadow
deformation between frames. In this work, we argue that accounting for shadow
deformation is essential when designing a video shadow detection method. To
this end, we introduce the shadow deformation attention trajectory (SODA), a
new type of video self-attention module, specially designed to handle the large
shadow deformations in videos. Moreover, we present a new shadow contrastive
learning mechanism (SCOTCH) which aims at guiding the network to learn a
unified shadow representation from massive positive shadow pairs across
different videos. We demonstrate empirically the effectiveness of our two
contributions in an ablation study. Furthermore, we show that SCOTCH and SODA
significantly outperform existing techniques for video shadow detection. Code
is available at the project page:
https://lihaoliu-cambridge.github.io/scotch_and_soda/
Comment: Accepted to CVPR 2023
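Video self-attention modules such as SODA build on scaled dot-product attention across frames. A minimal numpy sketch of that base operation follows; SODA's trajectory design adds deformation-specific structure on top, and all names here are illustrative:

```python
import numpy as np

def frame_attention(queries, keys, values):
    """Scaled dot-product attention across video frames.

    queries, keys: (num_frames, dim) per-frame feature vectors.
    values: (num_frames, v_dim). Each output row is a weighted mix of
    all frames -- the basic mechanism a video attention module builds on.
    """
    dim = queries.shape[1]
    scores = queries @ keys.T / np.sqrt(dim)
    # Row-wise softmax over frames (shifted for numerical stability).
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values
```

Letting each frame attend to every other frame is what allows a model to track a shadow whose shape deforms substantially between frames.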
MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening
Breast cancer is a major cause of cancer death among women, emphasising the
importance of early detection for improved treatment outcomes and quality of
life. Mammography, the primary diagnostic imaging test, poses challenges due to
the high variability of patterns in mammograms. Double reading of mammograms
is recommended in many screening programs to improve diagnostic accuracy but
increases radiologists' workload. Researchers explore Machine Learning models
to support expert decision-making. Stand-alone models have shown comparable or
superior performance to radiologists, but some studies note decreased
sensitivity across multiple datasets, indicating the need for models with high
generalisation and robustness. This work devises MammoDG, a novel deep-learning
framework for generalisable and reliable analysis of cross-domain multi-center
mammography data. MammoDG leverages multi-view mammograms and a novel
contrastive mechanism to enhance generalisation capabilities. Extensive
validation demonstrates MammoDG's superiority, highlighting the critical
importance of domain generalisation for trustworthy mammography analysis under
variations in imaging protocols.