Masked Supervised Learning for Semantic Segmentation
Self-attention is of vital importance in semantic segmentation as it enables
modeling of long-range context, which translates into improved performance. We
argue that it is equally important to model short-range context, especially
in cases where the regions of interest are not only small and ambiguous but
also affected by an imbalance between the semantic classes. To this
end, we propose Masked Supervised Learning (MaskSup), an effective single-stage
learning paradigm that models both short- and long-range context, capturing the
contextual relationships between pixels via random masking. Experimental
results demonstrate the competitive performance of MaskSup against strong
baselines in both binary and multi-class segmentation tasks on three standard
benchmark datasets, particularly in handling ambiguous regions and in
retaining better segmentation of minority classes, with no added inference cost. In
addition to segmenting target regions even when large portions of the input are
masked, MaskSup is also generic and can be easily integrated into a variety of
semantic segmentation methods. We also show that the proposed method is
computationally efficient, yielding a 10% improvement in mean
intersection-over-union (mIoU) while requiring fewer learnable parameters.
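A minimal sketch of the random-masking paradigm described above (an illustration rather than the authors' released code; the patch size, masking probability, and equal loss weighting are assumptions):

import torch
import torch.nn.functional as F

def random_patch_mask(images, patch=32, drop_prob=0.5):
    """Zero out random patch-aligned regions (assumes H, W divisible by patch)."""
    b, _, h, w = images.shape
    keep = (torch.rand(b, 1, h // patch, w // patch,
                       device=images.device) > drop_prob).float()
    mask = F.interpolate(keep, size=(h, w), mode="nearest")
    return images * mask

def masksup_step(model, images, labels, alpha=1.0):
    """One hypothetical training step: supervise both the full and the masked view."""
    logits_full = model(images)                       # long-range context branch
    logits_masked = model(random_patch_mask(images))  # masked branch forces contextual cues
    return (F.cross_entropy(logits_full, labels)
            + alpha * F.cross_entropy(logits_masked, labels))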
Designing Efficient Deep Learning Models for Computer-Aided Medical Diagnosis
Traditional clinician diagnosis, which requires intensive manual effort from experienced medical doctors and radiologists, is notoriously time-consuming, costly, and at times error-prone. To alleviate these issues, computer-aided diagnosis systems are often used to improve accuracy in early detection, diagnosis, treatment planning, and outcome prediction. While these systems are making strides, significant challenges remain due to the scarcity of publicly available data, high annotation cost, and suboptimal performance in detecting rare targets. In this thesis, we design efficient deep learning models for computer-aided medical diagnosis. The contributions are twofold. First, we introduce an over-sampling method that learns the inter-class mapping between under-represented and over-represented class samples in order to generate under-represented class samples using unpaired image-to-image translation. These synthetic images are then used as additional training data in the task of detecting abnormalities (e.g., melanoma, COVID-19). Our method achieves improved performance on a standard skin lesion classification benchmark. We show through feature visualization that our approach leads to context-based lesion assessment that can reach the level of an expert dermatologist. Additional experiments also demonstrate the effectiveness of our model in COVID-19 detection from chest radiography images. The synthetic images not only improve the performance of various deep learning architectures when used as additional training data under heavy imbalance conditions, but also help detect the target class with high confidence.
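A minimal sketch of the oversampling step in this first contribution (hypothetical names throughout; G stands for any pretrained unpaired image-to-image generator, e.g. a CycleGAN-style model, which the abstract does not specify):

import torch
from torch.utils.data import ConcatDataset, TensorDataset

@torch.no_grad()
def oversample_minority(G, majority_images, minority_label):
    """Translate over-represented class images into synthetic minority samples."""
    synthetic = G(majority_images)  # unpaired image-to-image translation
    labels = torch.full((len(synthetic),), minority_label)
    return TensorDataset(synthetic, labels)

# Hypothetical usage: append the synthetic minority samples to the real training set.
# train_set = ConcatDataset([real_train_set, oversample_minority(G, x_majority, 1)])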
Second, we present a simple yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, dubbed Sharp U-Net, for binary and multi-class biomedical image segmentation. Instead of applying a plain skip connection as in U-Net, a depthwise convolution of the encoder feature map with a sharpening kernel filter is employed prior to merging the encoder and decoder features, thereby producing a sharpened intermediate feature map of the same size as the encoder map. Using this sharpening filter layer, we are able not only to fuse semantically less dissimilar features, but also to smooth out artifacts throughout the network layers during the early stages of training. Our extensive experiments on six datasets show that the proposed Sharp U-Net model consistently outperforms or matches recent state-of-the-art baselines in both binary and multi-class segmentation tasks, while adding no extra learnable parameters.
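A minimal sketch of such a sharpened skip connection (the exact kernel values and the concatenation-style merge are assumptions; note the fixed kernel adds no learnable parameters):

import torch
import torch.nn.functional as F

SHARPEN = torch.tensor([[-1., -1., -1.],
                        [-1.,  9., -1.],
                        [-1., -1., -1.]])  # a classic 3x3 sharpening filter

def sharp_skip(encoder_feat, decoder_feat):
    """Sharpen encoder features depthwise, then merge with decoder features."""
    c = encoder_feat.shape[1]
    kernel = SHARPEN.to(encoder_feat).view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    sharpened = F.conv2d(encoder_feat, kernel, padding=1, groups=c)  # depthwise
    return torch.cat([sharpened, decoder_feat], dim=1)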
Learning to recognize occluded and small objects with partial inputs
Recognizing multiple objects in an image is challenging due to occlusions,
and becomes even more so when the objects are small. While promising, existing
multi-label image recognition models do not explicitly learn context-based
representations, and hence struggle to correctly recognize small and occluded
objects. Intuitively, recognizing occluded objects requires reasoning from
partial inputs, and hence from context. Motivated by this intuition, we propose
Masked Supervised Learning (MSL), a single-stage, model-agnostic learning
paradigm for multi-label image recognition. The key idea is to learn
context-based representations using a masked branch and to model label
co-occurrence using label consistency. Experimental results demonstrate the
simplicity, applicability and more importantly the competitive performance of
MSL against previous state-of-the-art methods on standard multi-label image
recognition benchmarks. In addition, we show that MSL is robust to random
masking and demonstrate its effectiveness in recognizing non-masked objects.
Code and pretrained models are available on GitHub.
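A minimal sketch of the two ingredients named above, a masked branch and label consistency (the masking helper, the consistency measure, and the weighting are assumptions, not the released code):

import torch
import torch.nn.functional as F

def msl_loss(model, images, targets, mask_fn, beta=1.0):
    """targets: float multi-hot tensor of shape (batch, num_classes)."""
    logits_full = model(images)
    logits_masked = model(mask_fn(images))  # e.g. random patch masking
    cls_loss = (F.binary_cross_entropy_with_logits(logits_full, targets)
                + F.binary_cross_entropy_with_logits(logits_masked, targets))
    # Label consistency: the masked branch should agree with the full branch,
    # which encourages context-based representations.
    consistency = F.mse_loss(torch.sigmoid(logits_masked),
                             torch.sigmoid(logits_full).detach())
    return cls_loss + beta * consistency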
CosSIF: Cosine similarity-based image filtering to overcome low inter-class variation in synthetic medical image datasets
Crafting effective deep learning models for medical image analysis is a
complex task, particularly in cases where the medical image dataset lacks
significant inter-class variation. This challenge is further aggravated when
employing such datasets to generate synthetic images using generative
adversarial networks (GANs), as the output of GANs heavily relies on the input
data. In this research, we propose a novel filtering algorithm called Cosine
Similarity-based Image Filtering (CosSIF). We leverage CosSIF to develop two
distinct filtering methods: Filtering Before GAN Training (FBGT) and Filtering
After GAN Training (FAGT). FBGT involves the removal of real images that
exhibit similarities to images of other classes before utilizing them as the
training dataset for a GAN. On the other hand, FAGT focuses on eliminating
synthetic images with less discriminative features compared to real images used
for training the GAN. Experimental results reveal that employing either the
FAGT or FBGT method with modern transformer- and convolution-based networks
leads to substantial performance gains in various evaluation metrics. FAGT
implementation on the ISIC-2016 dataset surpasses the baseline method in terms
of sensitivity by 1.59% and AUC by 1.88%. Furthermore, for the HAM10000
dataset, applying FBGT outperforms the baseline approach in terms of recall by
13.75%, while FAGT alone achieves a maximum accuracy of 94.44%.
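A minimal sketch of the cosine-similarity filtering at the core of both methods (the feature extractor, scoring rule, and keep ratio are assumptions): score each image of a class by its similarity to images of the other classes, then drop the most similar ones, either from the real set before GAN training (FBGT) or from the synthetic set afterwards (FAGT).

import numpy as np

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between two sets of feature vectors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def filter_class(feats, other_class_feats, keep_ratio=0.9):
    """Return indices of the keep_ratio fraction least similar to other classes."""
    scores = cosine_sim_matrix(feats, other_class_feats).max(axis=1)
    return np.argsort(scores)[: int(len(feats) * keep_ratio)]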
MoNuSAC2020: A Multi-Organ Nuclei Segmentation and Classification Challenge
Detecting various types of cells in and around the tumor matrix holds a special significance in characterizing the tumor micro-environment for cancer prognostication and research. Automating the tasks of detecting, segmenting, and classifying nuclei can free up the pathologists' time for higher-value tasks and reduce errors due to fatigue and subjectivity. To encourage the computer vision research community to develop and test algorithms for these tasks, we prepared a large and diverse dataset of nucleus boundary annotations and class labels. The dataset has over 46,000 nuclei from 37 hospitals, 71 patients, four organs, and four nucleus types. We also organized a challenge around this dataset as a satellite event at the International Symposium on Biomedical Imaging (ISBI) in April 2020. The challenge saw wide participation from across the world, and the top methods were able to match inter-human concordance for the challenge metric. In this paper, we summarize the dataset and the key findings of the challenge, including the commonalities and differences between the methods developed by various participants. We have released the MoNuSAC2020 dataset to the public.