
    Masked Supervised Learning for Semantic Segmentation

    Self-attention is of vital importance in semantic segmentation as it enables modeling of long-range context, which translates into improved performance. We argue that it is equally important to model short-range context, especially for cases where the regions of interest are small and ambiguous and the semantic classes are imbalanced. To this end, we propose Masked Supervised Learning (MaskSup), an effective single-stage learning paradigm that models both short- and long-range context, capturing the contextual relationships between pixels via random masking. Experimental results demonstrate the competitive performance of MaskSup against strong baselines in both binary and multi-class segmentation tasks on three standard benchmark datasets, particularly in handling ambiguous regions and retaining better segmentation of minority classes, with no added inference cost. In addition to segmenting target regions even when large portions of the input are masked, MaskSup is generic and can be easily integrated into a variety of semantic segmentation methods. We also show that the proposed method is computationally efficient, improving mean intersection-over-union (mIoU) by 10% while requiring 3× fewer learnable parameters.
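
The masking mechanism is simple enough to sketch. Below is a minimal PyTorch-style illustration of training on a randomly masked view alongside the full input; the patch size, mask ratio, and equal loss weighting are illustrative assumptions, not necessarily the paper's settings.

```python
import torch
import torch.nn.functional as F

def random_patch_mask(images, patch=16, mask_ratio=0.3):
    """Zero out a random subset of patches (assumes H and W are divisible by patch)."""
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=images.device) > mask_ratio).float()
    # Upsample the patch-level keep mask to pixel resolution.
    mask = keep.repeat_interleave(patch, dim=-2).repeat_interleave(patch, dim=-1)
    return images * mask

def masksup_step(model, images, labels):
    # Supervised loss on the full input (long-range context)...
    loss_full = F.cross_entropy(model(images), labels)
    # ...plus the same loss on a masked view, so the model must infer
    # pixel labels from partial, short-range context.
    loss_masked = F.cross_entropy(model(random_patch_mask(images)), labels)
    return loss_full + loss_masked
```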

    Designing Efficient Deep Learning Models for Computer-Aided Medical Diagnosis

    Traditional clinician diagnosis, which requires intensive manual effort from experienced medical doctors and radiologists, is notoriously time-consuming, costly, and at times error-prone. To alleviate these issues, computer-aided diagnosis systems are often used to improve accuracy in early detection, diagnosis, treatment planning, and outcome prediction. While these systems are making strides, significant challenges remain due to the scarcity of publicly available data, high annotation cost, and suboptimal performance in detecting rare targets. In this thesis, we design efficient deep learning models for computer-aided medical diagnosis. The contributions are twofold. First, we introduce an over-sampling method that learns the inter-class mapping between under-represented and over-represented class samples in order to generate under-represented class samples using unpaired image-to-image translation. These synthetic images are then used as additional training data in the task of detecting abnormalities (i.e., melanoma, COVID-19). Our method achieves improved performance on a standard skin lesion classification benchmark, and we show through feature visualization that it leads to context-based lesion assessment that can reach the level of an expert dermatologist. Additional experiments also demonstrate the effectiveness of our model in COVID-19 detection from chest radiography images: the synthetic images not only improve the performance of various deep learning architectures when used as additional training data under heavy imbalance conditions, but also help detect the target class with high confidence. Second, we present a simple yet effective end-to-end depthwise encoder-decoder fully convolutional network architecture, dubbed Sharp U-Net, for binary and multi-class biomedical image segmentation. Instead of applying a plain skip connection as in U-Net, a depthwise convolution of the encoder feature map with a sharpening kernel filter is employed prior to merging the encoder and decoder features, producing a sharpened intermediate feature map of the same size as the encoder map. Using this sharpening filter layer, we are able not only to fuse semantically less dissimilar features, but also to smooth out artifacts throughout the network layers during the early stages of training. Our extensive experiments on six datasets show that the proposed Sharp U-Net model consistently outperforms or matches recent state-of-the-art baselines in both binary and multi-class segmentation tasks, while adding no extra learnable parameters.
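
The sharpening skip connection lends itself to a short sketch. The PyTorch snippet below applies a fixed 3x3 sharpening kernel depthwise to the encoder feature map before concatenating it with the decoder features; the specific kernel values and padding are common defaults assumed here, not verified against the thesis.

```python
import torch
import torch.nn as nn

class SharpSkip(nn.Module):
    """Depthwise sharpening of an encoder feature map before the U-Net-style
    concatenation with decoder features (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        # Classic 3x3 sharpening kernel, one copy per channel (depthwise).
        kernel = torch.tensor([[0., -1., 0.],
                               [-1., 5., -1.],
                               [0., -1., 0.]])
        self.conv = nn.Conv2d(channels, channels, 3, padding=1,
                              groups=channels, bias=False)
        self.conv.weight.data = kernel.expand(channels, 1, 3, 3).clone()
        self.conv.weight.requires_grad = False  # fixed filter: no extra learnable parameters

    def forward(self, enc_feat, dec_feat):
        # Sharpen the encoder map, then fuse it with the decoder map.
        return torch.cat([self.conv(enc_feat), dec_feat], dim=1)
```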

    Learning to recognize occluded and small objects with partial inputs

    Recognizing multiple objects in an image is challenging due to occlusions, and becomes even more so when the objects are small. While promising, existing multi-label image recognition models do not explicitly learn context-based representations, and hence struggle to correctly recognize small and occluded objects. Intuitively, recognizing occluded objects requires knowledge of partial inputs, and hence context. Motivated by this intuition, we propose Masked Supervised Learning (MSL), a single-stage, model-agnostic learning paradigm for multi-label image recognition. The key idea is to learn context-based representations using a masked branch and to model label co-occurrence using label consistency. Experimental results demonstrate the simplicity, applicability, and, more importantly, the competitive performance of MSL against previous state-of-the-art methods on standard multi-label image recognition benchmarks. In addition, we show that MSL is robust to random masking and demonstrate its effectiveness in recognizing non-masked objects. Code and pretrained models are available on GitHub.
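
The two ingredients, a masked branch and a label-consistency term, can be sketched in a few lines. The PyTorch-style step below is a minimal illustration; the masking function, loss forms, and weighting are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def msl_step(model, images, targets, mask_fn, lam=1.0):
    """One multi-label training step with a masked branch (illustrative)."""
    logits_full = model(images)             # standard branch: full input
    logits_masked = model(mask_fn(images))  # masked branch: partial input
    # Multi-label BCE on both branches; targets are multi-hot vectors.
    loss = F.binary_cross_entropy_with_logits(logits_full, targets)
    loss = loss + F.binary_cross_entropy_with_logits(logits_masked, targets)
    # Label consistency: predictions from the masked view should agree
    # with those from the full view (treated as soft targets).
    loss = loss + lam * F.mse_loss(torch.sigmoid(logits_masked),
                                   torch.sigmoid(logits_full).detach())
    return loss
```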

    CosSIF: Cosine similarity-based image filtering to overcome low inter-class variation in synthetic medical image datasets

    Crafting effective deep learning models for medical image analysis is a complex task, particularly when the medical image dataset lacks significant inter-class variation. This challenge is further aggravated when employing such datasets to generate synthetic images with generative adversarial networks (GANs), as the output of GANs heavily depends on the input data. In this research, we propose a novel filtering algorithm called Cosine Similarity-based Image Filtering (CosSIF). We leverage CosSIF to develop two distinct filtering methods: Filtering Before GAN Training (FBGT) and Filtering After GAN Training (FAGT). FBGT removes real images that exhibit similarities to images of other classes before they are used as the GAN training dataset, while FAGT eliminates synthetic images with less discriminative features than the real images used to train the GAN. Experimental results reveal that employing either FAGT or FBGT with modern transformer- and convolutional-based networks leads to substantial performance gains on various evaluation metrics. FAGT on the ISIC-2016 dataset surpasses the baseline method in sensitivity by 1.59% and AUC by 1.88%. Furthermore, on the HAM10000 dataset, applying FBGT outperforms the baseline approach in recall by 13.75%, and FAGT alone achieves a maximum accuracy of 94.44%.
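
At its core, the filtering step reduces to thresholded pairwise cosine similarity. The NumPy sketch below illustrates an FBGT-style pass; the choice of feature space (raw pixels vs. learned embeddings), the per-class pairing, and the threshold value are assumptions here, not CosSIF's exact procedure.

```python
import numpy as np

def cossif_keep_mask(class_feats, other_feats, threshold=0.9):
    """Flag samples of one class that are too similar to any sample of
    another class. Inputs are (N, D) arrays of per-image feature vectors.
    Returns a boolean mask: True = keep (sufficiently dissimilar)."""
    a = class_feats / np.linalg.norm(class_feats, axis=1, keepdims=True)
    b = other_feats / np.linalg.norm(other_feats, axis=1, keepdims=True)
    sim = a @ b.T                       # (N_a, N_b) pairwise cosine similarities
    return sim.max(axis=1) < threshold  # drop samples near the other class
```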

    MoNuSAC2020: A Multi-Organ Nuclei Segmentation and Classification Challenge

    Detecting various types of cells in and around the tumor matrix holds special significance in characterizing the tumor micro-environment for cancer prognostication and research. Automating the tasks of detecting, segmenting, and classifying nuclei can free up pathologists' time for higher-value tasks and reduce errors due to fatigue and subjectivity. To encourage the computer vision research community to develop and test algorithms for these tasks, we prepared a large and diverse dataset of nucleus boundary annotations and class labels. The dataset contains over 46,000 nuclei from 37 hospitals, 71 patients, four organs, and four nucleus types. We also organized a challenge around this dataset as a satellite event of the International Symposium on Biomedical Imaging (ISBI) in April 2020. The challenge saw wide participation from across the world, and the top methods were able to match inter-human concordance on the challenge metric. In this paper, we summarize the dataset and the key findings of the challenge, including the commonalities and differences between the methods developed by various participants. We have released the MoNuSAC2020 dataset to the public.