    Semantic Segmentation Considering Image Degradation, Global Context, and Data Balancing

    Semantic segmentation (assigning a categorical label to each pixel in an image) plays an important role in image understanding applications, e.g., autonomous driving, human-machine interaction, and medical imaging. Semantic segmentation has made progress by using deep convolutional neural networks (CNNs), which surpass traditional methods by a large margin. Despite the success of CNNs, three major challenges remain.

    The first challenge is how to semantically segment degraded images, i.e., degraded image semantic segmentation. In general, image degradations increase the difficulty of semantic segmentation, usually leading to decreased segmentation accuracy. While supervised deep learning has substantially improved the state of the art in semantic segmentation, the gap between the feature distribution learned from clean images and the feature distribution learned from degraded images poses a major obstacle to degraded image semantic segmentation. We propose a novel Dense-Gram Network to reduce this gap more effectively than conventional strategies when segmenting degraded images. Extensive experiments demonstrate that the proposed Dense-Gram Network yields state-of-the-art semantic segmentation performance on degraded images synthesized using the PASCAL VOC 2012, SUNRGBD, CamVid, and CityScapes datasets.

    The second challenge is how to embed global context into the segmentation network. Existing semantic segmentation networks usually exploit local context information to infer the label of a single pixel or patch; without global context, CNNs can misclassify objects with similar colors and shapes. In this thesis, we propose to embed global context into the segmentation network using objects' spatial relationships. In particular, we introduce a boundary-based metric that measures the level of spatial adjacency between each pair of object classes and find that this metric is robust against biases induced by object size. By enforcing this metric in the segmentation loss, we propose a new network, which starts with a segmentation network, is followed by a new encoder that computes the proposed boundary-based metric, and is trained end-to-end for semantic image segmentation. We evaluate the proposed method on the CamVid and CityScapes datasets and achieve favorable overall performance and a substantial improvement in segmenting small objects.

    The third challenge for existing semantic segmentation networks is the performance decrease induced by data imbalance. At the image level, one semantic class may occur in more images than another; at the pixel level, one semantic class may occupy a larger area than another. Classic strategies such as class re-sampling or cost-sensitive training cannot address these data imbalances for multi-label segmentation. Here, we propose a selective-weighting strategy that considers image- and pixel-level data balancing simultaneously when a batch of images is fed into the network. Experimental results on the CityScapes and BRATS2015 benchmark datasets show that the proposed method effectively improves segmentation performance.
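    The abstract does not spell out the Dense-Gram formulation, but Gram matrices of intermediate feature maps are the statistic the network's name points to. Below is a minimal, hypothetical PyTorch sketch of penalizing the gap between the Gram statistics of features from a clean image and its degraded counterpart; the names `gram_matrix` and `feature_gap_loss` are illustrative, not the authors' API.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix of a feature map (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def feature_gap_loss(clean_feats, degraded_feats):
    """Penalize the mismatch between Gram statistics of clean and degraded features.

    `clean_feats` / `degraded_feats` are lists of feature maps taken from several
    layers of the same segmentation backbone (an assumed setup for illustration).
    """
    loss = 0.0
    for fc, fd in zip(clean_feats, degraded_feats):
        loss = loss + F.mse_loss(gram_matrix(fd), gram_matrix(fc.detach()))
    return loss
```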

    Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification

    Finding abnormal lymph nodes in radiological images is highly important for various medical tasks such as cancer metastasis staging and radiotherapy planning. Lymph nodes (LNs) are small glands scattered throughout the body. They are grouped into various LN stations according to their anatomical locations. The CT imaging appearance and context of LNs in different stations vary significantly, posing challenges for automated detection, especially for pathological LNs. Motivated by this observation, we propose a novel end-to-end framework to improve LN detection performance by leveraging their station information. We design a multi-head detector and make each head focus on differentiating the LN and non-LN structures of certain stations. Pseudo station labels are generated by an LN station classifier as a form of multi-task learning during training, so no explicit LN station prediction model is needed during inference. Our algorithm is evaluated on 82 patients with lung cancer and 91 patients with esophageal cancer. The proposed implicit station stratification method improves the detection sensitivity of thoracic lymph nodes from 65.1% to 71.4% and from 80.3% to 85.5% at 2 false positives per patient on the two datasets, respectively, significantly outperforming various existing state-of-the-art baseline techniques such as nnUNet, nnDetection, and LENS.
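    As a rough illustration of the implicit station stratification idea (not the authors' actual implementation), the sketch below uses a station classifier's soft posterior to weight per-station LN/non-LN scoring heads applied to pooled candidate features. All names, dimensions, and the linear heads are assumptions.

```python
import torch
import torch.nn as nn

class StationStratifiedHead(nn.Module):
    """Illustrative multi-head detector head: one LN/non-LN scorer per station group.

    A station classifier predicts soft pseudo station labels from shared features;
    its posterior weights the station-specific scores, so no separate station model
    is needed at inference time (names and shapes are hypothetical).
    """
    def __init__(self, feat_dim: int = 256, num_stations: int = 4):
        super().__init__()
        self.station_cls = nn.Linear(feat_dim, num_stations)
        self.heads = nn.ModuleList(nn.Linear(feat_dim, 1) for _ in range(num_stations))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_candidates, feat_dim) pooled features of LN candidates
        station_post = self.station_cls(feats).softmax(dim=-1)        # (N, S)
        per_head = torch.cat([h(feats) for h in self.heads], dim=-1)  # (N, S)
        return (station_post * per_head).sum(dim=-1)                  # LN scores (N,)
```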

    LViT: Language meets Vision Transformer in Medical Image Segmentation

    Deep learning has been widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficient high-quality labeled data, due to the prohibitive cost of data annotation. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of pseudo labels of improved quality in semi-supervised learning. We also propose an Exponential Pseudo label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-ray and CT images. Experimental results show that our proposed LViT has superior segmentation performance in both fully supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT. (Accepted by IEEE Transactions on Medical Imaging, TMI.)
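    The abstract names the Exponential Pseudo label Iteration mechanism (EPI) without giving its update rule; one natural reading is an exponential moving-average refinement of per-pixel pseudo labels across training iterations. The sketch below is written under that assumption, with `beta` as a hypothetical momentum parameter.

```python
import torch

def epi_update(prev_pseudo: torch.Tensor,
               new_pred: torch.Tensor,
               beta: float = 0.9) -> torch.Tensor:
    """One EPI-style step (assumed form): exponentially smooth the per-pixel class
    probabilities used as pseudo labels for an unlabeled image.

    prev_pseudo: (B, C, H, W) previous pseudo-label probabilities
    new_pred:    (B, C, H, W) current model logits for the same image
    """
    return beta * prev_pseudo + (1.0 - beta) * new_pred.softmax(dim=1)
```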

    Continual Segment: Towards a Single, Unified and Accessible Continual Segmentation Model of 143 Whole-body Organs in CT Scans

    Deep learning empowers the mainstream medical image segmentation methods. Nevertheless, current deep segmentation approaches cannot efficiently and effectively adapt and update trained models when new incremental segmentation classes (with or without new training datasets) need to be added. In real clinical environments, it is preferable that segmentation models can be dynamically extended to segment new organs/tumors without re-accessing previous training datasets, owing to obstacles of patient privacy and data storage. This process can be viewed as a continual semantic segmentation (CSS) problem, which is understudied for multi-organ segmentation. In this work, we propose a new architectural CSS learning framework to learn a single deep segmentation model for segmenting a total of 143 whole-body organs. Using an encoder/decoder network structure, we demonstrate that a continually trained then frozen encoder coupled with incrementally added decoders can extract and preserve sufficiently representative image features for new classes to be subsequently and validly segmented. To maintain a single network model's complexity, we trim each decoder progressively using neural architecture search and teacher-student-based knowledge distillation. To accommodate both healthy and pathological organs appearing in different datasets, a novel anomaly-aware and confidence learning module is proposed to merge the overlapping organ predictions originating from different decoders. Trained and validated on 3D CT scans of 2500+ patients from four datasets, our single network can segment a total of 143 whole-body organs with very high accuracy, closely reaching the upper-bound performance level obtained by training four separate segmentation models (i.e., one model per dataset/task).
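    A minimal sketch of the frozen-encoder / incrementally-added-decoder pattern described above (illustrative only; the paper's decoder trimming, architecture search, and anomaly-aware merging are omitted, and the encoder/decoder modules are placeholders):

```python
import torch
import torch.nn as nn

class ContinualSegmenter(nn.Module):
    """Illustrative continual-segmentation skeleton: a shared encoder frozen after
    the initial training step, plus one decoder per incremental learning step."""
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze the shared encoder
            p.requires_grad = False
        self.decoders = nn.ModuleList()

    def add_task(self, decoder: nn.Module) -> None:
        """Attach a new decoder for the organ classes added at this step."""
        self.decoders.append(decoder)

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)
        # Each decoder segments the classes of its own step; overlapping organ
        # predictions would be merged downstream (anomaly-aware module in the paper).
        return [dec(feats) for dec in self.decoders]
```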

    CT-Based Risk Factors for Mortality of Patients With COVID-19 Pneumonia in Wuhan, China: A Retrospective Study

    Purpose: Computed tomography (CT) characteristics associated with critical outcomes of patients with coronavirus disease 2019 (COVID-19) have been reported. However, CT risk factors for mortality have not been directly reported. We aim to determine the CT-based quantitative predictors for COVID-19 mortality. Methods: In this retrospective study, laboratory-confirmed COVID-19 patients at Wuhan Central Hospital between December 9, 2019, and March 19, 2020, were included. A novel prognostic biomarker, the V-HU score, depicting the volume (V) of total pneumonia infection and the average Hounsfield unit (HU) of consolidation areas, was automatically quantified from CT by an artificial intelligence (AI) system. Cox proportional hazards models were used to investigate risk factors for mortality. Results: The study included 238 patients (women 136/238, 57%; median age, 65 years, IQR 51–74 years), 126 of whom were survivors. The V-HU score was an independent predictor (hazard ratio [HR] 2.78, 95% confidence interval [CI] 1.50–5.17; p = 0.001) after adjusting for several COVID-19 prognostic indicators significant in univariable analysis. The prognostic performance of the model containing clinical and outpatient laboratory factors was improved by integrating the V-HU score (c-index: 0.695 vs. 0.728; p < 0.001). Older patients (age ≥ 65 years; HR 3.56, 95% CI 1.64–7.71; p < 0.001) and younger patients (age < 65 years; HR 4.60, 95% CI 1.92–10.99; p < 0.001) could be further risk-stratified by the V-HU score. Conclusions: A combination of an increased volume of total pneumonia infection and a high HU value of consolidation areas showed a strong correlation with COVID-19 mortality, as determined by AI-quantified CT.
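    For readers unfamiliar with the statistical model, a Cox proportional hazards fit of a V-HU-like score can be sketched with the lifelines library as below. The column names and toy values are entirely hypothetical and are not data from the study.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical patient table: follow-up time, death indicator, V-HU-like score,
# and one clinical covariate (values are illustrative only).
df = pd.DataFrame({
    "time_days":  [30, 45, 12, 60, 25, 50, 18, 40],
    "died":       [0,  0,  1,  0,  1,  0,  1,  0],
    "v_hu_score": [1.2, 2.6, 3.1, 0.5, 2.7, 1.0, 1.4, 1.5],
    "age":        [70, 55, 68, 49, 72, 61, 60, 58],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_days", event_col="died")
cph.print_summary()   # hazard ratios and 95% CIs, e.g. for v_hu_score adjusted for age
```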