LViT: Language meets Vision Transformer in Medical Image Segmentation
Deep learning has been widely used in medical image segmentation and other
aspects. However, the performance of existing medical image segmentation models
has been limited by the challenge of obtaining sufficient high-quality labeled
data due to the prohibitive data annotation cost. To alleviate this limitation,
we propose a new text-augmented medical image segmentation model LViT (Language
meets Vision Transformer). In our LViT model, medical text annotation is
incorporated to compensate for the quality deficiency in image data. In
addition, the text information can guide the generation of higher-quality
pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo
label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM)
preserve local image features in the semi-supervised LViT setting. In our model, LV
(Language-Vision) loss is designed to supervise the training of unlabeled
images using text information directly. For evaluation, we construct three
multimodal medical segmentation datasets (image + text) containing X-rays and
CT images. Experimental results show that our proposed LViT has superior
segmentation performance in both fully-supervised and semi-supervised settings.
The code and datasets are available at https://github.com/HUANGLIZI/LViT.
Comment: Accepted by IEEE Transactions on Medical Imaging (TMI).
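The abstract does not spell out how EPI works; as a hedged sketch, one natural reading of "Exponential Pseudo label Iteration" is an exponential-moving-average update of the pseudo label maps across training iterations. The function name `epi_update` and the smoothing factor `beta` below are assumptions for illustration, not details from the paper:

```python
import numpy as np

def epi_update(prev_pseudo, current_pred, beta=0.9):
    # Hypothetical EPI step: blend the previous pseudo label map with the
    # current model prediction, damping noisy per-iteration flips on
    # unlabeled images so the pseudo labels improve gradually.
    return beta * prev_pseudo + (1.0 - beta) * current_pred

# Toy 2x2 foreground-probability maps for two iterations.
prev = np.array([[0.9, 0.1], [0.2, 0.8]])
curr = np.array([[0.5, 0.5], [0.5, 0.5]])
smoothed = epi_update(prev, curr)
```

With `beta=0.9`, the smoothed map stays close to the previous pseudo label and moves only slightly toward the new prediction, which is the usual motivation for an EMA-style update.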
Weakly Supervised Universal Fracture Detection in Pelvic X-rays
Hip and pelvic fractures are serious injuries with life-threatening
complications. However, diagnostic errors of fractures in pelvic X-rays (PXRs)
are very common, driving the demand for computer-aided diagnosis (CAD)
solutions. A major challenge lies in the fact that fractures are localized
patterns that require localized analyses. Unfortunately, the PXRs residing in
hospital picture archiving and communication systems do not typically specify
regions of interest. In this paper, we propose a two-stage hip and pelvic
fracture detection method that executes localized fracture classification using
weakly supervised ROI mining. The first stage uses a large capacity
fully-convolutional network, i.e., deep with high levels of abstraction, in a
multiple instance learning setting to automatically mine probable true positive
and definite hard negative ROIs from the whole PXR in the training data. The
second stage trains a smaller capacity model, i.e., shallower and more
generalizable, with the mined ROIs to perform localized analyses to classify
fractures. During inference, our method detects hip and pelvic fractures in one
pass by chaining the probability outputs of the two stages together. We
evaluate our method on 4,410 PXRs, reporting an area under the ROC curve value
of 0.975, the highest among state-of-the-art fracture detection methods.
Moreover, we show that our two-stage approach can perform comparably to human
physicians (even outperforming emergency physicians and surgeons), in a
preliminary reader study of 23 readers.
Comment: MICCAI 2019 (early accept).
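The abstract describes inference as "chaining the probability outputs of the two stages together." A minimal sketch, assuming the chaining is a simple product of the two stage probabilities (the paper's exact fusion may differ; all names here are hypothetical):

```python
def chained_fracture_score(p_roi, p_cls):
    # p_roi: stage-1 probability that a mined ROI contains a fracture
    # p_cls: stage-2 localized classifier's probability on that ROI
    # Multiplying chains the stages: the final score is high only when
    # both the ROI miner and the localized classifier are confident.
    return p_roi * p_cls

score = chained_fracture_score(0.8, 0.9)
```

A product fusion like this lets the shallower second-stage model veto false positives from the high-capacity first stage without a separate thresholding step.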
Anatomy-Aware Lymph Node Detection in Chest CT using Implicit Station Stratification
Finding abnormal lymph nodes in radiological images is highly important for
various medical tasks such as cancer metastasis staging and radiotherapy
planning. Lymph nodes (LNs) are small glands scattered throughout the body.
They are grouped or defined to various LN stations according to their
anatomical locations. The CT imaging appearance and context of LNs in different
stations vary significantly, posing challenges for automated detection,
especially for pathological LNs. Motivated by this observation, we propose a
novel end-to-end framework to improve LN detection performance by leveraging
their station information. We design a multi-head detector and make each head
focus on differentiating the LN and non-LN structures of certain stations.
Pseudo station labels are generated by an LN station classifier as a form of
multi-task learning during training, so we do not need another explicit LN
station prediction model during inference. Our algorithm is evaluated on 82
patients with lung cancer and 91 patients with esophageal cancer. The proposed
implicit station stratification method improves the detection sensitivity of
thoracic lymph nodes from 65.1% to 71.4% and from 80.3% to 85.5% at 2 false
positives per patient on the two datasets, respectively, which significantly
outperforms various existing state-of-the-art baseline techniques such as
nnUNet, nnDetection, and LENS.
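The multi-head design with pseudo station labels can be sketched as a soft routing of station-specific detection scores: the station classifier's soft output weights each head, so no explicit station prediction model is needed at inference. This is a hedged illustration under assumed names (`stratified_ln_score`, `station_probs`, `head_scores`), not the paper's actual implementation:

```python
def stratified_ln_score(station_probs, head_scores):
    # Hypothetical sketch of implicit station stratification: each
    # detector head specializes in certain LN stations, and the soft
    # pseudo station probabilities weight the heads' candidate scores.
    assert len(station_probs) == len(head_scores)
    return sum(p * s for p, s in zip(station_probs, head_scores))

# Three stations; the classifier is fairly confident this candidate
# lies in the third station, so that head dominates the final score.
score = stratified_ln_score([0.1, 0.2, 0.7], [0.3, 0.5, 0.9])
```

Weighting heads by soft station probabilities, rather than hard-routing to one head, keeps the pipeline end-to-end differentiable and robust to ambiguous station boundaries.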