485 research outputs found
Rethinking foundation models for medical image classification through a benchmark study on MedMNIST
Foundation models are widely employed in medical image analysis due to their high adaptability and generalizability for downstream tasks. With the increasing number of foundation models being released, model selection has become an important issue. In this work, we study the capabilities of foundation models in medical image classification tasks by conducting a benchmark study on the MedMNIST dataset. Specifically, we adopt various foundation models, ranging from convolutional to Transformer-based architectures, and implement both end-to-end training and linear probing for all classification tasks. The results demonstrate the significant potential of these pre-trained models when transferred to medical image classification. We further conduct experiments with different image sizes and various sizes of training data. By analyzing all the results, we provide preliminary yet useful insights and conclusions on this topic.
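To make the two transfer regimes concrete, here is a minimal sketch of linear probing with a frozen pre-trained backbone; the ResNet-50 backbone, the 9-class head (as in PathMNIST), and the training loop are illustrative stand-ins, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Linear probing: freeze the pre-trained backbone, train only a new head.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in backbone.parameters():
    p.requires_grad = False              # end-to-end training would skip this
backbone.fc = nn.Linear(backbone.fc.in_features, 9)  # e.g. 9 PathMNIST classes

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    logits = backbone(images)            # frozen features -> linear classifier
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```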
An Ensemble Method to Automatically Grade Diabetic Retinopathy with Optical Coherence Tomography Angiography Images
Diabetic retinopathy (DR) is a complication of diabetes, and one of the major
causes of vision impairment in the global population. As the early-stage
manifestation of DR is usually very mild and hard to detect, an accurate
diagnosis via eye-screening is clinically important to prevent vision loss at
later stages. In this work, we propose an ensemble method to automatically
grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA)
images available from the Diabetic Retinopathy Analysis Challenge (DRAC) 2022.
First, we adopt state-of-the-art classification networks, i.e., ResNet,
DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with
different splits of the available dataset. Ultimately, we obtain 25 models, of
which the top 16 are selected and ensembled to generate the final
predictions. During the training process, we also investigate the multi-task
learning strategy, and add an auxiliary classification task, the Image Quality
Assessment, to improve the model performance. Our final ensemble model achieved
a quadratic weighted kappa (QWK) of 0.9346 and an Area Under Curve (AUC) of
0.9766 on the internal testing dataset, and a QWK of 0.839 and an AUC of
0.8978 on the DRAC challenge testing dataset. Comment: 13 pages, 6 figures, 5 tables. To appear in Diabetic Retinopathy
Analysis Challenge (DRAC), Bin Sheng et al., MICCAI 2022 Challenge, Lecture
Notes in Computer Science, Springer.
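For readers wanting to reproduce this style of evaluation, the sketch below averages the selected models' softmax outputs and scores the result with quadratic weighted kappa and AUC; the array shapes and file names are assumptions, not the challenge code.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Assumed inputs: per-model softmax outputs of shape (n_models, n_images,
# n_grades) and integer ground-truth grades of shape (n_images,).
probs_per_model = np.load("top16_softmax_outputs.npy")  # hypothetical file
y_true = np.load("grades.npy")                          # hypothetical file

ensemble_probs = probs_per_model.mean(axis=0)  # average the top-16 models
pred_grades = ensemble_probs.argmax(axis=1)

qwk = cohen_kappa_score(y_true, pred_grades, weights="quadratic")
auc = roc_auc_score(y_true, ensemble_probs, multi_class="ovr")
print(f"QWK={qwk:.4f}  AUC={auc:.4f}")
```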
DiffuSeg: Domain-driven Diffusion for Medical Image Segmentation
In recent years, the deployment of supervised machine learning techniques for segmentation tasks has significantly increased. Nonetheless, the annotation process for extensive datasets remains costly, labor-intensive, and error-prone. While acquiring sufficiently large datasets to train deep learning models is feasible, these datasets often exhibit a distribution shift relative to the actual test data. This problem is particularly critical in medical imaging, where it degrades the accuracy of automatic segmentation models. In this work, we introduce DiffuSeg, a novel conditional diffusion model for medical image data that exploits existing label maps to synthesize new images in the target domain. This opens up a number of new research directions, including the segmentation task that motivates this work. Our method requires only label maps from existing datasets and unlabelled images from the target domain for image synthesis. To learn the target-domain knowledge, a feature-factorization variational autoencoder is proposed to provide conditional information for the diffusion model. Consequently, the segmentation network can be trained on the given labels and the synthetic images, avoiding human annotation. We first apply our method to the MNIST dataset and subsequently adapt it to medical image segmentation datasets, such as retinal fundus images for vessel segmentation and MRI images for heart segmentation. Our approach exhibits significant improvements over relevant baselines in both image generation and segmentation accuracy, especially when annotations for the target dataset are unavailable during training. An open-source implementation of our approach will be released after review.
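As a rough illustration of label-conditioned diffusion (the paper's actual conditioning comes from the proposed feature-factorization VAE; this sketch replaces it with plain channel concatenation), a DDPM-style training step might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalDenoiser(nn.Module):
    """Wraps any image-to-image U-Net that accepts (x, t); the label map
    enters as extra input channels (a simplified stand-in for the paper's
    VAE-based conditioning)."""
    def __init__(self, unet: nn.Module):
        super().__init__()
        self.unet = unet

    def forward(self, noisy_image, label_map, t):
        x = torch.cat([noisy_image, label_map], dim=1)  # condition on labels
        return self.unet(x, t)                          # predict the added noise

def diffusion_loss(model, image, label_map, alphas_cumprod):
    # Standard DDPM objective: noise the image at a random timestep and
    # regress the noise, here conditioned on the label map.
    t = torch.randint(0, len(alphas_cumprod), (image.size(0),))
    noise = torch.randn_like(image)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * image + (1 - a).sqrt() * noise
    return F.mse_loss(model(noisy, label_map, t), noise)
```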
KneeXNeT: an ensemble-based approach for knee radiographic evaluation
Knee osteoarthritis (OA) is the most common joint disorder and a leading cause of disability. Diagnosing OA severity typically requires expert assessment of X-ray images and is commonly based on the Kellgren-Lawrence grading system, a time-intensive process. This study aimed to develop an automated deep learning model to classify knee OA severity, reducing the need for expert evaluation. First, we evaluated ten state-of-the-art deep learning models, achieving a top accuracy of 0.69 with individual models. To address class imbalance, we employed weighted sampling, improving accuracy to 0.70. We further applied Smooth-GradCAM++ to visualize decision-influencing regions, enhancing the explainability of the best-performing model. Finally, we developed ensemble models using majority voting and a shallow neural network. Our ensemble model, KneeXNet, achieved the highest accuracy of 0.72, demonstrating its potential as an automated tool for knee OA assessment.
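A minimal sketch of two of the ingredients named above, inverse-frequency weighted sampling and majority voting; the label list, batch size, and commented-out dataset object are illustrative, not the study's code.

```python
import torch
from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler

train_labels = [0, 0, 0, 1, 2, 2, 3, 4]   # illustrative KL grades (0-4)
counts = Counter(train_labels)
weights = [1.0 / counts[y] for y in train_labels]  # rarer grades drawn more often
sampler = WeightedRandomSampler(weights, num_samples=len(weights))
# loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)

def majority_vote(preds_per_model):
    """preds_per_model: list of (n_images,) tensors of predicted grades."""
    stacked = torch.stack(preds_per_model)
    return stacked.mode(dim=0).values      # most frequent grade per image
```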
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding
Prompt-and-Refine strategy (Figure 3), two simple but effective
training-free methods to decrease the Token Display Time (TDT) of
streaming ASR models without any accuracy loss. The core idea of
ZeroPrompt is to append zeroed content to each chunk during inference, which
acts like a prompt to encourage the model to predict future tokens even before
they are spoken. We argue that streaming acoustic encoders naturally have the
modeling ability of Masked Language Models and our experiments demonstrate that
ZeroPrompt is cheap to engineer and can be applied to streaming acoustic
encoders on any dataset without any accuracy loss. Specifically, compared with
our baseline models, we achieve a 350–700 ms reduction in First Token
Display Time (TDT-F) and a 100–400 ms reduction in Last Token Display Time
(TDT-L), with theoretically and experimentally equal WER on both Aishell-1 and
LibriSpeech datasets. Comment: accepted by INTERSPEECH 2023.
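A schematic of the zero-prompt trick as described in the abstract; the frame count, tensor layout, and encoder interface are assumptions.

```python
import torch

def encode_chunk_with_zeroprompt(encoder, chunk, prompt_frames=16):
    """chunk: (batch, time, feat) features for the current streaming chunk.
    Assumes, for simplicity, an encoder that preserves time resolution;
    real streaming encoders typically subsample."""
    zeros = torch.zeros(chunk.size(0), prompt_frames, chunk.size(2))
    padded = torch.cat([chunk, zeros], dim=1)   # append the zeroed "prompt"
    out = encoder(padded)                       # unchanged, training-free encoder
    real_part = out[:, :chunk.size(1)]          # tokens for observed audio
    prompted_part = out[:, chunk.size(1):]      # provisional future tokens,
    return real_part, prompted_part             # refined once real audio arrives
```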
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Recent advances in neural text-to-speech (TTS) models bring thousands of TTS
applications into daily life, where models are deployed in the cloud to provide
services for customers. Among these models are diffusion probabilistic models
(DPMs), which can be stably trained and are more parameter-efficient compared
with other generative models. As transmitting data between customers and the
cloud introduces high latency and the risk of exposing private data, deploying
TTS models on edge devices is preferred. When deploying DPMs on edge
devices, there are two practical problems. First, current DPMs are not
lightweight enough for resource-constrained devices. Second, DPMs require many
denoising steps in inference, which increases latency. In this work, we present
LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight
U-Net diffusion decoder and a training-free fast sampling technique, reducing
both model parameters and inference latency. Streaming inference is also
implemented in LightGrad to reduce latency further. Compared with Grad-TTS,
LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency,
while preserving comparable speech quality on both Chinese Mandarin and English
in 4 denoising steps. Comment: Accepted by ICASSP 2023.
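To illustrate what a 4-step sampler looks like (LightGrad itself uses a training-free fast solver; this generic DDIM-style update with eta=0 is a simplified stand-in, and `decoder` is a placeholder noise predictor):

```python
import torch

@torch.no_grad()
def few_step_sample(decoder, shape, alphas_cumprod, steps=4):
    # Reverse diffusion with a handful of deterministic DDIM updates.
    x = torch.randn(shape)                          # start from pure noise
    ts = torch.linspace(len(alphas_cumprod) - 1, 0, steps).long()
    for i, t in enumerate(ts):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[ts[i + 1]] if i + 1 < steps else torch.tensor(1.0)
        eps = decoder(x, t.expand(shape[0]))        # predicted noise at step t
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # clean estimate
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps   # deterministic step
    return x
```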
