485 research outputs found
Rethinking foundation models for medical image classification through a benchmark study on MedMNIST
Foundation models are widely employed in medical image analysis due to their high adaptability and generalizability for downstream tasks. With the increasing number of foundation models being released, model selection has become an important issue. In this work, we study the capabilities of foundation models in medical image classification tasks by conducting a benchmark study on the MedMNIST dataset. Specifically, we adopt various foundation models, ranging from convolutional to Transformer-based architectures, and implement both end-to-end training and linear probing for all classification tasks. The results demonstrate the significant potential of these pre-trained models when transferred to medical image classification. We further conduct experiments with different image sizes and various sizes of training data. By analyzing all the results, we provide preliminary yet useful insights and conclusions on this topic.
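To make the two transfer regimes concrete, here is a minimal sketch of linear probing with a frozen pre-trained backbone; the ResNet-50 backbone, the 9-class head (as in PathMNIST), and the training loop are illustrative stand-ins, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Linear probing: freeze the pre-trained backbone, train only a new head.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in backbone.parameters():
    p.requires_grad = False              # end-to-end training would skip this
backbone.fc = nn.Linear(backbone.fc.in_features, 9)  # e.g. 9 PathMNIST classes

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    logits = backbone(images)            # frozen features -> linear classifier
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```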
An Ensemble Method to Automatically Grade Diabetic Retinopathy with Optical Coherence Tomography Angiography Images
Diabetic retinopathy (DR) is a complication of diabetes, and one of the major
causes of vision impairment in the global population. As the early-stage
manifestation of DR is usually very mild and hard to detect, an accurate
diagnosis via eye-screening is clinically important to prevent vision loss at
later stages. In this work, we propose an ensemble method to automatically
grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA)
images available from the Diabetic Retinopathy Analysis Challenge (DRAC) 2022.
First, we adopt state-of-the-art classification networks, i.e., ResNet,
DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with
different splits of the available dataset. Ultimately, we obtain 25 models, of
which the top 16 are selected and ensembled to generate the final
predictions. During the training process, we also investigate the multi-task
learning strategy, and add an auxiliary classification task, the Image Quality
Assessment, to improve the model performance. Our final ensemble model achieved
a quadratic weighted kappa (QWK) of 0.9346 and an Area Under Curve (AUC) of
0.9766 on the internal testing dataset, and a QWK of 0.839 and an AUC of
0.8978 on the DRAC challenge testing dataset. Comment: 13 pages, 6 figures, 5 tables. To appear in Diabetic Retinopathy
Analysis Challenge (DRAC), Bin Sheng et al., MICCAI 2022 Challenge, Lecture
Notes in Computer Science, Springer.
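For readers wanting to reproduce this style of evaluation, the sketch below averages the selected models' softmax outputs and scores the result with quadratic weighted kappa and AUC; the array shapes and file names are assumptions, not the challenge code.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Assumed inputs: per-model softmax outputs of shape (n_models, n_images,
# n_grades) and integer ground-truth grades of shape (n_images,).
probs_per_model = np.load("top16_softmax_outputs.npy")  # hypothetical file
y_true = np.load("grades.npy")                          # hypothetical file

ensemble_probs = probs_per_model.mean(axis=0)  # average the top-16 models
pred_grades = ensemble_probs.argmax(axis=1)

qwk = cohen_kappa_score(y_true, pred_grades, weights="quadratic")
auc = roc_auc_score(y_true, ensemble_probs, multi_class="ovr")
print(f"QWK={qwk:.4f}  AUC={auc:.4f}")
```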
DiffuSeg: Domain-driven Diffusion for Medical Image Segmentation
In recent years, the deployment of supervised machine learning techniques for segmentation tasks has significantly increased. Nonetheless, the annotation process for extensive datasets remains costly, labor-intensive, and error-prone. While acquiring sufficiently large datasets to train deep learning models is feasible, these datasets often exhibit a distribution shift relative to the actual test data. This problem is particularly critical in medical imaging, where it degrades the accuracy of automatic segmentation models. In this work, we introduce DiffuSeg, a novel conditional diffusion model for medical image data that exploits existing label maps to synthesize new images in the target domain. This opens up a number of new research directions, including the segmentation task that motivates this work. Our method requires only label maps from existing datasets and unlabelled images from the target domain for image synthesis. To learn the target-domain knowledge, a feature-factorization variational autoencoder is proposed to provide conditional information for the diffusion model. Consequently, the segmentation network can be trained on the given labels and the synthetic images, avoiding human annotation. We first apply our method to the MNIST dataset and subsequently adapt it to medical image segmentation datasets, such as retinal fundus images for vessel segmentation and MRI images for heart segmentation. Our approach exhibits significant improvements over relevant baselines in both image generation and segmentation accuracy, especially when annotations for the target dataset are unavailable during training. An open-source implementation of our approach will be released after review.
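As a rough illustration of label-conditioned diffusion (the paper's actual conditioning comes from the proposed feature-factorization VAE; this sketch replaces it with plain channel concatenation), a DDPM-style training step might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalDenoiser(nn.Module):
    """Wraps any image-to-image U-Net that accepts (x, t); the label map
    enters as extra input channels (a simplified stand-in for the paper's
    VAE-based conditioning)."""
    def __init__(self, unet: nn.Module):
        super().__init__()
        self.unet = unet

    def forward(self, noisy_image, label_map, t):
        x = torch.cat([noisy_image, label_map], dim=1)  # condition on labels
        return self.unet(x, t)                          # predict the added noise

def diffusion_loss(model, image, label_map, alphas_cumprod):
    # Standard DDPM objective: noise the image at a random timestep and
    # regress the noise, here conditioned on the label map.
    t = torch.randint(0, len(alphas_cumprod), (image.size(0),))
    noise = torch.randn_like(image)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a.sqrt() * image + (1 - a).sqrt() * noise
    return F.mse_loss(model(noisy, label_map, t), noise)
```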
KneeXNeT: an ensemble-based approach for knee radiographic evaluation
Knee osteoarthritis (OA) is the most common joint disorder and a leading cause of disability. Diagnosing OA severity typically requires expert assessment of X-ray images and is commonly based on the Kellgren-Lawrence grading system, a time-intensive process. This study aimed to develop an automated deep learning model to classify knee OA severity, reducing the need for expert evaluation. First, we evaluated ten state-of-the-art deep learning models, achieving a top accuracy of 0.69 with individual models. To address class imbalance, we employed weighted sampling, improving accuracy to 0.70. We further applied Smooth-GradCAM++ to visualize decision-influencing regions, enhancing the explainability of the best-performing model. Finally, we developed ensemble models using majority voting and a shallow neural network. Our ensemble model, KneeXNet, achieved the highest accuracy of 0.72, demonstrating its potential as an automated tool for knee OA assessment.
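A minimal sketch of two of the ingredients named above, inverse-frequency weighted sampling and majority voting; the label list, batch size, and commented-out dataset object are illustrative, not the study's code.

```python
import torch
from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler

train_labels = [0, 0, 0, 1, 2, 2, 3, 4]   # illustrative KL grades (0-4)
counts = Counter(train_labels)
weights = [1.0 / counts[y] for y in train_labels]  # rarer grades drawn more often
sampler = WeightedRandomSampler(weights, num_samples=len(weights))
# loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)

def majority_vote(preds_per_model):
    """preds_per_model: list of (n_images,) tensors of predicted grades."""
    stacked = torch.stack(preds_per_model)
    return stacked.mode(dim=0).values      # most frequent grade per image
```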
ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding
Prompt-and-Refine strategy (Figure 3), two simple but effective
training-free methods to decrease the Token Display Time (TDT) of
streaming ASR models without any accuracy loss. The core idea of
ZeroPrompt is to append zeroed content to each chunk during inference, which
acts like a prompt to encourage the model to predict future tokens even before
they are spoken. We argue that streaming acoustic encoders naturally have the
modeling ability of Masked Language Models and our experiments demonstrate that
ZeroPrompt is cheap to engineer and can be applied to streaming acoustic
encoders on any dataset without any accuracy loss. Specifically, compared with
our baseline models, we achieve a 350–700 ms reduction in First Token
Display Time (TDT-F) and a 100–400 ms reduction in Last Token Display Time
(TDT-L), with theoretically and experimentally equal WER on both Aishell-1 and
LibriSpeech datasets. Comment: accepted by INTERSPEECH 2023.
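A schematic of the zero-prompt trick as described in the abstract; the frame count, tensor layout, and encoder interface are assumptions.

```python
import torch

def encode_chunk_with_zeroprompt(encoder, chunk, prompt_frames=16):
    """chunk: (batch, time, feat) features for the current streaming chunk.
    Assumes, for simplicity, an encoder that preserves time resolution;
    real streaming encoders typically subsample."""
    zeros = torch.zeros(chunk.size(0), prompt_frames, chunk.size(2))
    padded = torch.cat([chunk, zeros], dim=1)   # append the zeroed "prompt"
    out = encoder(padded)                       # unchanged, training-free encoder
    real_part = out[:, :chunk.size(1)]          # tokens for observed audio
    prompted_part = out[:, chunk.size(1):]      # provisional future tokens,
    return real_part, prompted_part             # refined once real audio arrives
```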
LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech
Recent advances in neural text-to-speech (TTS) models bring thousands of TTS
applications into daily life, where models are deployed in the cloud to provide
services for customers. Among these models are diffusion probabilistic models
(DPMs), which can be stably trained and are more parameter-efficient compared
with other generative models. As transmitting data between customers and the
cloud introduces high latency and the risk of exposing private data, deploying
TTS models on edge devices is preferred. When deploying DPMs on edge
devices, there are two practical problems. First, current DPMs are not
lightweight enough for resource-constrained devices. Second, DPMs require many
denoising steps in inference, which increases latency. In this work, we present
LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight
U-Net diffusion decoder and a training-free fast sampling technique, reducing
both model parameters and inference latency. Streaming inference is also
implemented in LightGrad to reduce latency further. Compared with Grad-TTS,
LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency,
while preserving comparable speech quality on both Chinese Mandarin and English
in 4 denoising steps. Comment: Accepted by ICASSP 2023.
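To illustrate what a 4-step sampler looks like (LightGrad itself uses a training-free fast solver; this generic DDIM-style update with eta=0 is a simplified stand-in, and `decoder` is a placeholder noise predictor):

```python
import torch

@torch.no_grad()
def few_step_sample(decoder, shape, alphas_cumprod, steps=4):
    # Reverse diffusion with a handful of deterministic DDIM updates.
    x = torch.randn(shape)                          # start from pure noise
    ts = torch.linspace(len(alphas_cumprod) - 1, 0, steps).long()
    for i, t in enumerate(ts):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[ts[i + 1]] if i + 1 < steps else torch.tensor(1.0)
        eps = decoder(x, t.expand(shape[0]))        # predicted noise at step t
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # clean estimate
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps   # deterministic step
    return x
```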
