11 research outputs found
Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models
Medical Visual Question Answering (VQA) is an important challenge, as it would lead to faster and more accurate diagnoses and treatment decisions. Most existing methods approach it as a multi-class classification problem, which restricts the outcome to a predefined closed set of curated answers. We focus on open-ended VQA and, motivated by recent advances in language models, treat it as a generative task. Leveraging pre-trained language models, we introduce a novel method particularly suited for small, domain-specific medical datasets. To properly communicate the medical images to the language model, we develop a network that maps the extracted visual features to a set of learnable tokens. These learnable tokens, alongside the question, then directly prompt the language model. We explore recent parameter-efficient fine-tuning strategies for language models, which allow for resource- and data-efficient fine-tuning. We evaluate our approach on the prime medical VQA benchmarks, namely Slake, OVQA and PathVQA. The results demonstrate that our approach outperforms existing methods across various training settings while also being computationally efficient.
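The mapping step described above can be sketched as follows. This is a minimal illustration only: the dimensions, the number of prefix tokens, and the use of a single linear projection are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's real sizes are not stated here.
d_visual, d_lm, n_prefix = 512, 768, 8

# Mapper: a linear projection from one extracted visual feature vector
# to n_prefix "learnable token" embeddings in the language model's space.
W = rng.standard_normal((d_visual, n_prefix * d_lm)) * 0.02

def map_to_prefix(visual_feat):
    """Project a visual feature vector to a sequence of prefix tokens."""
    return (visual_feat @ W).reshape(n_prefix, d_lm)

visual_feat = rng.standard_normal(d_visual)      # stand-in for extracted image features
question_emb = rng.standard_normal((12, d_lm))   # stand-in for the embedded question

# The visual prefix tokens are prepended to the question tokens,
# and the combined sequence prompts the language model.
lm_input = np.concatenate([map_to_prefix(visual_feat), question_emb], axis=0)
```

In a parameter-efficient setup, only the mapper (and possibly a small set of language-model parameters) would be trained, while the pre-trained language model stays largely frozen.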
LifeLonger: A Benchmark for Continual Disease Classification
Deep learning models have shown great effectiveness in recognizing findings in medical images. However, they cannot handle the ever-changing clinical environment, which brings newly annotated medical data from different sources. To exploit the incoming streams of data, these models would benefit largely from sequentially learning from new samples, without forgetting the previously obtained knowledge. In this paper, we introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection, by applying existing state-of-the-art continual learning methods. In particular, we consider three continual learning scenarios, namely task and class incremental learning and the newly defined cross-domain incremental learning. Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch, while cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge. We perform a thorough analysis of the performance and examine how the well-known challenges of continual learning, such as catastrophic forgetting, manifest themselves in this setting. The encouraging results demonstrate that continual learning has a major potential to advance disease classification and to produce a more robust and efficient learning framework for clinical settings. The code repository, data partitions and baseline results for the complete benchmark will be made publicly available.
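The class-incremental scenario can be illustrated with a small sketch. Note this is only an illustrative stand-in: the benchmark releases its own data partitions, and the function name and split policy below are assumptions.

```python
# Sketch of a class-incremental split, in the spirit of partitioning a
# labelled collection such as MedMNIST into episodes that introduce new
# classes over time (illustrative only; not the benchmark's actual partitions).
def make_class_incremental_episodes(labels, classes_per_episode):
    """Group sample indices into episodes, each adding new classes."""
    classes = sorted(set(labels))
    episodes = []
    for start in range(0, len(classes), classes_per_episode):
        episode_classes = set(classes[start:start + classes_per_episode])
        episodes.append([i for i, y in enumerate(labels) if y in episode_classes])
    return episodes

# A model trained sequentially on these episodes never revisits earlier
# indices, which is exactly what exposes catastrophic forgetting.
episodes = make_class_incremental_episodes([0, 0, 1, 1, 2, 2, 3, 3], 2)
```

Task-incremental learning differs mainly in that the episode (task) identity is also available at test time, while cross-domain incremental learning replaces the class splits with splits over data sources.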
Uncertainty-aware report generation for chest X-rays by variational topic inference
Automating report generation for medical imaging promises to minimize labor and aid diagnosis in clinical practice. Deep learning algorithms have recently been shown to be capable of captioning natural photos. However, doing the same for medical data is difficult due to the variety in reports written by different radiologists with fluctuating levels of knowledge and experience. Current methods for automatic report generation tend to merely copy one of the training samples into the created report. To tackle this issue, we propose variational topic inference, a probabilistic approach for automatic chest X-ray report generation. Specifically, we introduce a probabilistic latent variable model where each latent variable defines a single topic. The topics are inferred in a conditional variational inference framework by aligning the vision and language modalities in a latent space, with each topic governing the generation of one sentence in the report. We further adopt a visual attention module that enables the model to attend to different locations in the image while generating the descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate reports with novel sentence structure, rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.
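The core machinery of such a conditional variational approach, one Gaussian latent topic per sentence, sampled by reparameterization and regularized toward a prior via a KL term, can be sketched as follows. The dimensions and the diagonal-Gaussian parameterization are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical topic dimension

def reparameterize(mu, logvar):
    # z = mu + sigma * eps: the standard reparameterization trick,
    # making sampling differentiable with respect to (mu, logvar).
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    # KL(q || p) for diagonal Gaussians, summed over dimensions.
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

n_sentences = 5
# Stand-ins for network outputs: posterior q(z | image, report)
# and prior p(z | image), one Gaussian per sentence-level topic.
mu_q, logvar_q = rng.standard_normal((n_sentences, d)), np.zeros((n_sentences, d))
mu_p, logvar_p = np.zeros((n_sentences, d)), np.zeros((n_sentences, d))

topics = reparameterize(mu_q, logvar_q)  # one latent topic per sentence
kl = sum(kl_diag_gaussians(mu_q[i], logvar_q[i], mu_p[i], logvar_p[i])
         for i in range(n_sentences))
```

Each sampled topic would then condition a sentence decoder, and the KL term is added to the reconstruction loss so that topics inferred from the report stay close to topics predictable from the image alone.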
Variational Topic Inference for Chest X-Ray Report Generation
Automating report generation for medical imaging promises to reduce workload
and assist diagnosis in clinical practice. Recent work has shown that deep
learning models can successfully caption natural images. However, learning from
medical data is challenging due to the diversity and uncertainty inherent in
the reports written by different radiologists with discrepant expertise and
experience. To tackle these challenges, we propose variational topic inference
for automatic report generation. Specifically, we introduce a set of topics as
latent variables to guide sentence generation by aligning image and language
modalities in a latent space. The topics are inferred in a conditional
variational inference framework, with each topic governing the generation of a
sentence in the report. Further, we adopt a visual attention module that
enables the model to attend to different locations in the image and generate
more informative descriptions. We conduct extensive experiments on two
benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results
demonstrate that our proposed variational topic inference method can generate
novel reports rather than mere copies of reports used in training, while still
achieving comparable performance to state-of-the-art methods in terms of
standard language generation criteria.
Comment: To be published in the International Conference on Medical Image Computing and Computer Assisted Intervention 202
GCNIllustrator: Illustrating the Effect of Hyperparameters on Graph Convolutional Networks
An increasing number of real-world applications use graph-structured datasets, imposing challenges on existing machine learning algorithms. Graph Convolutional Networks (GCNs) are deep learning models specifically designed to operate on graphs. One of the most tedious steps in training GCNs is the choice of hyperparameters, especially since they exhibit unique properties compared to other neural models. Not only machine learning beginners but also experienced practitioners often have difficulty properly tuning their models. We hypothesize that a tool that visualizes the effect of hyperparameter choices on performance can accelerate model development and improve the understanding of these black-box models. Additionally, observing clusters of certain nodes helps to empirically understand how a given prediction was made due to the feature propagation step of GCNs. Therefore, this demo introduces GCNIllustrator, a web-based visual analytics tool for illustrating the effect of hyperparameters on predictions in a citation graph.
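The feature propagation step mentioned above, which makes a node's prediction depend on its neighbors and thus produces the clusters the tool visualizes, can be sketched as a single GCN layer. This is a generic textbook formulation (symmetric normalization with self-loops, as in Kipf & Welling's GCN), not code from the demo itself.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).

    A: (n, n) adjacency matrix, H: (n, d_in) node features,
    W: (d_in, d_out) trainable weights.
    """
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))     # D^{-1/2} diagonal
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)            # propagate, then ReLU

# Two connected nodes: each output row mixes both nodes' features,
# which is why neighboring nodes drift toward similar representations.
out = gcn_layer(np.array([[0.0, 1.0], [1.0, 0.0]]),
                np.ones((2, 3)), np.ones((3, 4)))
```

Hyperparameters such as the number of such layers (propagation depth) or the hidden width `d_out` directly change how far this mixing reaches, which is the kind of effect a visual analytics tool can surface.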