
    Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models

    Medical Visual Question Answering (VQA) is an important challenge, as it would lead to faster and more accurate diagnoses and treatment decisions. Most existing methods approach it as a multi-class classification problem, which restricts the outcome to a predefined closed set of curated answers. We focus on open-ended VQA and, motivated by recent advances in language models, consider it a generative task. Leveraging pre-trained language models, we introduce a novel method particularly suited for small, domain-specific medical datasets. To properly communicate the medical images to the language model, we develop a network that maps the extracted visual features to a set of learnable tokens. Then, alongside the question, these learnable tokens directly prompt the language model. We explore recent parameter-efficient fine-tuning strategies for language models, which allow for resource- and data-efficient fine-tuning. We evaluate our approach on the prime medical VQA benchmarks, namely Slake, OVQA and PathVQA. The results demonstrate that our approach outperforms existing methods across various training settings while also being computationally efficient.
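
    The mapping from visual features to learnable prompt tokens can be pictured with a short sketch. The following is a minimal, illustrative PyTorch version, assuming pooled visual features from a pre-trained encoder and a GPT-2-sized language model; the class name, dimensions and two-layer MLP are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch, not the authors' code: a small mapping network that turns a pooled
# visual feature vector into k prefix token embeddings; these are prepended to the
# embedded question and fed to a (frozen or parameter-efficiently tuned) language model.
# All dimensions and names are illustrative.
import torch
import torch.nn as nn

class VisualPrefixMapper(nn.Module):
    def __init__(self, visual_dim=512, lm_dim=768, num_prefix_tokens=8):
        super().__init__()
        self.lm_dim = lm_dim
        self.num_prefix_tokens = num_prefix_tokens
        self.mlp = nn.Sequential(
            nn.Linear(visual_dim, lm_dim * num_prefix_tokens),
            nn.Tanh(),
            nn.Linear(lm_dim * num_prefix_tokens, lm_dim * num_prefix_tokens),
        )

    def forward(self, visual_features):                   # (batch, visual_dim)
        prefix = self.mlp(visual_features)                # (batch, k * lm_dim)
        return prefix.view(-1, self.num_prefix_tokens, self.lm_dim)

# Usage: concatenate the prefix with the embedded question tokens and let the
# language model generate the answer autoregressively, e.g.
# lm_inputs = torch.cat([mapper(image_features), question_embeddings], dim=1)
```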

    LifeLonger: A Benchmark for Continual Disease Classification

    Deep learning models have shown great effectiveness in recognizing findings in medical images. However, they cannot handle the ever-changing clinical environment, which brings newly annotated medical data from different sources. To exploit the incoming streams of data, these models would benefit greatly from sequentially learning from new samples without forgetting previously obtained knowledge. In this paper, we introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection, by applying existing state-of-the-art continual learning methods. In particular, we consider three continual learning scenarios, namely task and class incremental learning and the newly defined cross-domain incremental learning. Task and class incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch, while cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge. We perform a thorough analysis of the performance and examine how the well-known challenges of continual learning, such as catastrophic forgetting, exhibit themselves in this setting. The encouraging results demonstrate that continual learning has a major potential to advance disease classification and to produce a more robust and efficient learning framework for clinical settings. The code repository, data partitions and baseline results for the complete benchmark will be made publicly available.
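
    To make the class-incremental scenario concrete, here is a minimal, illustrative sketch of how such a split can be constructed and evaluated; the function name, chunk size and naive sequential baseline are assumptions for illustration, not the benchmark's actual data partitions or protocols.

```python
# Minimal sketch, not the benchmark code: build class-incremental "tasks" by splitting
# the label space into chunks that arrive sequentially. Names and the chunk size are
# illustrative; the benchmark also defines task- and cross-domain-incremental variants.
from torch.utils.data import Subset

def make_class_incremental_tasks(dataset, classes_per_task=2):
    """Partition a labelled dataset into tasks, each introducing new classes."""
    labels = [int(y) for _, y in dataset]
    all_classes = sorted(set(labels))
    tasks = []
    for start in range(0, len(all_classes), classes_per_task):
        task_classes = set(all_classes[start:start + classes_per_task])
        indices = [i for i, y in enumerate(labels) if y in task_classes]
        tasks.append(Subset(dataset, indices))
    return tasks

# A naive baseline then fine-tunes the model on each task in turn and evaluates on all
# classes seen so far; the accuracy drop on earlier tasks quantifies catastrophic forgetting.
```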

    Uncertainty-aware report generation for chest X-rays by variational topic inference

    Automating report generation for medical imaging promises to minimize labor and aid diagnosis in clinical practice. Deep learning algorithms have recently been shown to be capable of captioning natural photos. However, doing the same for medical data is difficult due to the variety in reports written by different radiologists with fluctuating levels of knowledge and experience. Current methods for automatic report generation tend to merely copy one of the training samples into the generated report. To tackle this issue, we propose variational topic inference, a probabilistic approach for automatic chest X-ray report generation. Specifically, we introduce a probabilistic latent variable model where a latent variable defines a single topic. The topics are inferred in a conditional variational inference framework by aligning vision and language modalities in a latent space, with each topic governing the generation of one sentence in the report. We further adopt a visual attention module that enables the model to attend to different locations in the image while generating the descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate reports with novel sentence structure, rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.
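
    The conditional variational inference step can be sketched as a per-sentence latent topic with an image-conditioned prior and an image-and-sentence-conditioned posterior. The PyTorch snippet below is a minimal illustration under those assumptions; the network shapes, dimensions and names are not the authors' implementation.

```python
# Minimal sketch, not the paper's implementation: one latent "topic" per report sentence.
# The prior p(z | image) is used at test time; the posterior q(z | image, sentence) is used
# during training, with a KL term pulling it towards the prior. Dimensions are illustrative.
import torch
import torch.nn as nn

class TopicLatent(nn.Module):
    def __init__(self, img_dim=512, sent_dim=512, z_dim=128):
        super().__init__()
        self.prior_net = nn.Linear(img_dim, 2 * z_dim)                 # p(z | image)
        self.posterior_net = nn.Linear(img_dim + sent_dim, 2 * z_dim)  # q(z | image, sentence)

    def forward(self, img_feat, sent_feat=None):
        prior_mu, prior_logvar = self.prior_net(img_feat).chunk(2, dim=-1)
        if sent_feat is None:          # inference: sample the topic from the prior
            mu, logvar = prior_mu, prior_logvar
        else:                          # training: sample from the posterior
            mu, logvar = self.posterior_net(torch.cat([img_feat, sent_feat], -1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # reparameterisation trick
        # KL(q || p) between diagonal Gaussians, summed over latent dims, averaged over the batch
        kl = 0.5 * (prior_logvar - logvar
                    + (logvar.exp() + (mu - prior_mu) ** 2) / prior_logvar.exp()
                    - 1).sum(-1).mean()
        return z, kl                   # z conditions the decoder that generates one sentence
```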

    Variational Topic Inference for Chest X-Ray Report Generation

    Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with differing expertise and experience. To tackle these challenges, we propose variational topic inference for automatic report generation. Specifically, we introduce a set of topics as latent variables to guide sentence generation by aligning image and language modalities in a latent space. The topics are inferred in a conditional variational inference framework, with each topic governing the generation of a sentence in the report. Further, we adopt a visual attention module that enables the model to attend to different locations in the image and generate more informative descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate novel reports rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria. To be published at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2021.
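
    The visual attention module mentioned above can be sketched as simple dot-product attention over spatial region features, queried by the decoder state while each word is generated. The snippet below is an illustrative assumption, not the paper's exact module.

```python
# Minimal sketch, illustrative only: dot-product attention over N spatial regions of the
# image, queried by the current decoder hidden state. Names and dimensions are assumptions.
import torch
import torch.nn as nn

class VisualAttention(nn.Module):
    def __init__(self, region_dim=512, hidden_dim=512):
        super().__init__()
        self.query_proj = nn.Linear(hidden_dim, region_dim)

    def forward(self, regions, hidden):
        # regions: (batch, N, region_dim); hidden: (batch, hidden_dim)
        query = self.query_proj(hidden).unsqueeze(1)              # (batch, 1, region_dim)
        scores = (query * regions).sum(dim=-1)                    # (batch, N)
        weights = scores.softmax(dim=-1)                          # attention over regions
        context = (weights.unsqueeze(-1) * regions).sum(dim=1)    # (batch, region_dim)
        return context, weights   # context feeds the word decoder; weights show where it looked
```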

    GCNIllustrator: Illustrating the Effect of Hyperparameters on Graph Convolutional Networks

    An increasing number of real-world applications use graph-structured datasets, imposing challenges on existing machine learning algorithms. Graph Convolutional Networks (GCNs) are deep learning models specifically designed to operate on graphs. One of the most tedious steps in training GCNs is the choice of hyperparameters, especially since GCNs exhibit unique properties compared to other neural models. Not only machine learning beginners but also experienced practitioners often have difficulty tuning their models properly. We hypothesize that a tool that visualizes the effect of hyperparameter choices on performance can accelerate model development and improve the understanding of these black-box models. Additionally, observing clusters of certain nodes helps to empirically understand how a given prediction was made, owing to the feature propagation step of GCNs. Therefore, this demo introduces GCNIllustrator, a web-based visual analytics tool for illustrating the effect of hyperparameters on predictions in a citation graph.
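
    For context on what these hyperparameters look like in practice, below is a minimal, illustrative GCN layer following the standard Kipf and Welling propagation rule; the hidden size, dropout rate and number of layers are exactly the kind of knobs such a tool lets users vary. The code is an assumption for illustration, not part of GCNIllustrator.

```python
# Minimal sketch, illustrative only: a single GCN layer, H' = ReLU(A_hat H W), where A_hat
# is the symmetrically normalised adjacency D^{-1/2}(A + I)D^{-1/2}. Hidden size, dropout
# rate, layer count and learning rate are typical hyperparameters to visualize.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim, dropout=0.5):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, node_features, adj_norm):
        # node_features: (num_nodes, in_dim); adj_norm: (num_nodes, num_nodes)
        h = self.linear(self.dropout(node_features))
        return torch.relu(adj_norm @ h)   # feature propagation over the graph
```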