12 research outputs found

    Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering

    Full text link
    Medical visual question answering (VQA) is a challenging task that requires answering clinical questions about a given medical image by taking both visual and language information into account. Because training data for medical VQA are scarce, the pre-training and fine-tuning paradigm has become a common way to improve model generalization. In this paper, we present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text from medical image-caption datasets, using unimodal and multimodal contrastive losses together with masked language modeling and image-text matching as pre-training objectives. The pre-trained model is then transferred to downstream medical VQA tasks. The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets, with significant accuracy improvements of 2.2%, 14.7%, and 1.7%, respectively. In addition, we conduct a comprehensive analysis to validate the effectiveness of the different components of the approach and study different pre-training settings. Our code and models are available at https://github.com/pengfeiliHEU/MUMC.
    Comment: accepted by MICCAI 2023
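The abstract names unimodal and multimodal contrastive losses among the pre-training objectives. A minimal numpy sketch of one such term — a symmetric image-text InfoNCE loss, where matched image/caption pairs in a batch are positives and all other pairings are negatives — might look as follows (function name and temperature value are illustrative, not taken from the paper):

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive (InfoNCE) loss over a batch.

    Matched image/text pairs share the same row index; all other
    rows in the batch act as negatives.
    """
    # L2-normalise so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))         # positives on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)   # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # cross-entropy in both directions: image->text and text->image
    return 0.5 * (xent(logits) + xent(logits.T))
```

Aligned pairs drive the loss toward zero, while mismatched pairs are penalised — the mechanism by which contrastive pre-training pulls matched image and text embeddings together.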

    Self-supervised vision-language pretraining for Medical visual question answering

    Full text link
    Medical image visual question answering (VQA) is the task of answering clinical questions about a given radiographic image, a challenging problem that requires a model to integrate both vision and language information. To solve medical VQA problems with limited training data, the pretrain-finetune paradigm is widely used to improve model generalization. In this paper, we propose a self-supervised method that applies masked image modeling, masked language modeling, image-text matching, and image-text alignment via contrastive learning (M2I2) for pretraining on a medical image-caption dataset, and fine-tunes on downstream medical VQA tasks. The proposed method achieves state-of-the-art performance on all three public medical VQA datasets. Our code and models are available at https://github.com/pengfeiliHEU/M2I2.
    Comment: 5 pages, 3 figures
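Of the four M2I2 objectives, masked image modeling is the one not shared with the previous abstract: a random subset of image patches is hidden and the model is trained to reconstruct them from the visible ones. A toy sketch of the patch-masking step (patch size and mask ratio are illustrative):

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.5, rng=None):
    """Randomly zero out a fraction of non-overlapping patches, as in
    masked image modeling; the returned boolean mask marks which pixels
    the model would be asked to reconstruct."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    gh, gw = h // patch, w // patch          # patch-grid dimensions
    n_mask = int(gh * gw * mask_ratio)
    masked_idx = rng.choice(gh * gw, size=n_mask, replace=False)
    out = image.copy()
    mask = np.zeros((h, w), dtype=bool)
    for idx in masked_idx:
        r, c = divmod(idx, gw)
        sl = (slice(r * patch, (r + 1) * patch),
              slice(c * patch, (c + 1) * patch))
        out[sl] = 0.0                        # hide the patch
        mask[sl] = True
    return out, mask
```

The reconstruction loss would then be computed only over the `mask == True` pixels.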

    Accelerations of structural and functional brain connectivity using heterogeneous computing

    No full text
    In this thesis, the main aims are to accelerate algorithms for diffusion tractography and functional MRI connectivity analysis by mapping them onto parallel architectures. Diffusion tractography is an algorithm for studying the micro-structure of brain white matter (WM), and functional MRI is a neuroimaging procedure for exploring the time series of brain activity. Both are widely applied in neuroscience research. Diffusion-weighted (DW) magnetic resonance (MR) images can be processed to yield orientation information on the underlying anisotropic distribution of elongated fibers, such as in the white matter of the brain. As MR field strengths increase and scan protocols improve, the achievable spatial and angular resolution has reached the point where traditional diffusion tensor imaging (DTI) methods are being replaced by non-tensor methods that allow for multiple fiber directions per image voxel. Once the fiber distribution is known, the generation of whole streamlines - stochastic representations of white matter fiber bundles - while computationally dense, is intrinsically parallel and eminently suited to acceleration with contemporary graphics processing units (GPUs). Here, we report the design, implementation, validation, and performance analysis of two different parallel mappings of the standard probabilistic tracking algorithm applied to DW MR images represented in spherical harmonics. We achieve a 10x speedup on a commodity GPU compared to the standard multi-core CPU implementation, while recovering the expected distribution of the streamlines. Our parallel implementation scales well across different hardware and problem sizes; the best rate achieved is one million streamlines computed in less than 20 seconds.
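The "intrinsically parallel" structure comes from the fact that each streamline depends only on its own seed point and random stream. A toy sketch of the per-seed propagation loop (the function names and the simplistic orientation model are illustrative; the thesis samples directions from spherical-harmonic orientation distributions on the GPU):

```python
import numpy as np

def track(seed, sample_dir, step=0.5, n_steps=100, rng=None):
    """Propagate one probabilistic streamline from `seed`.

    `sample_dir(pos, rng)` draws a unit fiber direction at `pos` from
    the local orientation distribution. Each call to `track` is
    independent, which is what makes mapping seeds to GPU threads easy.
    """
    rng = rng or np.random.default_rng()
    pos = np.asarray(seed, float)
    pts = [pos.copy()]
    for _ in range(n_steps):
        d = sample_dir(pos, rng)
        pos = pos + step * d        # Euler step along the sampled direction
        pts.append(pos.copy())
    return np.array(pts)

# Toy orientation model: fibres run along +x with small angular jitter.
def sample_dir(pos, rng):
    d = np.array([1.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(3)
    return d / np.linalg.norm(d)
```

Running many seeds simply means launching many independent instances of this loop, one per GPU thread.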
The work on accelerated sliding-window spatial ICA component tracking on functional MRI time-series data is also a main part of this project; it is a more flexible approach to studying brain functional dynamics than traditional region-of-interest (ROI)-based analysis. To perform ICA in real time, we are investigating parallel mappings and implementations of the FastICA algorithm on GPU(s), targeting processing times of less than 2 seconds (i.e., the sampling interval of current fMRI scanning sequences). This may open new possibilities for real-time neurofeedback studies and intraoperative image-guided neurosurgery.
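The sliding-window part of the approach can be sketched independently of the ICA itself: the time series is cut into overlapping windows and a spatial ICA is run on each, so components can be tracked as they evolve. A minimal sketch of the windowing (window length and stride are illustrative):

```python
import numpy as np

def sliding_windows(ts, win, stride):
    """Split a (time x voxels) fMRI matrix into overlapping windows.

    Spatial ICA is then run on each window; comparing the components
    of successive windows tracks functional dynamics over time.
    """
    return [ts[s:s + win] for s in range(0, len(ts) - win + 1, stride)]
```

Since the windows are independent, each window's ICA decomposition is another natural unit of GPU parallelism.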

    Deep Learning in Industry

    No full text
    In this presentation, I will talk about industry use cases of deep learning, particularly in production pipelines. Rather than focusing on model building and training, this talk will cover serving a trained model and scaling it up and down using frameworks such as TensorFlow. The other part of the talk will cover AI trends (e.g., deep learning applications) in one particular domain: medical imaging.

    Auto-encoded latent representations of white matter streamlines for quantitative distance analysis

    No full text
    Parcellation of whole-brain tractograms is a critical step in studying brain white matter structures and connectivity patterns. Existing methods based on supervised classification of streamlines into predefined bundle types are not designed to explore sub-bundle structures, and methods relying on manually designed features make streamline-wise similarities expensive to compute. To resolve these issues, we propose a novel atlas-free method that learns a latent space using a deep recurrent auto-encoder trained in an unsupervised manner. The method efficiently embeds streamlines of any length into fixed-size feature vectors, named streamline embeddings, for tractogram parcellation using non-parametric clustering in the latent space. The method was evaluated on the ISMRM 2015 tractography challenge dataset, with discrimination of major bundles using clustering algorithms and streamline querying based on similarity, as well as on real tractograms of 102 subjects from the Human Connectome Project. The learnt latent streamline and bundle representations open the possibility of quantitative studies of sub-bundle structures at arbitrary granularity using generic data mining techniques.
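The key interface here is "variable-length streamline in, fixed-size vector out". As a simple classical stand-in for the learnt recurrent auto-encoder (not the paper's method), one can resample each streamline to a fixed number of equally spaced points along its arc length, which also yields fixed-size vectors on which generic distance-based clustering can operate:

```python
import numpy as np

def resample(streamline, n=16):
    """Resample a streamline (m x 3 array, any m) to n points equally
    spaced along its arc length, returning a fixed-size (n*3,) vector.

    A classical stand-in for a learnt embedding: like the auto-encoder,
    it maps variable-length streamlines to fixed-size feature vectors.
    """
    sl = np.asarray(streamline, float)
    seg = np.linalg.norm(np.diff(sl, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])         # arc length at each point
    t = np.linspace(0.0, s[-1], n)                      # equally spaced targets
    pts = np.stack([np.interp(t, s, sl[:, k])
                    for k in range(sl.shape[1])], axis=1)
    return pts.ravel()
```

The advantage of the learnt embedding over such hand-designed features is precisely what the abstract argues: cheaper streamline-wise similarity and sensitivity to sub-bundle structure.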

    Monash DaCRA fPET-fMRI:a dataset for comparison of radiotracer administration for high temporal resolution functional FDG-PET

    No full text
    BACKGROUND: “Functional” [(18)F]-fluorodeoxyglucose positron emission tomography (FDG-fPET) is a new approach for measuring glucose uptake in the human brain. The goal of FDG-fPET is to maintain a constant plasma supply of radioactive FDG in order to track, with high temporal resolution, the dynamic uptake of glucose during neuronal activity that occurs in response to a task or at rest. FDG-fPET has most often been applied in simultaneous BOLD-fMRI/FDG-fPET (blood oxygenation level–dependent functional MRI fluorodeoxyglucose functional positron emission tomography) imaging. BOLD-fMRI/FDG-fPET provides the capability to image the 2 primary sources of energetic dynamics in the brain: the cerebrovascular haemodynamic response and cerebral glucose uptake. FINDINGS: In this Data Note, we describe an open-access dataset, Monash DaCRA fPET-fMRI, which contrasts 3 radiotracer administration protocols for FDG-fPET: bolus, constant infusion, and hybrid bolus/infusion. Participants (n = 5 in each group) were randomly assigned to each radiotracer administration protocol and underwent simultaneous BOLD-fMRI/FDG-fPET scanning while viewing a flickering checkerboard. The bolus group received the full FDG dose in a standard bolus administration, the infusion group received the full FDG dose as a slow infusion over the duration of the scan, and the bolus-infusion group received 50% of the FDG dose as bolus and 50% as constant infusion. We validate the dataset by contrasting plasma radioactivity, grey matter mean uptake, and task-related activity in the visual cortex. CONCLUSIONS: The Monash DaCRA fPET-fMRI dataset provides significant reuse value for researchers interested in the comparison of signal dynamics in fPET and its relationship with fMRI task-evoked activity.
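The three administration protocols differ only in how the total FDG dose is spread over the scan, which is easy to make concrete. A sketch of the cumulative administered fraction under each protocol (the 90-minute scan length is an illustrative assumption, not taken from the paper):

```python
def administered_fraction(t, protocol, scan_len=90.0):
    """Fraction of the total FDG dose administered by time t (minutes).

    bolus          -> full dose at t = 0
    infusion       -> dose delivered at a constant rate over the scan
    bolus-infusion -> 50% as bolus at t = 0, 50% as constant infusion
    """
    infused = min(t, scan_len) / scan_len   # infusion fraction delivered so far
    if protocol == "bolus":
        return 1.0
    if protocol == "infusion":
        return infused
    if protocol == "bolus-infusion":
        return 0.5 + 0.5 * infused
    raise ValueError(f"unknown protocol: {protocol}")
```

These schedules are what produce the contrasting plasma-radioactivity curves the dataset is designed to compare.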

    Task-evoked simultaneous FDG-PET and fMRI data for measurement of neural metabolism in the human visual cortex

    No full text
    Understanding how the living human brain functions requires sophisticated in vivo neuroimaging technologies to characterise the complexity of neuroanatomy, neural function, and brain metabolism. Fluorodeoxyglucose positron emission tomography (FDG-PET) studies of human brain function have historically been limited in their capacity to measure dynamic neural activity. Simultaneous [(18)F]-FDG-PET and functional magnetic resonance imaging (fMRI) with FDG infusion protocols enable examination of dynamic changes in cerebral glucose metabolism simultaneously with dynamic changes in blood oxygenation. The Monash vis-fPET-fMRI dataset is a simultaneously acquired FDG-fPET/BOLD-fMRI dataset acquired from n = 10 healthy adults (18–49 yrs) whilst they viewed a flickering checkerboard task. The dataset contains both raw (unprocessed) images and source data organized according to the BIDS specification. The source data includes PET listmode, normalization, sinogram, and physiology data. Here, the technical feasibility of using open-source frameworks to reconstruct the PET listmode data is demonstrated. The dataset has significant re-use value for the development of new processing pipelines and signal optimisation methods, and for formulating new hypotheses concerning the relationship between neuronal glucose uptake and cerebral haemodynamics.
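BIDS organisation means each subject's files follow a predictable entity-based naming scheme. A hypothetical sketch of constructing such paths for this kind of simultaneous PET/fMRI dataset (the task label and suffixes are illustrative assumptions; consult the BIDS specification and the dataset itself for the exact entities used):

```python
def bids_path(sub, modality, task="checkerboard"):
    """Build a BIDS-style relative path for one subject's imaging file.

    `sub` is the subject number; `modality` is "bold" or "pet".
    Task label and file suffixes here are illustrative, not taken
    from the actual dataset.
    """
    suffix = {"bold": "bold.nii.gz", "pet": "pet.nii.gz"}[modality]
    folder = "func" if modality == "bold" else "pet"
    return f"sub-{sub:02d}/{folder}/sub-{sub:02d}_task-{task}_{suffix}"
```

Predictable paths like these are what let generic BIDS-aware pipelines consume the dataset without per-study configuration.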