6 research outputs found
Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment
We propose and demonstrate a novel machine learning algorithm that assesses
pulmonary edema severity from chest radiographs. While large publicly available
datasets of chest radiographs and free-text radiology reports exist, only
limited numerical edema severity labels can be extracted from radiology
reports. This is a significant challenge in learning such models for image
classification. To take advantage of the rich information present in the
radiology reports, we develop a neural network model that is trained on both
images and free-text to assess pulmonary edema severity from chest radiographs
at inference time. Our experimental results suggest that the joint image-text
representation learning improves the performance of pulmonary edema assessment
compared to a supervised model trained on images only. We also show the use of
the text for explaining the image classification by the joint model. To the
best of our knowledge, our approach is the first to leverage free-text
radiology reports for improving the image model performance in this
application. Our code is available at
https://github.com/RayRuizhiLiao/joint_chestxray.
Comment: The first two authors contributed equally. To be published in the proceedings of MICCAI 202
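The joint training idea the abstract describes — aligning image and report embeddings in a shared space so that text supervision improves the image model — can be sketched roughly as follows. This is a minimal numpy illustration with random stand-in features and linear projections; the paper's actual model uses learned neural encoders, and all dimensions here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features: 4 chest radiographs and their reports.
img_feat = rng.normal(size=(4, 128))   # image features
txt_feat = rng.normal(size=(4, 300))   # report text features

# Linear projections into a shared semantic space (random here; learned in practice).
W_img = rng.normal(size=(128, 64)) / np.sqrt(128)
W_txt = rng.normal(size=(300, 64)) / np.sqrt(300)

def embed(x, W):
    """Project features into the joint space and L2-normalize."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

z_img = embed(img_feat, W_img)
z_txt = embed(txt_feat, W_txt)

# Alignment objective: a matched image/report pair should score higher than
# mismatched pairs by a margin (a ranking-style loss drives training).
sim = z_img @ z_txt.T                  # cosine similarity matrix
pos = np.diag(sim)                     # matched-pair similarities
margin = 0.2
loss = np.maximum(0.0, margin + sim - pos[:, None])
np.fill_diagonal(loss, 0.0)            # exclude matched pairs themselves
ranking_loss = loss.mean()
```

At inference time only the image branch is needed; the text branch serves to shape the shared space during training.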
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Question categorization and expert retrieval methods have been crucial for
information organization and accessibility in community question & answering
(CQA) platforms. Research in this area, however, has dealt with only the text
modality. With the increasing multimodal nature of web content, we focus on
extending these methods for CQA questions accompanied by images. Specifically,
we leverage the success of representation learning for text and images in the
visual question answering (VQA) domain, and adapt the underlying concept and
architecture for automated category classification and expert retrieval on
image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of
Yahoo! Answers.
To the best of our knowledge, this is the first work to tackle the
multimodality challenge in CQA, and to adapt VQA models for tasks on a more
ecologically valid source of visual questions. Our analysis of the differences
between visual QA and community QA data motivates novel augmentations of an
attention method tailored for CQA, and the use of auxiliary tasks for learning
better grounding features. Our final model markedly outperforms the text-only
and VQA model baselines on both tasks of classification and expert retrieval
on real-world multimodal CQA data.
Comment: Submitted for review at CIKM 201
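The VQA-style mechanism being adapted — text-guided attention over image regions, with the attended visual summary fused with the question for classification — can be sketched as below. This is a generic illustration in numpy with hypothetical dimensions, not the paper's specific CQA augmentations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs: one CQA question with an attached image.
q = rng.normal(size=(64,))             # question text embedding
regions = rng.normal(size=(9, 64))     # 9 image-region features (3x3 grid)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Text-guided attention: score each region against the question,
# then pool regions by the resulting weights (standard VQA attention).
scores = regions @ q / np.sqrt(64)
weights = softmax(scores)
visual_ctx = weights @ regions         # attended visual summary

# Fuse modalities and score hypothetical question categories.
fused = np.concatenate([q, visual_ctx])
W_cls = rng.normal(size=(128, 5)) / np.sqrt(128)   # 5 stand-in categories
category_logits = fused @ W_cls
predicted = int(np.argmax(category_logits))
```

The same fused representation can feed an expert-retrieval head instead of a category classifier.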
Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering
ChatGPT exemplifies a strategic blueprint for question answering (QA) in
delivering medical diagnoses, treatment recommendations, and other healthcare
support. This is achieved through the increasing incorporation of medical
domain data via natural language processing (NLP) and multimodal paradigms. By
transitioning the distribution of text, images, videos, and other modalities
from the general domain to the medical domain, these techniques have expedited
the progress of medical domain question answering (MDQA). They bridge the gap
between human natural language and sophisticated medical domain knowledge or
expert manual annotations, handling large-scale, diverse, unbalanced, or even
unlabeled data analysis scenarios in medical contexts. Central to our focus is
the use of language models and multimodal paradigms for medical question
answering, aiming to guide the research community in selecting appropriate
mechanisms for their specific medical research requirements. Specialized tasks
such as unimodal-related question answering, reading comprehension, reasoning,
diagnosis, relation extraction, probability modeling, and others, as well as
multimodal-related tasks like visual question answering, image captioning,
cross-modal retrieval, and report summarization and generation, are discussed in
detail. Each section delves into the intricate specifics of the respective
method under consideration. This paper highlights the structures and
advancements of medical domain explorations against general domain methods,
emphasizing their applications across different tasks and datasets. It also
outlines current challenges and opportunities for future medical domain
research, paving the way for continued innovation and application in this
rapidly evolving field.
Comment: 50 pages, 3 figures, 3 table
Joint Image-Text Representation Learning
Making computers intelligent has long been a dream. Like humans, who can understand information in multiple modalities such as video, text, and audio, teaching computers to jointly understand multi-modal information is a necessary and essential step towards artificial intelligence, and how to jointly represent multi-modal information is critical to that step. Although much effort has been devoted to representing each modality individually, learning joint multi-modal representations remains an open and challenging problem.

In this dissertation, we explore joint image-text representation models based on Visual-Semantic Embedding (VSE). VSE has recently been proposed and shown to be effective for joint representation. The key idea is that by learning a mapping from images into a semantic space, the algorithm is able to learn a compact and effective joint representation. However, existing approaches simply map each text concept and each whole image to single points in the semantic space. We propose several novel visual-semantic embedding models that use (1) text concept modeling, (2) image-level modeling, and (3) object-level modeling. In particular, we first introduce a novel Gaussian Visual-Semantic Embedding (GVSE) model that leverages visual information to model text concepts as density distributions rather than single points in semantic space. Then, we propose Multiple Instance Visual-Semantic Embedding (MIVSE) via image-level modeling, which discovers and maps semantically meaningful image sub-regions to their corresponding text labels. Next, we present a fine-grained object-level representation in images, Scene-Domain Active Part Models (SDAPM), that reconstructs and characterizes 3D geometric statistics between an object's parts in the 3D scene domain.
Finally, we explore advanced joint representations for other visual and textual modalities, including joint image-sentence representation and joint video-sentence representation.

Extensive experiments demonstrate that the proposed joint representation models are superior to existing methods on various tasks involving image, video, and text modalities, including image annotation, zero-shot learning, object and part detection, pose and viewpoint estimation, image classification, text-based image retrieval, image captioning, video annotation, and text-based video retrieval.
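The Gaussian Visual-Semantic Embedding idea described above — modeling a text concept as a density distribution rather than a single point, so that images can be scored by how well they fit the concept's distribution — can be illustrated as follows. This is a minimal numpy sketch with a fixed diagonal Gaussian and random stand-in image embeddings; in GVSE the distribution parameters are learned from visual data:

```python
import numpy as np

rng = np.random.default_rng(2)

# A text concept modeled as a diagonal Gaussian in an 8-d semantic space
# (mean + per-dimension variance), rather than a single embedding point.
concept_mu = np.zeros(8)
concept_var = np.full(8, 0.5)

def log_density(x, mu, var):
    """Diagonal-Gaussian log-density of an image embedding x."""
    return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))

# An image embedding near the concept mean should score higher
# than one far from it.
near_img = rng.normal(scale=0.1, size=8)
far_img = rng.normal(scale=0.1, size=8) + 3.0
score_near = log_density(near_img, concept_mu, concept_var)
score_far = log_density(far_img, concept_mu, concept_var)
```

Modeling variance per dimension lets a broad concept (e.g. "animal") cover a wider region of the space than a specific one (e.g. "dalmatian").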