INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

Chiang, Chia-Chun; Fries, Jason A.; Huang, Shih-Cheng; Huo, Zepeng; Langlotz, Curtis P.; Lungren, Matthew P.; Shah, Nigam H.; Steinberg, Ethan; Yeung, Serena

Publication date: 17 November 2023

Abstract

Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e., demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE-related tasks. We evaluate image-only, EHR-only, and multimodal fusion models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset integrating 3D medical imaging and EHR for reproducible methods evaluation and research.

Available versions: arXiv.org e-Print Archive (oai:arXiv.org:2311.10798)

Last updated on 07/05/2024