EROS: Entity-Driven Controlled Policy Document Summarization

Authors: Akhtar Md Shad
Fazili Sehban
Jain Rohan
Singh Joykirat
Publication date: 29/02/2024

Privacy policy documents play a crucial role in educating individuals about the collection, usage, and protection of users' personal data by organizations. However, they are notorious for their lengthy, complex, and convoluted language, especially where privacy-related entities are involved. Hence, they pose a significant challenge to users who attempt to comprehend an organization's data usage policy. In this paper, we propose to enhance the interpretability and readability of policy documents by using controlled abstractive summarization: we require the generated summaries to include critical privacy-related entities (e.g., data and medium) and the organization's rationale (e.g., target and reason) for collecting those entities. To achieve this, we develop PD-Sum, a policy-document summarization dataset with marked privacy-related entity labels. Our proposed model, EROS, identifies critical entities through a span-based entity extraction model and employs them to control the information content of the summaries using proximal policy optimization (PPO). Comparison shows encouraging improvement over various baselines. Furthermore, we furnish qualitative and human evaluations to establish the efficacy of EROS.

Comment: Accepted in LREC-COLING 202
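
The abstract does not say how the PPO reward is defined. As a purely illustrative sketch, and not the authors' method, the idea of rewarding summaries that mention the extracted entities could be approximated by a coverage score. The function name `entity_coverage_reward` and the example strings below are hypothetical.

```python
def entity_coverage_reward(summary: str, entities: list[str]) -> float:
    """Hypothetical reward: fraction of required entities found in the summary.

    Matching is a case-insensitive substring test, which is an assumption
    made for this sketch; the paper may score entity inclusion differently.
    """
    if not entities:
        return 0.0
    text = summary.lower()
    hits = sum(1 for entity in entities if entity.lower() in text)
    return hits / len(entities)


# Made-up example: two of the three required entities are mentioned.
summary = "We collect your email address through cookies to personalize ads."
entities = ["email address", "cookies", "location"]
reward = entity_coverage_reward(summary, entities)
```

In a PPO setup, a scalar like `reward` would be one term in the signal used to update the summarizer; how it is combined with fluency or faithfulness terms is left open here.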

arXiv.org e-Print Archive

Almanac: Retrieval-Augmented Language Models for Clinical Medicine

Authors: Alexander Kevin
Ashley Euan
Boyd Jack
Boyd Kathleen
Chaurasia Akash
Dalal Alex R.
Hiesinger William
Hirsch Karen
Kim Jennifer L.
Langlotz Curt
Moor Michael
Nelson Joanna
Shad Rohan
Zakka Cyril
Publication date: 31/05/2023

Large language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question-answering. Despite many promising applications in clinical medicine, adoption of these models in real-world settings has been largely limited by their tendency to generate incorrect and sometimes even toxic statements. In this study, we develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations. Performance on a novel dataset of clinical scenarios (n = 130) evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality (mean of 18% at p-value < 0.05) across all specialties, with improvements in completeness and safety. Our results demonstrate the potential for large language models to be effective tools in the clinical decision-making process, while also emphasizing the importance of careful testing and deployment to mitigate their shortcomings.
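
The abstract describes retrieval augmentation only at a high level. The general pattern (rank reference passages against the query, then prepend the best ones to the prompt) can be sketched as follows. This is a toy illustration under stated assumptions, not Almanac's implementation: the bag-of-words scoring, the function names, and the placeholder passages are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased alphanumeric tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str], k: int = 1) -> str:
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Placeholder passages, not real guideline text.
passages = [
    "Guideline A covers management of atrial fibrillation",
    "Guideline B covers management of type 2 diabetes",
]
query = "management of atrial fibrillation"
prompt = build_prompt(query, passages, k=1)
```

A production system would more likely use dense embeddings and a vetted document store; the point here is only that the model answers with retrieved text in its context rather than from its parameters alone.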

arXiv.org e-Print Archive

A Generalizable Deep Learning System for Cardiac MRI

Authors: Acker Michael A.
Ashley Euan
de Feria Alejandro
Eng David
Ferrari Victor
Filice Ross Warren
Fong Robyn
Hiesinger William
Kalianos Kimberly
Kaur Dhamanpreet
Khandwala Nishith
Langlotz Curtis
Leipzig Matthew
Mongan John
Shad Rohan
Witschey Walter
Zakka Cyril
Publication date: 01/12/2023

Cardiac MRI allows for a comprehensive assessment of myocardial structure, function, and tissue characteristics. Here we describe a foundational vision system for cardiac MRI, capable of representing the breadth of human cardiovascular disease and health. Our deep learning model is trained via self-supervised contrastive learning, by which visual concepts in cine-sequence cardiac MRI scans are learned from the raw text of the accompanying radiology reports. We train and evaluate our model on data from four large academic clinical institutions in the United States. We additionally showcase the performance of our models on the UK BioBank and two additional publicly available external datasets. We explore emergent zero-shot capabilities of our system and demonstrate remarkable performance across a range of tasks, including the problem of left ventricular ejection fraction regression and the diagnosis of 35 different conditions such as cardiac amyloidosis and hypertrophic cardiomyopathy. We show that our deep learning system is not only capable of understanding the staggering complexity of human cardiovascular disease, but can also be directed towards clinical problems of interest, yielding impressive, clinical-grade diagnostic accuracy with a fraction of the training data typically required for such tasks.

Comment: 21 page main manuscript, 4 figures. Supplementary Appendix and code will be made available on publication
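
The abstract does not give the training objective. One common form of image-text contrastive learning is a symmetric, temperature-scaled cross-entropy over pairwise similarities, sketched below in plain Python. Treating the paper's objective as this CLIP-style loss is an assumption, and the function names, toy embeddings, and `temperature` value are hypothetical.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits: list[list[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of image_emb is assumed to be paired with row i of text_emb.
    """
    images = [normalize(v) for v in image_emb]
    texts = [normalize(v) for v in text_emb]
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy 2-D embeddings: matched pairs versus deliberately mismatched pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each scan embedding is closest to its own report embedding (`aligned`) and large when the pairing is wrong (`shuffled`), which is the pressure that pulls matching scan and report representations together.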

arXiv.org e-Print Archive