EROS: Entity-Driven Controlled Policy Document Summarization
Privacy policy documents play a crucial role in informing individuals about
how organizations collect, use, and protect users' personal data.
However, they are notorious for lengthy, complex, and convoluted language,
especially around privacy-related entities. Hence, they pose a significant
challenge to users who attempt to understand an organization's data usage policy.
In this paper, we propose to enhance the interpretability and readability of
policy documents by using controlled abstractive summarization: we require
the generated summaries to include critical privacy-related entities (e.g.,
data and medium) and the organization's rationale (e.g., target and reason) for
collecting those entities. To achieve this, we develop PD-Sum, a
policy-document summarization dataset with marked privacy-related entity
labels. Our proposed model, EROS, identifies critical entities through a
span-based entity extraction model and employs them to control the information
content of the summaries using proximal policy optimization (PPO). Comparisons
show encouraging improvements over various baselines. Furthermore, we provide
qualitative and human evaluations to establish the efficacy of EROS.
Comment: Accepted in LREC-COLING 202
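The abstract states that EROS uses extracted critical entities to control summary content through PPO, but it does not describe the reward. As a minimal sketch, assume a hypothetical reward equal to the fraction of extracted entities that appear in a generated summary, using case-insensitive substring matching. The function name `entity_coverage_reward` and the matching rule are illustrative assumptions, not the paper's formulation. A scalar of this kind is what a PPO loop could maximize.

```python
def entity_coverage_reward(summary: str, entities: list[str]) -> float:
    """Hypothetical reward: share of critical entities mentioned in the summary.

    Illustrative assumption only; the actual EROS reward is not given in the abstract.
    """
    # Deduplicate case-insensitively so repeated entities do not skew the score.
    unique = {e.strip().lower() for e in entities if e.strip()}
    if not unique:
        return 0.0
    text = summary.lower()
    # Naive substring match stands in for a real entity-matching step.
    hits = sum(1 for e in unique if e in text)
    return hits / len(unique)


if __name__ == "__main__":
    summary = "We collect your email address and location data for advertising."
    entities = ["email address", "location data", "payment card"]
    print(entity_coverage_reward(summary, entities))  # → 0.6666666666666666
```

Substring matching is deliberately crude. A real system would more likely reuse the span-based extractor on the generated summary so that paraphrased entities are also counted.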
Almanac: Retrieval-Augmented Language Models for Clinical Medicine
Large language models have recently demonstrated impressive zero-shot
capabilities in a variety of natural language tasks such as summarization,
dialogue generation, and question answering. Despite many promising
applications in clinical medicine, adoption of these models in real-world
settings has been largely limited by their tendency to generate incorrect and
sometimes even toxic statements. In this study, we develop Almanac, a large
language model framework augmented with retrieval capabilities for medical
guideline and treatment recommendations. Performance on a novel dataset of
clinical scenarios (n = 130), evaluated by a panel of 5 board-certified and
resident physicians, demonstrates significant increases in factuality (mean
increase of 18%, p < 0.05) across all specialties, with improvements in
completeness and safety. Our results demonstrate the potential for large
language models to be effective tools in the clinical decision-making process,
while also emphasizing the importance of careful testing and deployment to
mitigate their shortcomings.
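The abstract describes augmenting a language model with retrieval over medical guidelines, but it does not describe the retriever itself. As a minimal sketch, assume a toy bag-of-words cosine-similarity retriever and a hypothetical prompt template. The names `retrieve` and `build_prompt`, along with the placeholder passages, are illustrative and are not Almanac's components. The point is the design pattern: retrieved source text is placed in the prompt so the model's answer can be grounded in, and cite, that text.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    # Lowercase word counts; a deliberately simple stand-in for a learned embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (hypothetical retriever)."""
    q = _tokens(query)
    ranked = sorted(passages, key=lambda p: _cosine(q, _tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Assemble a grounded prompt: numbered sources first, then the question."""
    sources = retrieve(query, passages, k)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(sources, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    # Placeholder passages only; they are not real clinical guidance.
    corpus = [
        "Hypothetical guideline A: first-line treatment for condition X is drug Y.",
        "Hypothetical guideline B: screening for condition Z starts at age 50.",
        "Hypothetical note C: drug Y is contraindicated in severe renal impairment.",
    ]
    print(build_prompt("What is the first-line treatment for condition X?", corpus))
```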
A Generalizable Deep Learning System for Cardiac MRI
Cardiac MRI allows for a comprehensive assessment of myocardial structure,
function, and tissue characteristics. Here we describe a foundational vision
system for cardiac MRI, capable of representing the breadth of human
cardiovascular disease and health. Our deep learning model is trained via
self-supervised contrastive learning, by which visual concepts in cine-sequence
cardiac MRI scans are learned from the raw text of the accompanying radiology
reports. We train and evaluate our model on data from four large academic
clinical institutions in the United States. We also demonstrate the
performance of our models on the UK Biobank and two other publicly
available external datasets. We explore emergent zero-shot capabilities of our
system and demonstrate remarkable performance across a range of tasks,
including left ventricular ejection fraction regression and the
diagnosis of 35 different conditions such as cardiac amyloidosis and
hypertrophic cardiomyopathy. We show that our deep learning system not only
understands the staggering complexity of human cardiovascular disease but can
also be directed towards clinical problems of interest, yielding impressive,
clinical-grade diagnostic accuracy with a fraction of the training data
typically required for such tasks.
Comment: 21-page main manuscript, 4 figures. Supplementary Appendix and code
will be made available on publication.
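The abstract says visual concepts are learned from paired radiology report text through self-supervised contrastive learning, but it does not give the objective. As a minimal sketch, assume a CLIP-style symmetric InfoNCE loss computed over a batch of precomputed image and text embeddings. The function name, the temperature of 0.07, and the toy vectors are illustrative assumptions, and the paper's actual encoders and loss may differ. Matching scan/report pairs share an index, so the correct "class" for each row is its own diagonal entry.

```python
import math


def _normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


def _log_softmax_at(logits: list[float], idx: int) -> float:
    # Numerically stable log-softmax evaluated at a single index.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_z


def symmetric_info_nce(img: list[list[float]], txt: list[list[float]],
                       temperature: float = 0.07) -> float:
    """CLIP-style loss: pair i (scan i, report i) is the positive; all others are negatives."""
    assert len(img) == len(txt) and img, "need equal, non-empty batches"
    img = [_normalize(v) for v in img]
    txt = [_normalize(v) for v in txt]
    n = len(img)
    # Cosine-similarity logits scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-text: each row is a classification over reports.
    img_to_txt = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image: each column is a classification over scans.
    txt_to_img = -sum(_log_softmax_at([sim[i][j] for i in range(n)], j)
                      for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2


if __name__ == "__main__":
    scans = [[1.0, 0.0], [0.0, 1.0]]
    aligned_reports = [[1.0, 0.0], [0.0, 1.0]]
    swapped_reports = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_info_nce(scans, aligned_reports))  # small: pairs already match
    print(symmetric_info_nce(scans, swapped_reports))  # large: pairs are mismatched
```

Averaging the two directions makes the objective symmetric, so neither the image encoder nor the text encoder is favored.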