Malaria Likelihood Prediction By Effectively Surveying Households Using Deep Reinforcement Learning
We build a deep reinforcement learning (RL) agent that can predict the
likelihood of an individual testing positive for malaria by asking questions
about their household. The RL agent learns to determine which survey question
to ask next and when to stop and make a prediction about the likelihood of
malaria based on the responses received so far. The agent incurs a small penalty for
each question asked, and a large reward/penalty for making the correct/wrong
prediction; it thus has to learn to balance the length of the survey with the
accuracy of its final predictions. Our RL agent is a Deep Q-network that learns
a policy directly from the responses to the questions, with an action defined
for each possible survey question and for each possible prediction class. We
focus on Kenya, where malaria is a massive health burden, and train the RL
agent on a dataset of 6481 households from the Kenya Malaria Indicator Survey
2015. To investigate the importance of having survey questions be adaptive to
responses, we compare our RL agent to a supervised learning (SL) baseline that
fixes its set of survey questions a priori. We evaluate on prediction accuracy
and on the number of survey questions asked on a holdout set and find that the
RL agent is able to predict with 80% accuracy, using only 2.5 questions on
average. In addition, the RL agent learns to survey adaptively to responses and
is able to match the SL baseline in prediction accuracy while significantly
reducing survey length.
Comment: Accepted at NIPS 2017 Workshop on Machine Learning for Health (NIPS 2017 ML4H).
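A minimal sketch of the action space and reward scheme described in this abstract, with one action per survey question and one per prediction class; the constants, network sizes, and names below are illustrative assumptions rather than the paper's actual values.

```python
# Hypothetical sketch of the setup described above: a Q-network whose actions
# are either "ask survey question i" or "predict class c", with a small
# per-question penalty and a large terminal reward/penalty.
import torch
import torch.nn as nn

N_QUESTIONS = 20          # number of available survey questions (assumed)
N_CLASSES = 2             # malaria-positive vs. malaria-negative
QUESTION_PENALTY = -0.05  # small cost for asking another question (assumed)
PREDICTION_REWARD = 1.0   # large reward/penalty for the final prediction (assumed)

class SurveyDQN(nn.Module):
    """Q-network mapping the responses collected so far to Q-values for all actions."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, N_QUESTIONS + N_CLASSES),  # ask-question or predict-class actions
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def step(action: int, true_label: int) -> tuple[float, bool]:
    """Reward and episode-termination logic implied by the abstract."""
    if action < N_QUESTIONS:                  # agent asks another question
        return QUESTION_PENALTY, False
    predicted = action - N_QUESTIONS          # agent commits to a prediction
    reward = PREDICTION_REWARD if predicted == true_label else -PREDICTION_REWARD
    return reward, True
```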
Know What You Don't Know: Unanswerable Questions for SQuAD
Extractive reading comprehension systems can often locate the correct answer
to a question in a context document, but they also tend to make unreliable
guesses on questions for which the correct answer is not stated in the context.
Existing datasets either focus exclusively on answerable questions, or use
automatically generated unanswerable questions that are easy to identify. To
address these weaknesses, we present SQuAD 2.0, the latest version of the
Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD
data with over 50,000 unanswerable questions written adversarially by
crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0,
systems must not only answer questions when possible, but also determine when
no answer is supported by the paragraph and abstain from answering. SQuAD 2.0
is a challenging natural language understanding task for existing models: a
strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on
SQuAD 2.0.
Comment: ACL 2018.
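For readers working with the released data, a small sketch of how the SQuAD 2.0 JSON distinguishes answerable from unanswerable questions via its is_impossible flag; the file path is a placeholder.

```python
# Sketch: counting answerable vs. unanswerable questions in a SQuAD 2.0 JSON
# file. The nested data/paragraphs/qas layout and the is_impossible flag follow
# the released format; the file path is a placeholder.
import json

with open("dev-v2.0.json") as f:
    squad = json.load(f)

answerable, unanswerable = 0, 0
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            if qa.get("is_impossible", False):
                unanswerable += 1   # no answer supported by the paragraph
            else:
                answerable += 1     # gold answer span(s) given in qa["answers"]

print(f"answerable={answerable}, unanswerable={unanswerable}")
```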
SQuAD: 100,000+ Questions for Machine Comprehension of Text
We present the Stanford Question Answering Dataset (SQuAD), a new reading
comprehension dataset consisting of 100,000+ questions posed by crowdworkers on
a set of Wikipedia articles, where the answer to each question is a segment of
text from the corresponding reading passage. We analyze the dataset to
understand the types of reasoning required to answer the questions, leaning
heavily on dependency and constituency trees. We build a strong logistic
regression model, which achieves an F1 score of 51.0%, a significant
improvement over a simple baseline (20%). However, human performance (86.8%) is
much higher, indicating that the dataset presents a good challenge problem for
future research.
The dataset is freely available at https://stanford-qa.com
Comment: To appear in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016).
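The F1 figures quoted above are token-overlap F1 between predicted and gold answer spans; the sketch below is a simplified version of that metric (the official evaluation script additionally normalizes case, punctuation, and articles before comparing).

```python
# Simplified token-overlap F1, the metric behind the 51.0% / 86.8% figures.
# The official SQuAD evaluation script also lowercases and strips punctuation
# and articles before comparing; this sketch omits that normalization.
from collections import Counter

def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the Norman conquest of England", "Norman conquest"))  # ~0.57
```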
Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks
We develop an algorithm that exceeds the performance of board-certified
cardiologists in detecting a wide range of heart arrhythmias from
electrocardiograms recorded with a single-lead wearable monitor. We build a
dataset with more than 500 times as many unique patients as previously
studied corpora. On this dataset, we train a 34-layer convolutional neural
network which maps a sequence of ECG samples to a sequence of rhythm classes.
Committees of board-certified cardiologists annotate a gold standard test set
on which we compare the performance of our model to that of 6 other individual
cardiologists. We exceed the average cardiologist performance in both recall
(sensitivity) and precision (positive predictive value).
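As a rough illustration of the sequence-to-sequence setup described above (not the paper's 34-layer architecture), a much shallower 1-D convolutional network that maps a window of raw single-lead ECG samples to per-segment rhythm-class logits; layer sizes and the number of classes are assumptions.

```python
# Illustrative (much shallower) stand-in for the sequence-to-sequence CNN
# described above: strided 1-D convolutions over raw ECG samples, so each
# output step covers a fixed-length segment of the input.
import torch
import torch.nn as nn

class ECGRhythmCNN(nn.Module):
    def __init__(self, n_classes: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=16, stride=2, padding=7), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=2, padding=7), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=16, stride=2, padding=7), nn.ReLU(),
        )
        self.classifier = nn.Conv1d(64, n_classes, kernel_size=1)  # per-segment logits

    def forward(self, ecg: torch.Tensor) -> torch.Tensor:
        # ecg: (batch, 1, n_samples) -> logits: (batch, n_classes, n_segments)
        return self.classifier(self.features(ecg))

model = ECGRhythmCNN()
logits = model(torch.randn(2, 1, 2048))   # two 2048-sample ECG windows
print(logits.shape)                       # torch.Size([2, 12, 256])
```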
CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT
The extraction of labels from radiology text reports enables large-scale
training of medical imaging models. Existing approaches to report labeling
typically rely either on sophisticated feature engineering based on medical
domain knowledge or manual annotations by experts. In this work, we introduce a
BERT-based approach to medical image report labeling that exploits both the
scale of available rule-based systems and the quality of expert annotations. We
demonstrate superior performance of a biomedically pretrained BERT model first
trained on annotations of a rule-based labeler and then finetuned on a small
set of expert annotations augmented with automated backtranslation. We find
that our final model, CheXbert, is able to outperform the previous best
rule-based labeler with statistical significance, setting a new SOTA for
report labeling on one of the largest datasets of chest X-rays.
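A schematic of the two-stage recipe described above, assuming a generic BERT encoder with one classification head per radiological condition; the checkpoint name, head count, and label cardinality are illustrative assumptions, not CheXbert's exact configuration.

```python
# Schematic of the two-stage labeling recipe: a BERT encoder with one small
# classification head per condition, first trained on labels from a rule-based
# labeler and then finetuned on expert annotations. Encoder checkpoint, number
# of conditions, and label values (e.g. positive/negative/uncertain/blank)
# are illustrative assumptions.
import torch.nn as nn
from transformers import AutoModel

class ReportLabeler(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased",
                 n_conditions: int = 14, n_label_values: int = 4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One head per condition, each choosing among the label values.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_label_values) for _ in range(n_conditions)]
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]          # [CLS] token representation
        return [head(cls) for head in self.heads]  # per-condition logits

# Stage 1: fit on reports labeled automatically by the rule-based labeler.
# Stage 2: finetune the same model on the small expert-annotated set
#          (optionally augmented with backtranslated copies of those reports).
```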
Effect of Radiology Report Labeler Quality on Deep Learning Models for Chest X-Ray Interpretation
Although deep learning models for chest X-ray interpretation are commonly
trained on labels generated by automatic radiology report labelers, the impact
of improvements in report labeling on the performance of chest X-ray
classification models has not been systematically investigated. We first
compare the CheXpert, CheXbert, and VisualCheXbert labelers on the task of
extracting accurate chest X-ray image labels from radiology reports, reporting
that the VisualCheXbert labeler outperforms the CheXpert and CheXbert labelers.
Next, after training image classification models using labels generated from
the different radiology report labelers on one of the largest datasets of chest
X-rays, we show that an image classification model trained on labels from the
VisualCheXbert labeler outperforms image classification models trained on
labels from the CheXpert and CheXbert labelers. Our work suggests that recent
improvements in radiology report labeling can translate to the development of
higher-performing chest X-ray classification models.
Comment: In Neural Information Processing Systems (NeurIPS) Workshop on Data-Centric AI (DCAI).
MoCo-CXR: MoCo Pretraining Improves Representation and Transferability of Chest X-ray Models
Contrastive learning is a form of self-supervision that can leverage
unlabeled data to produce pretrained models. While contrastive learning has
demonstrated promising results on natural image classification tasks, its
application to medical imaging tasks like chest X-ray interpretation has been
limited. In this work, we propose MoCo-CXR, an adaptation of the
contrastive learning method Momentum Contrast (MoCo), to produce models with
better representations and initializations for the detection of pathologies in
chest X-rays. In detecting pleural effusion, we find that linear models trained
on MoCo-CXR-pretrained representations outperform those without
MoCo-CXR-pretrained representations, indicating that MoCo-CXR-pretrained
representations are of higher quality. End-to-end fine-tuning experiments
reveal that a model initialized via MoCo-CXR-pretraining outperforms its
non-MoCo-CXR-pretrained counterpart. We find that MoCo-CXR-pretraining provides
the most benefit with limited labeled training data. Finally, we demonstrate
similar results on a target Tuberculosis dataset unseen during pretraining,
indicating that MoCo-CXR-pretraining endows models with representations and
transferability that can be applied across chest X-ray datasets and tasks.
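The "linear models trained on pretrained representations" comparison corresponds to a standard linear probe: freeze the pretrained encoder and train only a linear classifier on its features. A sketch under that assumption, with a torchvision ResNet-18 standing in for the MoCo-CXR-pretrained backbone:

```python
# Linear-probe sketch: freeze a pretrained backbone and train only a linear
# classifier on its features. ResNet-18 stands in for the MoCo-CXR-pretrained
# encoder; in practice the weights would come from the MoCo-CXR checkpoint.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18()                    # load MoCo-CXR weights here in practice
feature_dim = backbone.fc.in_features
backbone.fc = nn.Identity()              # expose the penultimate features
for p in backbone.parameters():
    p.requires_grad = False              # frozen: only the probe is trained

probe = nn.Linear(feature_dim, 1)        # e.g. pleural effusion: present vs. absent
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(4, 3, 224, 224)     # placeholder chest X-ray batch
labels = torch.tensor([[1.], [0.], [1.], [0.]])
with torch.no_grad():
    feats = backbone(images)             # frozen representations
loss = criterion(probe(feats), labels)
loss.backward()
optimizer.step()
```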
Driverseat: Crowdstrapping Learning Tasks for Autonomous Driving
While emerging deep-learning systems have outclassed knowledge-based
approaches in many tasks, their application to detection tasks for autonomous
technologies remains an open field for scientific exploration. Broadly, there
are two major developmental bottlenecks: the unavailability of comprehensively
labeled datasets and of expressive evaluation strategies. Approaches for
labeling datasets have relied on intensive hand-engineering, and strategies for
evaluating learning systems have been unable to identify failure-case
scenarios. Human intelligence offers an untapped approach for breaking through
these bottlenecks. This paper introduces Driverseat, a technology for embedding
crowds around learning systems for autonomous driving. Driverseat utilizes
crowd contributions for (a) collecting complex 3D labels and (b) tagging
diverse scenarios for ready evaluation of learning systems. We demonstrate how
Driverseat can crowdstrap a convolutional neural network on the lane-detection
task. More generally, crowdstrapping introduces a valuable paradigm for any
technology that can benefit from leveraging the powerful combination of human
and computer intelligence.
CheXseg: Combining Expert Annotations with DNN-generated Saliency Maps for X-ray Segmentation
Medical image segmentation models are typically supervised by expert
annotations at the pixel-level, which can be expensive to acquire. In this
work, we propose a method that combines the high quality of pixel-level expert
annotations with the scale of coarse DNN-generated saliency maps for training
multi-label semantic segmentation models. We demonstrate the application of our
semi-supervised method, which we call CheXseg, on multi-label chest x-ray
interpretation. We find that CheXseg improves upon the performance (mIoU) of
fully-supervised methods that use only pixel-level expert annotations by 13.4%
and weakly-supervised methods that use only DNN-generated saliency maps by
91.2%. Furthermore, we implement a semi-supervised method using knowledge
distillation and find that though it is outperformed by CheXseg, it exceeds the
performance (mIoU) of the best fully-supervised method by 4.83%. Our best
method is able to match radiologist agreement on three out of ten pathologies
and reduces the overall performance gap by 71.6% as compared to
weakly-supervised methods.
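One plausible reading of the semi-supervised setup, sketched below: expert pixel-level masks are used where available, and DNN-generated saliency maps are thresholded into coarse pseudo-masks for the remaining images. The threshold and the simple pooling of the two label sources are assumptions, not the paper's exact procedure.

```python
# Combining a small pool of expert masks with saliency-derived pseudo-masks.
# The 0.5 threshold and the fallback scheme are illustrative assumptions.
import numpy as np

def saliency_to_pseudo_mask(saliency: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a [0, 1] saliency map into a coarse segmentation mask."""
    return (saliency >= threshold).astype(np.uint8)

def build_training_masks(expert_masks: dict, saliency_maps: dict) -> dict:
    """Prefer expert masks; fall back to saliency-derived pseudo-masks."""
    masks = dict(expert_masks)                      # high-quality, small pool
    for image_id, saliency in saliency_maps.items():
        if image_id not in masks:                   # large pool, coarser labels
            masks[image_id] = saliency_to_pseudo_mask(saliency)
    return masks

# Tiny usage example with placeholder data.
expert = {"img_001": np.ones((8, 8), dtype=np.uint8)}
saliency = {"img_001": np.random.rand(8, 8), "img_002": np.random.rand(8, 8)}
training_masks = build_training_masks(expert, saliency)
print(sorted(training_masks))                       # ['img_001', 'img_002']
```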
CheXseen: Unseen Disease Detection for Deep Learning Interpretation of Chest X-rays
We systematically evaluate the performance of deep learning models in the
presence of diseases not labeled for or present during training. First, we
evaluate whether deep learning models trained on a subset of diseases (seen
diseases) can detect the presence of any one of a larger set of diseases. We
find that models tend to falsely classify diseases outside of the subset
(unseen diseases) as "no disease". Second, we evaluate whether models trained
on seen diseases can detect seen diseases when co-occurring with diseases
outside the subset (unseen diseases). We find that models are still able to
detect seen diseases even when co-occurring with unseen diseases. Third, we
evaluate whether feature representations learned by models may be used to
detect the presence of unseen diseases given a small labeled set of unseen
diseases. We find that the penultimate layer of the deep neural network
provides useful features for unseen disease detection. Our results can inform
the safe clinical deployment of deep learning models trained on a
non-exhaustive set of disease classes.
Comment: Accepted to ACM Conference on Health, Inference, and Learning (ACM-CHIL) Workshop 2021.
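A sketch of the third experiment described above, assuming penultimate-layer features are extracted from a trained chest X-ray model and a lightweight detector is fit on a small labeled set of the unseen disease; the ResNet-18 backbone and logistic-regression detector are illustrative stand-ins, not the paper's exact setup.

```python
# Unseen-disease detection from penultimate-layer features: extract features
# from a (stand-in) trained model and fit a small detector on few labels.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from sklearn.linear_model import LogisticRegression

backbone = resnet18()                    # stands in for the trained CXR model
backbone.fc = nn.Identity()              # expose penultimate-layer features
backbone.eval()

def penultimate_features(images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return backbone(images)

# Small labeled set for one unseen disease (placeholder data).
images = torch.randn(16, 3, 224, 224)
labels = [1] * 8 + [0] * 8               # unseen disease present / absent
feats = penultimate_features(images).numpy()

detector = LogisticRegression(max_iter=1000).fit(feats, labels)
print(detector.predict(feats[:2]))       # predictions on two examples
```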