What Do Patients Say About Their Disease Symptoms? Deep Multilabel Text Classification With Human-in-the-Loop Curation for Automatic Labeling of Patient Self Reports of Problems
The US Food and Drug Administration has accorded increasing importance to
patient-reported problems in clinical and research settings. In this paper, we
explore one of the largest online datasets comprising 170,141 open-ended
self-reported responses (called "verbatims") from patients with Parkinson's
(PwPs) to questions about what bothers them about their Parkinson's Disease and
how it affects their daily functioning, also known as the Parkinson's Disease
Patient Report of Problems. Classifying such verbatims into multiple clinically
relevant symptom categories is an important problem and requires multiple
steps: expert curation, a multi-label text classification (MLTC) approach, and
large amounts of labelled training data. Further, human annotation of such large
datasets is tedious and expensive. We present a novel solution to this problem
where we build a baseline dataset using 2,341 (of the 170,141) verbatims
annotated by nine curators including clinical experts and PwPs. We develop a
rule-based linguistic dictionary using NLP techniques and a graph-database-based
expert phrase-query system to scale the annotation to the remaining cohort,
generating the machine-annotated dataset, and finally build a
Keras-TensorFlow-based MLTC model for both datasets. The machine-annotated
model significantly outperforms the baseline model, with an F1-score of 95%
across 65 symptom categories on a held-out test set.
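
As a rough illustration of the two ideas named in this abstract, the sketch below shows dictionary-based multi-label annotation of verbatims and a micro-averaged F1 score. It is not the authors' pipeline: the phrase dictionary, the category names, the helper names `annotate` and `micro_f1`, and the sample verbatims are all hypothetical, and the abstract does not say which F1 averaging the authors used.

```python
from typing import Dict, List, Set

# Hypothetical phrase dictionary: symptom category -> trigger phrases.
# These entries are illustrative only and are not taken from the paper.
SYMPTOM_PHRASES: Dict[str, List[str]] = {
    "tremor": ["tremor", "shaking", "shaky"],
    "gait_balance": ["balance", "falling", "shuffling"],
    "pain": ["pain", "ache", "sore"],
}


def annotate(verbatim: str) -> Set[str]:
    """Return every category with at least one phrase found in the verbatim."""
    text = verbatim.lower()
    return {
        label
        for label, phrases in SYMPTOM_PHRASES.items()
        if any(phrase in text for phrase in phrases)
    }


def micro_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (response, label) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up verbatims with made-up gold labels, for demonstration only.
verbatims = [
    "My hands keep shaking and I worry about falling.",
    "Constant back pain makes it hard to sleep.",
]
gold = [{"tremor", "gait_balance"}, {"pain"}]
predicted = [annotate(v) for v in verbatims]
print(micro_f1(gold, predicted))
```

Plain substring matching is used here only for brevity; it would, for example, match "ache" inside "headache", which is one reason a curated phrase-query system with expert review is preferable in practice.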
What Patients Say: Large-Scale Analyses of Replies to the Parkinson's Disease Patient Report of Problems (PD-PROP).
BACKGROUND: Free-text, verbatim replies in the words of people with Parkinson's disease (PD) have the potential to provide unvarnished information about their feelings and experiences. Challenges of processing such data on a large scale are a barrier to analyzing verbatim data collection in large cohorts.
OBJECTIVE: To develop a method for curating responses from the Parkinson's Disease Patient Report of Problems (PD-PROP), open-ended questions that ask people with PD to report their most bothersome problems and associated functional consequences.
METHODS: Human curation, natural language processing, and machine learning were used to develop an algorithm to convert verbatim responses to classified symptoms. Nine curators including clinicians, people with PD, and a non-clinician PD expert classified a sample of responses as reporting each symptom or not. Responses to the PD-PROP were collected within the Fox Insight cohort study.
RESULTS: Approximately 3,500 PD-PROP responses were curated by a human team. Subsequently, approximately 1,500 responses were used in the validation phase; median age of respondents was 67 years, 55% were men, and median time since PD diagnosis was 3 years. 168,260 verbatim responses were classified by machine. Accuracy of machine classification was 95% on a held-out test set. 65 symptoms were grouped into 14 domains. The most frequently reported symptoms at first report were tremor (by 46% of respondents), gait and balance problems (>39%), and pain/discomfort (33%).
CONCLUSION: A human-in-the-loop method of curation provides both accuracy and efficiency, permitting a clinically useful analysis of large datasets of verbatim reports about the problems that bother PD patients.