On Classifying Sepsis Heterogeneity in the ICU: Insight Using Machine Learning
Current machine learning models aiming to predict sepsis from Electronic
Health Records (EHR) do not account for the heterogeneity of the condition,
despite its emerging importance in prognosis and treatment. This work
demonstrates the added value of stratifying the types of organ dysfunction
observed in patients who develop sepsis in the ICU in improving the ability to
recognise patients at risk of sepsis from their EHR data. Using an ICU dataset
of 13,728 records, we identify clinically significant sepsis subpopulations
with distinct organ dysfunction patterns. Classification experiments using
Random Forest, Gradient Boost Trees and Support Vector Machines, aiming to
distinguish patients who develop sepsis in the ICU from those who do not, show
that features selected using sepsis subpopulations as background knowledge
yield a superior performance regardless of the classification model used. Our
findings can steer machine learning efforts towards more personalised models
for complex conditions including sepsis.
Comment: 3 figures and 2 tables. Accepted for publication in the Journal of
the American Medical Informatics Association.
Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records
Unknown adverse reactions to drugs available on the market present a significant health risk and limit accurate judgement of the cost/benefit trade-off for medications. Machine learning has the potential to predict unknown adverse reactions from current knowledge. We constructed a knowledge graph containing four types of node: drugs, protein targets, indications and adverse reactions. Using this graph, we developed a machine learning algorithm based on a simple enrichment test and first demonstrated this method performs extremely well at classifying known causes of adverse reactions (AUC 0.92). A cross validation scheme in which 10% of drug-adverse reaction edges were systematically deleted per fold showed that the method correctly predicts 68% of the deleted edges on average. Next, a subset of adverse reactions that could be reliably detected in anonymised electronic health records from South London and Maudsley NHS Foundation Trust were used to validate predictions from the model that are not currently known in public databases. High-confidence predictions were validated in electronic records significantly more frequently than random models, and outperformed standard methods (logistic regression, decision trees and support vector machines). This approach has the potential to improve patient safety by predicting adverse reactions that were not observed during randomised trials.
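The cross validation scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the helper names `edge_folds` and `recovered_fraction`, the seeded random shuffle used to form the folds, and the toy edge list are all assumptions made for the example. With ten folds, each fold holds out 10% of the drug-adverse reaction edges, and every edge is deleted exactly once.

```python
import random

def edge_folds(edges, k=10, seed=0):
    """Split edges into k disjoint folds; each fold is one held-out (deleted) set."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)  # assumption: folds are formed at random
    return [shuffled[i::k] for i in range(k)]

def recovered_fraction(held_out, predicted):
    """Fraction of the deleted edges that reappear among the predicted edges."""
    held = set(held_out)
    return len(held & set(predicted)) / len(held) if held else 0.0

# Hypothetical drug -> adverse reaction edges, for illustration only.
edges = [(f"drug{i}", f"adr{j}") for i in range(10) for j in range(5)]
folds = edge_folds(edges)

for fold in folds:
    training_edges = set(edges) - set(fold)  # graph with 10% of edges removed
    # A real model would be fitted on training_edges and then scored with
    # recovered_fraction(fold, model_predictions).
```

Averaging `recovered_fraction` over the folds gives the kind of "percentage of deleted edges recovered" figure the abstract reports.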
ADEPt, a semantically-enriched pipeline for extracting adverse drug events from free-text electronic health records
Adverse drug events (ADEs) are unintended responses to medical treatment. They can greatly affect a patient's quality of life and present a substantial burden on healthcare. Although electronic health records (EHRs) document a wealth of information relating to ADEs, they are frequently stored as unstructured or semi-structured free-text narrative, requiring Natural Language Processing (NLP) techniques to mine the relevant information. Here we present a rule-based ADE detection and classification pipeline built and tested on a large psychiatric corpus comprising 264k patients using the de-identified EHRs of four UK-based psychiatric hospitals. The pipeline uses characteristics specific to psychiatric EHRs to guide the annotation process, and distinguishes: a) the temporal value associated with the ADE mention (whether it is historical or present), b) the categorical value of the ADE (whether it is assertive, hypothetical, retrospective or a general discussion) and c) the implicit contextual value where the status of the ADE is deduced from surrounding indicators, rather than explicitly stated. We manually created the rulebase in collaboration with clinicians and pharmacists by studying ADE mentions in various types of clinical notes. We evaluated the open-source Adverse Drug Event annotation Pipeline (ADEPt) using 19 ADEs specific to antipsychotic and antidepressant medication. The ADEs chosen vary in severity, regularity and persistence. The average F-measure and accuracy achieved by our tool across all tested ADEs were 0.83 and 0.83 respectively. In addition to annotation power, the ADEPt pipeline presents an improvement to the state-of-the-art context-discerning algorithm, ConText.
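For readers unfamiliar with the two reported metrics, the sketch below shows how an F-measure and an accuracy are conventionally computed from confusion counts and then averaged across ADEs. It assumes the balanced F1 form of the F-measure and a simple unweighted mean; the abstract does not say which variants were used, and the counts are hypothetical rather than taken from the paper.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, fp, fn, tn):
    """Share of all annotations that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

# Hypothetical (tp, fp, fn, tn) counts per ADE; not results from the paper.
counts = {"sedation": (8, 2, 2, 8), "weight gain": (9, 1, 3, 7)}

mean_f = sum(f_measure(tp, fp, fn) for tp, fp, fn, _ in counts.values()) / len(counts)
mean_acc = sum(accuracy(*c) for c in counts.values()) / len(counts)
```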
The side effect profile of Clozapine in real world data of three large mental hospitals
Objective: Mining the data contained within Electronic Health Records (EHRs)
can potentially generate a greater understanding of medication effects in the
real world, complementing what we know from randomised controlled trials (RCTs).
We propose a text mining approach to detect adverse events and medication
episodes from the clinical text to enhance our understanding of adverse effects
related to Clozapine, the most effective antipsychotic drug for the management
of treatment-resistant schizophrenia, but underutilised due to concerns over
its side effects. Material and Methods: We used data from de-identified EHRs of
three mental health trusts in the UK (>50 million documents, over 500,000
patients, 2835 of whom were prescribed Clozapine). We explored the prevalence
of 33 adverse effects by age, gender, ethnicity, smoking status and admission
type three months before and after the patients started Clozapine treatment. We
compared the prevalence of adverse effects with those reported in the Side
Effects Resource (SIDER) where possible. Results: Sedation, fatigue, agitation,
dizziness, hypersalivation, weight gain, tachycardia, headache, constipation
and confusion were amongst the most frequently recorded Clozapine adverse effects in the
three months following the start of treatment. Higher percentages of all
adverse effects were found in the first month of Clozapine therapy. At a
significance level of p < 0.05, our chi-square tests show a significant
association between most of the ADRs and smoking status and hospital admissions,
and between some ADRs and gender and age groups. Further, the data were combined from three
trusts, and chi-square tests were applied to estimate the average effect of
ADRs in each monthly interval. Conclusion: A better understanding of how the
drug works in the real world can complement clinical trials and precision
medicine.
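As a rough illustration of the kind of chi-square test mentioned in the Results, the sketch below computes a Pearson chi-square statistic for a 2x2 table (for example, adverse effect present or absent against smoker or non-smoker) and compares it with the critical value for p < 0.05 at one degree of freedom. The 2x2 layout, the absence of a continuity correction, and the counts are assumptions for the example; the study's actual tables may have more categories.

```python
# Critical value of the chi-square distribution for p < 0.05 with 1 degree of freedom.
CRITICAL_0_05_DF1 = 3.841

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: rows = smokers / non-smokers,
# columns = adverse effect recorded / not recorded.
statistic = chi_square_2x2(30, 70, 10, 90)
significant = statistic > CRITICAL_0_05_DF1
```

With these illustrative counts the statistic is 12.5, which exceeds 3.841, so the association would be reported as significant at p < 0.05.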
Efficient Reuse of Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: A Phenotype Embedding Approach.
Background: Many efforts have been put into the use of automated approaches,
such as natural language processing (NLP), to mine or extract data from
free-text medical records to construct comprehensive patient profiles for
delivering better health-care. Reusing NLP models in new settings, however,
remains cumbersome - requiring validation and/or retraining on new data
iteratively to achieve convergent results.
Objective: The aim of this work is to minimize the effort involved in reusing
NLP models on free-text medical records.
Methods: We formally define and analyse the model adaptation problem in
phenotype-mention identification tasks. We identify "duplicate waste" and
"imbalance waste", which collectively impede efficient model reuse. We propose
a phenotype embedding based approach to minimize these sources of waste without
the need for labelled data from new settings.
Results: We conduct experiments on data from a large mental health registry
to reuse NLP models in four phenotype-mention identification tasks. The
proposed approach can choose the best model for a new task, identifying up to
76% of phenotype mentions (the duplicate waste) that need no validation or
model retraining, with very good performance (93-97% accuracy). It can
also provide guidance for validating and retraining the selected model for
novel language patterns in new tasks, saving around 80% of the effort (the
imbalance waste) required in "blind" model-adaptation approaches.
Conclusions: Adapting pre-trained NLP models for new tasks can be more
efficient and effective if the language pattern landscapes of old settings and
new settings can be made explicit and comparable. Our experiments show that the
phenotype-mention embedding approach is an effective way to model language
patterns for phenotype-mention identification tasks and that its use can guide
efficient NLP model reuse.
SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research
OBJECTIVE: Unlocking the data contained within both structured and unstructured components of electronic health records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management, and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. METHODS: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualized mentions of a wide range of biomedical concepts within EHRs. Natural language processing annotations are further assembled at the patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data are serviced via ontology-based search and analytics interfaces. RESULTS: SemEHR has been deployed at a number of UK hospitals, including the Clinical Record Interactive Search, an anonymized replica of the EHR of the UK South London and Maudsley National Health Service Foundation Trust, one of Europe's largest providers of mental health services. In 2 Clinical Record Interactive Search-based studies, SemEHR achieved 93% (hepatitis C) and 99% (HIV) F-measure results in identifying true positive patients. At King's College Hospital in London, as part of the CogStack program (github.com/cogstack), SemEHR is being used to recruit patients into the UK Department of Health 100,000 Genomes Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast at searching phenotypes; time for recruitment criteria checking was reduced from days to minutes. Validated on open intensive care EHR data, Medical Information Mart for Intensive Care III, the vital signs extracted by SemEHR can achieve around 97% accuracy.
CONCLUSION: Results from the multiple case studies demonstrate SemEHR's efficiency: weeks or months of work can be done within hours or minutes in some cases. SemEHR provides a more comprehensive view of patients, bringing in more and unexpected insight compared to study-oriented bespoke IE systems. SemEHR is open source, available at https://github.com/CogStack/SemEHR
Spatio-temporal Reasoning for Vague Regions
This paper extends a mereotopological theory of spatiotemporal reasoning to vague "egg-yolk" regions. In this extension, the egg and its yolk are allowed to move and change over time. We present a classification of motion classes for vague regions as well as composition tables for reasoning about moving vague regions. We also discuss the formation of scrambled eggs when it becomes impossible to distinguish the yolk from the white and examine how to incorporate temporally and spatially dispersed observations to recover the yolk and white from a scrambled egg. Egg splitting may occur as a result of the recovery process when available information supports multiple egg recovery alternatives. Egg splitting adds another dimension of uncertainty to reasoning with vague regions.