On Classifying Sepsis Heterogeneity in the ICU: Insight Using Machine Learning
Current machine learning models aiming to predict sepsis from Electronic
Health Records (EHR) do not account for the heterogeneity of the condition,
despite its emerging importance in prognosis and treatment. This work
demonstrates the added value of stratifying the types of organ dysfunction
observed in patients who develop sepsis in the ICU in improving the ability to
recognise patients at risk of sepsis from their EHR data. Using an ICU dataset
of 13,728 records, we identify clinically significant sepsis subpopulations
with distinct organ dysfunction patterns. Classification experiments using
Random Forest, Gradient Boost Trees and Support Vector Machines, aiming to
distinguish patients who develop sepsis in the ICU from those who do not, show
that features selected using sepsis subpopulations as background knowledge
yield superior performance regardless of the classification model used. Our
findings can steer machine learning efforts towards more personalised models
for complex conditions including sepsis.
Comment: 3 figures and 2 tables. Accepted for publication in the Journal of the
American Medical Informatics Association.
Association of physical health multimorbidity with mortality in people with schizophrenia spectrum disorders: Using a novel semantic search system that captures physical diseases in electronic patient records
OBJECTIVE
Single physical comorbidities have been associated with premature mortality in people with schizophrenia-spectrum disorders (SSD). We investigated the association of physical multimorbidity (two or more physical health conditions) with mortality in people with SSD.
METHODS
A retrospective cohort study between 2013 and 2017. All people with a diagnosis of SSD (ICD-10: F20–F29) who had contact with secondary mental healthcare within South London during 2011–2012 were included. A novel semantic search system captured conditions from electronic mental health records, and all-cause mortality data were retrieved. Hazard ratios (HRs) and population attributable fractions (PAFs) were calculated for associations between physical multimorbidity and all-cause mortality.
RESULTS
Among the 9775 people with SSD (mean (SD) age, 45.9 (15.4); males, 59.3%), 6262 (64%) had physical multimorbidity, and 880 (9%) died during the 5-year follow-up. The top three physical multimorbidity combinations with highest mortality were cardiovascular-respiratory (HR: 2.23; 95% CI, 1.49–3.32), respiratory-skin (HR: 2.06; 95% CI, 1.31–3.24), and respiratory-digestive (HR: 1.88; 95% CI, 1.14–3.11), when adjusted for age, gender, and all other physical disease systems. Combinations of physical diseases with highest PAFs were cardiovascular-respiratory (PAF: 35.7%), neurologic-respiratory (PAF: 32.7%), as well as respiratory-skin (PAF: 29.8%).
CONCLUSIONS
Approximately two-thirds of patients with SSD had physical multimorbidity, and the risk of mortality in these patients was further increased compared to those with no or a single physical condition. These findings suggest that, to reduce the physical health burden and subsequent mortality in people with SSD, proactive coordinated prevention and management efforts are required and should extend beyond the current focus on single physical comorbidities.
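The population attributable fraction (PAF) reported in the results above can be illustrated with a short calculation. The abstract does not say which estimator the authors used; the sketch below assumes Miettinen's formula, PAF = p_d × (HR − 1) / HR, where p_d is the proportion of deaths that occurred among exposed people. The function name and the example inputs are hypothetical and are not taken from the study.

```python
def population_attributable_fraction(hazard_ratio: float,
                                     prop_cases_exposed: float) -> float:
    """Return the PAF using Miettinen's formula.

    Assumption: this estimator is chosen for illustration only; the
    abstract does not state how the PAFs were computed.
    """
    if hazard_ratio <= 0:
        raise ValueError("hazard_ratio must be positive")
    if not 0.0 <= prop_cases_exposed <= 1.0:
        raise ValueError("prop_cases_exposed must lie in [0, 1]")
    return prop_cases_exposed * (hazard_ratio - 1.0) / hazard_ratio


# Hypothetical inputs (not study data): HR of 2.0, half of all deaths exposed.
paf = population_attributable_fraction(hazard_ratio=2.0, prop_cases_exposed=0.5)
print(f"PAF = {paf:.1%}")
```

Under this formula, a PAF depends on both the strength of the association and how common the exposure is among those who died, which is why the ranking of disease combinations by PAF can differ from the ranking by HR.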
Implementation of a real-time psychosis risk detection and alerting system based on electronic health records using CogStack
Recent studies have shown that an automated, lifespan-inclusive, transdiagnostic, and clinically based, individualized risk calculator provides a powerful system for supporting the early detection of individuals at risk of psychosis at a large scale, by leveraging electronic health records (EHRs). This risk calculator has been externally validated twice and is undergoing feasibility testing for clinical implementation. Integration of this risk calculator in clinical routine should be facilitated by prospective feasibility studies, which are required to address pragmatic challenges, such as missing data, and the usability of this risk calculator in a real-world and routine clinical setting. Here, we present an approach for a prospective implementation of a real-time psychosis risk detection and alerting service in a real-world EHR system. This method leverages the CogStack platform, which is an open-source, lightweight, and distributed information retrieval and text extraction system. The CogStack platform incorporates a set of services that allow for full-text search of clinical data, lifespan-inclusive, real-time calculation of psychosis risk, early risk-alerting to clinicians, and the visual monitoring of patients over time. Our method includes: 1) ingestion and synchronization of data from multiple sources into the CogStack platform, 2) implementation of a risk calculator, whose algorithm was previously developed and validated, for timely computation of a patient's risk of psychosis, 3) creation of interactive visualizations and dashboards to monitor patients' health status over time, and 4) building automated alerting systems to ensure that clinicians are notified of patients at risk, so that appropriate actions can be pursued. This is the first study to develop and implement such a detection and alerting system in clinical routine for the early detection of psychosis.
ADEPt, a semantically-enriched pipeline for extracting adverse drug events from free-text electronic health records
Adverse drug events (ADEs) are unintended responses to medical treatment. They can greatly affect a patient's quality of life and present a substantial burden on healthcare. Although electronic health records (EHRs) document a wealth of information relating to ADEs, this information is frequently stored in unstructured or semi-structured free-text narrative, requiring Natural Language Processing (NLP) techniques to mine the relevant information. Here we present a rule-based ADE detection and classification pipeline built and tested on a large psychiatric corpus comprising 264k patients using the de-identified EHRs of four UK-based psychiatric hospitals. The pipeline uses characteristics specific to psychiatric EHRs to guide the annotation process, and distinguishes: a) the temporal value associated with the ADE mention (whether it is historical or present), b) the categorical value of the ADE (whether it is assertive, hypothetical, retrospective or a general discussion) and c) the implicit contextual value where the status of the ADE is deduced from surrounding indicators, rather than explicitly stated. We manually created the rulebase in collaboration with clinicians and pharmacists by studying ADE mentions in various types of clinical notes. We evaluated the open-source Adverse Drug Event annotation Pipeline (ADEPt) using 19 ADEs specific to antipsychotic and antidepressant medication. The ADEs chosen vary in severity, regularity and persistence. The average F-measure and accuracy achieved by our tool across all tested ADEs were 0.83 and 0.83 respectively. In addition to annotation power, the ADEPt pipeline presents an improvement to the state-of-the-art context-discerning algorithm, ConText.
The side effect profile of Clozapine in real world data of three large mental hospitals
Objective: Mining the data contained within Electronic Health Records (EHRs)
can potentially generate a greater understanding of medication effects in the
real world, complementing what we know from randomised controlled trials (RCTs).
We propose a text mining approach to detect adverse events and medication
episodes from the clinical text to enhance our understanding of adverse effects
related to Clozapine, the most effective antipsychotic drug for the management
of treatment-resistant schizophrenia, but underutilised due to concerns over
its side effects. Material and Methods: We used data from de-identified EHRs of
three mental health trusts in the UK (>50 million documents, over 500,000
patients, 2835 of whom were prescribed Clozapine). We explored the prevalence
of 33 adverse effects by age, gender, ethnicity, smoking status and admission
type three months before and after the patients started Clozapine treatment. We
compared the prevalence of adverse effects with those reported in the Side
Effects Resource (SIDER) where possible. Results: Sedation, fatigue, agitation,
dizziness, hypersalivation, weight gain, tachycardia, headache, constipation
and confusion were amongst the most frequently recorded Clozapine adverse effects in the
three months following the start of treatment. Higher percentages of all
adverse effects were found in the first month of Clozapine therapy. Using a
significance level of p < 0.05, our chi-square tests show a significant
association between most of the adverse drug reactions (ADRs) and smoking status
and hospital admission type, and between some ADRs and gender and age group.
Further, the data was combined from three
trusts, and chi-square tests were applied to estimate the average effect of
ADRs in each monthly interval. Conclusion: A better understanding of how the
drug works in the real world can complement clinical trials and precision
medicine.
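The chi-square test of association mentioned in the results above can be sketched for a single 2x2 table, for example an adverse effect (recorded or not) against smoking status. This is a minimal illustration assuming Pearson's statistic without continuity correction; the abstract does not describe the exact test configuration, and the counts below are hypothetical, not study data.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test for a 2x2 contingency table.

    Returns (statistic, p_value). No continuity correction is applied
    (an assumption made for this sketch). With 1 degree of freedom the
    upper-tail probability equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = smokers / non-smokers,
# columns = adverse effect recorded / not recorded.
counts = [[30, 70], [15, 85]]
stat, p = chi_square_2x2(counts)
print(f"chi2 = {stat:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

When many adverse effects are tested against several patient characteristics, as in this study, the number of comparisons grows quickly, so a per-test threshold of 0.05 should be read with that in mind.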
Efficient Reuse of Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: A Phenotype Embedding Approach.
Background: Many efforts have been put into the use of automated approaches,
such as natural language processing (NLP), to mine or extract data from
free-text medical records to construct comprehensive patient profiles for
delivering better health-care. Reusing NLP models in new settings, however,
remains cumbersome - requiring validation and/or retraining on new data
iteratively to achieve convergent results.
Objective: The aim of this work is to minimize the effort involved in reusing
NLP models on free-text medical records.
Methods: We formally define and analyse the model adaptation problem in
phenotype-mention identification tasks. We identify "duplicate waste" and
"imbalance waste", which collectively impede efficient model reuse. We propose
a phenotype embedding based approach to minimize these sources of waste without
the need for labelled data from new settings.
Results: We conduct experiments on data from a large mental health registry
to reuse NLP models in four phenotype-mention identification tasks. The
proposed approach can choose the best model for a new task, identifying up to
76% of phenotype mentions (the duplicate waste) without the need for validation
or model retraining, and with very good performance (93-97% accuracy). It can
also provide guidance for validating and retraining the selected model for
novel language patterns in new tasks, saving around 80% of the effort required
in "blind" model-adaptation approaches (the imbalance waste).
Conclusions: Adapting pre-trained NLP models for new tasks can be more
efficient and effective if the language pattern landscapes of old settings and
new settings can be made explicit and comparable. Our experiments show that the
phenotype-mention embedding approach is an effective way to model language
patterns for phenotype-mention identification tasks and that its use can guide
efficient NLP model reuse.
Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease
Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which revealed that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of cases the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process.
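For readers unfamiliar with the metric reported above, the following sketch shows how a macro F1-measure and a relative increase over a baseline are computed. It is an illustration only, not the authors' evaluation code: the labels and predictions are hypothetical, and the baseline value of 0.22 is an assumption back-calculated from the reported 0.51 and 132% figures, since the abstract does not state it.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0 when the class never occurs.
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


def relative_increase(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score (0.5 means +50%)."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical sentence-level labels: should a sentence be highlighted?
gold = ["highlight", "highlight", "other", "other", "other", "highlight"]
pred = ["highlight", "other", "other", "other", "highlight", "highlight"]
print(f"macro F1 = {macro_f1(gold, pred):.2f}")

# Assumed baseline of 0.22 (not stated in the abstract).
print(f"relative increase = {relative_increase(0.51, 0.22):.0%}")
```

Macro averaging gives the rare "highlight" class the same weight as the much more common non-highlighted sentences, which is one reason it is a common choice for imbalanced sentence-selection tasks.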
Association of blood lipids with Alzheimer's disease: A comprehensive lipidomics analysis
Introduction: The aim of this study was to (1) replicate previous associations between six blood lipids and Alzheimer’s disease (AD) (Proitsi et al. 2015) and (2) identify novel associations between lipids, clinical AD diagnosis, disease progression and brain atrophy (left/right hippocampus/entorhinal cortex). Methods: We performed untargeted lipidomic analysis on 148 AD and 152 elderly control plasma samples and used univariate and multivariate analysis methods. Results: We replicated our previous lipid associations and reported novel associations between lipid molecules and all phenotypes. A combination of 24 molecules classified AD patients with >70% accuracy in a test and a validation data set, and we identified lipid signatures that predicted disease progression (R² = 0.10, test data set) and brain atrophy (R² ≥ 0.14, all test data sets except left entorhinal cortex). We putatively identified a number of metabolic features including cholesteryl esters/triglycerides and phosphatidylcholines. Discussion: Blood lipids are promising AD biomarkers that may lead to new treatment strategies.