25 research outputs found
MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation
We present MedCATTrainer, an interface for building, improving and customising
a given Named Entity Recognition and Linking (NER+L) model for biomedical
domain text. NER+L is often used as a first step in deriving value from
clinical text. Collecting labelled data for training models is difficult due to
the need for specialist domain knowledge. MedCATTrainer offers an interactive
web-interface to inspect and improve recognised entities from an underlying
NER+L model via active learning. Secondary use of data for clinical research
often has task and context specific criteria. MedCATTrainer provides a further
interface to define and collect supervised learning training data for
researcher specific use cases. Initial results suggest our approach allows for
efficient and accurate collection of research use case specific training data.
Hospital-wide natural language processing summarising the health data of 1 million patients
Electronic health records (EHRs) represent a major repository of real world clinical trajectories, interventions and outcomes. While modern enterprise EHRs try to capture data in structured standardised formats, a significant bulk of the available information captured in the EHR is still recorded only in unstructured text format and can only be transformed into structured codes by manual processes. Recently, Natural Language Processing (NLP) algorithms have reached a level of performance suitable for large scale and accurate information extraction from clinical text. Here we describe the application of open-source named-entity-recognition and linkage (NER+L) methods (CogStack, MedCAT) to the entire text content of a large UK hospital trust (King's College Hospital, London). The resulting dataset contains 157M SNOMED concepts generated from 9.5M documents for 1.07M patients over a period of 9 years. We present a summary of prevalence and disease onset as well as a patient embedding that captures major comorbidity patterns at scale. NLP has the potential to transform the health data lifecycle, through large-scale automation of a traditionally manual task.
Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey
The increasing adoption of natural language processing (NLP) models across
industries has led to practitioners' need for machine learning systems to
handle these models efficiently, from training to serving them in production.
However, training, deploying, and updating multiple models can be complex,
costly, and time-consuming, mainly when using transformer-based pre-trained
language models. Multi-Task Learning (MTL) has emerged as a promising approach
to improve efficiency and performance through joint training, rather than
training separate models. Motivated by this, we first provide an overview of
transformer-based MTL approaches in NLP. Then, we discuss the challenges and
opportunities of using MTL approaches throughout typical ML lifecycle phases,
specifically focusing on the challenges related to data engineering, model
development, deployment, and monitoring phases. This survey focuses on
transformer-based MTL architectures and, to the best of our knowledge, is novel
in that it systematically analyses how transformer-based MTL in NLP fits into
ML lifecycle phases. Furthermore, we motivate research on the connection
between MTL and continual learning (CL), as this area remains unexplored. We
believe it would be practical to have a model that can handle both MTL and CL,
as this would make it easier to periodically re-train the model, update it due
to distribution shifts, and add new capabilities to meet real-world
requirements.
A Knowledge Distillation Ensemble Framework for Predicting Short and Long-term Hospitalisation Outcomes from Electronic Health Records Data
The ability to perform accurate prognosis of patients is crucial for
proactive clinical decision making, informed resource management and
personalised care. Existing outcome prediction models suffer from a low recall
of infrequent positive outcomes. We present a highly-scalable and robust
machine learning framework to automatically predict adversity represented by
mortality and ICU admission from time-series vital signs and laboratory results
obtained within the first 24 hours of hospital admission. The stacked platform
comprises two components: a) an unsupervised LSTM Autoencoder that learns an
optimal representation of the time-series, using it to differentiate the less
frequent patterns which conclude with an adverse event from the majority
patterns that do not, and b) a gradient boosting model, which relies on the
constructed representation to refine prediction, incorporating static features
of demographics, admission details and clinical summaries. The model is used to
assess a patient's risk of adversity over time and provides visual
justifications of its prediction based on the patient's static features and
dynamic signals. Results of three case studies for predicting mortality and ICU
admission show that the model outperforms all existing outcome prediction
models, achieving PR-AUC of 0.891 (95% CI: 0.878 - 0.969) in predicting
mortality in ICU and general ward settings and 0.908 (95% CI: 0.870 - 0.935) in
predicting ICU admission.
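The PR-AUC values quoted above summarise a precision-recall curve, which is sensitive to how well rare positive outcomes are ranked. The abstract does not say how the authors computed it; the sketch below uses average precision, one common estimator of the area under that curve. The function name and the toy labels and scores are hypothetical, and tied scores are not given any special treatment.

```python
def average_precision(y_true, scores):
    """Average precision: mean of precision values at each true positive,
    taken in order of decreasing predicted score (ties not specially handled)."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank cases from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            true_positives += 1
            # Precision among the top `rank` predictions.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


# Toy example (hypothetical data): 1 = adverse outcome, 0 = no adverse outcome.
labels = [1, 0, 1, 0]
risk_scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(labels, risk_scores), 4))
```

Because the denominator is the number of positives rather than the number of cases, a model that ranks the few adverse outcomes near the top scores well even when negatives dominate the data.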
Identifying physical health comorbidities in a cohort of individuals with severe mental illness: An application of SemEHR
Multimorbidity research in mental health services requires data on physical
health conditions, which are traditionally limited in mental health care
electronic health records. In this study, we aimed to extract data on
physical health conditions from clinical notes using SemEHR. Data were extracted
from the Clinical Record Interactive Search (CRIS) system at South London and
Maudsley Biomedical Research Centre (SLaM BRC) and the cohort consisted of all
individuals who had received a primary or secondary diagnosis of severe mental
illness between 2007 and 2018. Three pairs of annotators annotated 2403
documents with an average Cohen's Kappa of 0.757. Results show that the NLP
performance varies across different disease areas (F1 0.601 - 0.954),
suggesting that the language patterns or terminologies of different condition
groups entail different technical challenges to the same NLP task.
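Inter-annotator agreement in this abstract is reported as Cohen's Kappa, which corrects raw agreement between two annotators for the agreement expected by chance. A minimal sketch of the standard two-annotator formula follows; the function name and the toy annotations are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical): 1 = condition mentioned, 0 = not mentioned.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohens_kappa(annotator_1, annotator_2))
```

With several annotator pairs, as in the study, one kappa would be computed per pair and the values averaged.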
Mapping multimorbidity in individuals with schizophrenia and bipolar disorders: evidence from the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLAM BRC) case register
OBJECTIVES: The first aim of this study was to design and develop a valid and replicable strategy to extract physical health conditions from clinical notes which are common in mental health services. Then, we examined the prevalence of these conditions in individuals with severe mental illness (SMI) and compared their individual and combined prevalence in individuals with bipolar (BD) and schizophrenia spectrum disorders (SSD). DESIGN: Observational study. SETTING: Secondary mental healthcare services from South London. PARTICIPANTS: Our maximal sample comprised 17 500 individuals aged 15 years or older who had received a primary or secondary SMI diagnosis (International Classification of Diseases, 10th edition, F20-31) between 2007 and 2018. MEASURES: We designed and implemented a data extraction strategy for 21 common physical comorbidities using a natural language processing pipeline, MedCAT. Associations were investigated with sex, age at SMI diagnosis, ethnicity and social deprivation for the whole cohort and the BD and SSD subgroups. Linear regression models were used to examine associations with disability measured by the Health of Nations Outcome Scale. RESULTS: Physical health data were extracted, achieving precision rates (F1) above 0.90 for all conditions. The 10 most prevalent conditions were diabetes, hypertension, asthma, arthritis, epilepsy, cerebrovascular accident, eczema, migraine, ischaemic heart disease and chronic obstructive pulmonary disease. The most prevalent combination in this population included diabetes, hypertension and asthma, regardless of their SMI diagnoses. CONCLUSIONS: Our data extraction strategy was found to be adequate to extract physical health data from clinical notes, which is essential for future multimorbidity research using text records. We found that around 40% of our cohort had multimorbidity, of whom 20% had complex multimorbidity (two or more physical conditions besides SMI). Sex, age, ethnicity and social deprivation were found to be key to understanding their heterogeneity and their differential contribution to disability levels in this population. These outputs have direct implications for researchers and clinicians.
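The extraction results above are reported as F1, which is the harmonic mean of precision and recall rather than precision alone. A minimal sketch of how the three quantities relate is given below; the function name and the counts are hypothetical and do not come from the study.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from counts of extraction outcomes."""
    predicted = true_pos + false_pos   # mentions the pipeline extracted
    actual = true_pos + false_neg      # mentions the annotators marked
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (hypothetical counts for one condition).
p, r, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=10)
print(p, r, f1)
```

An F1 above 0.90 therefore requires both precision and recall to be high; a pipeline with precision 0.99 and recall 0.50 would score only about 0.66.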
AI chatbots not yet ready for clinical use
As large language models (LLMs) expand and become more advanced, so do the natural language processing capabilities of conversational AI, or "chatbots". OpenAI's recent release, ChatGPT, uses a transformer-based model to enable human-like text generation and question-answering on general domain knowledge, while healthcare-specific LLMs such as GatorTron have focused on real-world healthcare domain knowledge. As LLMs advance to achieve near human-level performance on medical question-answering benchmarks, it is probable that conversational AI will soon be developed for use in healthcare. In this article we discuss the potential and compare the performance of two different approaches to generative pretrained transformers: ChatGPT, the most widely used general conversational LLM, and Foresight, a GPT (generative pretrained transformer) based model focused on modelling patients and disorders. The comparison is conducted on the task of forecasting relevant diagnoses based on clinical vignettes. We also discuss important considerations and limitations of transformer-based chatbots for clinical use.
Investigating the association between physical health comorbidities and disability in individuals with severe mental illness
Background:
Research suggests that an increased risk of physical comorbidities might have a key role in the association between severe mental illness (SMI) and disability. We examined the association between physical multimorbidity and disability in individuals with SMI.
Methods:
Data were extracted from the clinical record interactive search system at South London and Maudsley Biomedical Research Centre. Our sample (n = 13,933) consisted of individuals who had received a primary or secondary SMI diagnosis between 2007 and 2018 and had available data for the Health of Nations Outcome Scale (HoNOS) as a disability measure. Physical comorbidities were defined using Chapters II–XIV of the International Classification of Diseases (ICD-10).
Results:
More than 60% of the sample had complex multimorbidity. The most commonly affected organ systems were neurological (34.7%), dermatological (15.4%), and circulatory (14.8%). All specific comorbidities (ICD-10 Chapters) were associated with higher levels of disability (HoNOS total scores). Individuals with musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders were found to have significant difficulties in more than five HoNOS domains, while others had a lower number of domains affected.
Conclusions:
Individuals with SMI and musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders are at higher risk of disability compared to those who do not have those comorbidities. Individuals with SMI and physical comorbidities are at greater risk of reporting difficulties associated with activities of daily living, hallucinations, and cognitive functioning. Therefore, these individuals should be targeted for prevention and intervention programs.