25 research outputs found

MedCATTrainer: A Biomedical Free Text Annotation Interface with Active Learning and Research Use Case Specific Customisation

Author: Bean Daniel
Bendayan Rebecca
Dobson Richard
Kraljevic Zeljko
Searle Thomas
Publication date: 16/07/2019

We present MedCATTrainer, an interface for building, improving and customising a given Named Entity Recognition and Linking (NER+L) model for biomedical domain text. NER+L is often used as a first step in deriving value from clinical text. Collecting labelled data for training models is difficult due to the need for specialist domain knowledge. MedCATTrainer offers an interactive web-interface to inspect and improve recognised entities from an underlying NER+L model via active learning. Secondary use of data for clinical research often has task- and context-specific criteria. MedCATTrainer provides a further interface to define and collect supervised learning training data for researcher-specific use cases. Initial results suggest our approach allows for efficient and accurate collection of research use case specific training data.

arXiv.org e-Print Archive

Hospital-wide natural language processing summarising the health data of 1 million patients

Author: Bean Daniel M
Dobson Richard JB
Kraljevic Zeljko
Shek Anthony
Teo James
Publication venue: Public Library of Science (PLoS)
Publication date: 09/05/2023

Electronic health records (EHRs) represent a major repository of real world clinical trajectories, interventions and outcomes. While modern enterprise EHRs try to capture data in structured standardised formats, a significant bulk of the available information captured in the EHR is still recorded only in unstructured text format and can only be transformed into structured codes by manual processes. Recently, Natural Language Processing (NLP) algorithms have reached a level of performance suitable for large scale and accurate information extraction from clinical text. Here we describe the application of open-source named-entity-recognition and linkage (NER+L) methods (CogStack, MedCAT) to the entire text content of a large UK hospital trust (King's College Hospital, London). The resulting dataset contains 157M SNOMED concepts generated from 9.5M documents for 1.07M patients over a period of 9 years. We present a summary of prevalence and disease onset as well as a patient embedding that captures major comorbidity patterns at scale. NLP has the potential to transform the health data lifecycle, through large-scale automation of a traditionally manual task.

UCL Discovery

Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey

Author: Ferkovic Tin
Kraljevic Zeljko
Mihelcic Velimir
Roguski Lukasz
Sarlija Bruno
Torbarina Lovre
Publication date: 16/08/2023

The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, particularly when using transformer-based pre-trained language models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training, rather than training separate models. Motivated by this, we first provide an overview of transformer-based MTL approaches in NLP. Then, we discuss the challenges and opportunities of using MTL approaches throughout typical ML lifecycle phases, specifically focusing on the challenges related to data engineering, model development, deployment, and monitoring phases. This survey focuses on transformer-based MTL architectures and, to the best of our knowledge, is novel in that it systematically analyses how transformer-based MTL in NLP fits into ML lifecycle phases. Furthermore, we motivate research on the connection between MTL and continual learning (CL), as this area remains unexplored. We believe it would be practical to have a model that can handle both MTL and CL, as this would make it easier to periodically re-train the model, update it due to distribution shifts, and add new capabilities to meet real-world requirements.

arXiv.org e-Print Archive

A Knowledge Distillation Ensemble Framework for Predicting Short and Long-term Hospitalisation Outcomes from Electronic Health Records Data

Author: Bean Daniel
Dobson Richard J.
Galloway James
Ibrahim Zina
Kraljevic Zeljko
Norton Sam
Qian Linglong
Searle Thomas
Shek Anthony
Teo James
Wu Honghan
Publication date: 01/01/2021

The ability to perform accurate prognosis of patients is crucial for proactive clinical decision making, informed resource management and personalised care. Existing outcome prediction models suffer from a low recall of infrequent positive outcomes. We present a highly-scalable and robust machine learning framework to automatically predict adversity represented by mortality and ICU admission from time-series vital signs and laboratory results obtained within the first 24 hours of hospital admission. The stacked platform comprises two components: a) an unsupervised LSTM Autoencoder that learns an optimal representation of the time-series, using it to differentiate the less frequent patterns which conclude with an adverse event from the majority patterns that do not, and b) a gradient boosting model, which relies on the constructed representation to refine prediction, incorporating static features of demographics, admission details and clinical summaries. The model is used to assess a patient's risk of adversity over time and provides visual justifications of its prediction based on the patient's static features and dynamic signals. Results of three case studies for predicting mortality and ICU admission show that the model outperforms all existing outcome prediction models, achieving PR-AUC of 0.891 (95% CI: 0.878-0.969) in predicting mortality in ICU and general ward settings and 0.908 (95% CI: 0.870-0.935) in predicting ICU admission. Comment: 14 pages

arXiv.org e-Print Archive

UCL Discovery

Enlighten

King's Research Portal

Identifying physical health comorbidities in a cohort of individuals with severe mental illness: An application of SemEHR

Author: Bean Daniel
Bendayan Rebecca
Chaturvedi Jaya
Das-Munshi Jayati
Dobson Richard
Ibrahim Zina
Kraljevic Zeljko
Mascio Aurelie
Roberts Angus
Searle Tom
Stewart Robert
Wu Honghan
Publication date: 01/01/2020

Multimorbidity research in mental health services requires data on physical health conditions, which are traditionally limited in mental health care electronic health records. In this study, we aimed to extract data on physical health conditions from clinical notes using SemEHR. Data were extracted from the Clinical Record Interactive Search (CRIS) system at South London and Maudsley Biomedical Research Centre (SLaM BRC), and the cohort consisted of all individuals who had received a primary or secondary diagnosis of severe mental illness between 2007 and 2018. Three pairs of annotators annotated 2403 documents with an average Cohen's Kappa of 0.757. Results show that the NLP performance varies across different disease areas (F1 0.601-0.954), suggesting that the language patterns or terminologies of different condition groups entail different technical challenges to the same NLP task. Comment: 4 pages, 2 tables

arXiv.org e-Print Archive

Edinburgh Research Explorer

Mapping multimorbidity in individuals with schizophrenia and bipolar disorders: evidence from the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLAM BRC) case register

Author: Aldelemi Sarah
Bean Daniel
Bendayan Rebecca
Chance Natalia
Chaturvedi Jaya
Das-Munshi Jayati
Dobson Richard
Kraljevic Zeljko
Leipold Leona
Mascio Aurelie
Mirza Luwaiza
Roberts Angus
Searle Thomas
Shaari Shaweena
Skiada Naoko
Stewart Robert
Wang Tao
Publication venue: BMJ Publishing Group
Publication date: 01/01/2022

OBJECTIVES: The first aim of this study was to design and develop a valid and replicable strategy to extract physical health conditions from clinical notes which are common in mental health services. Then, we examined the prevalence of these conditions in individuals with severe mental illness (SMI) and compared their individual and combined prevalence in individuals with bipolar (BD) and schizophrenia spectrum disorders (SSD). DESIGN: Observational study. SETTING: Secondary mental healthcare services from South London. PARTICIPANTS: Our maximal sample comprised 17 500 individuals aged 15 years or older who had received a primary or secondary SMI diagnosis (International Classification of Diseases, 10th edition, F20-31) between 2007 and 2018. MEASURES: We designed and implemented a data extraction strategy for 21 common physical comorbidities using a natural language processing pipeline, MedCAT. Associations were investigated with sex, age at SMI diagnosis, ethnicity and social deprivation for the whole cohort and the BD and SSD subgroups. Linear regression models were used to examine associations with disability measured by the Health of Nations Outcome Scale. RESULTS: Physical health data were extracted, achieving precision rates (F1) above 0.90 for all conditions. The 10 most prevalent conditions were diabetes, hypertension, asthma, arthritis, epilepsy, cerebrovascular accident, eczema, migraine, ischaemic heart disease and chronic obstructive pulmonary disease. The most prevalent combination in this population included diabetes, hypertension and asthma, regardless of their SMI diagnoses. CONCLUSIONS: Our data extraction strategy was found to be adequate to extract physical health data from clinical notes, which is essential for future multimorbidity research using text records. We found that around 40% of our cohort had multimorbidity from which 20% had complex multimorbidity (two or more physical conditions besides SMI). Sex, age, ethnicity and social deprivation were found to be key to understand their heterogeneity and their differential contribution to disability levels in this population. These outputs have direct implications for researchers and clinicians.

UCL Discovery

PubMed Central

AI chatbots not yet ready for clinical use

Author: Akish Luintel
Alfred Balston
Esther Idowu
James T. Teo
Joshua Au Yeung
Richard J. Dobson
Zeljko Kraljevic
Publication venue: Frontiers Media SA
Publication date: 01/04/2023

As large language models (LLMs) expand and become more advanced, so do the natural language processing capabilities of conversational AI, or “chatbots”. OpenAI's recent release, ChatGPT, uses a transformer-based model to enable human-like text generation and question-answering on general domain knowledge, while a healthcare-specific LLM such as GatorTron has focused on real-world healthcare domain knowledge. As LLMs advance to achieve near human-level performance on medical question-answering benchmarks, it is probable that conversational AI will soon be developed for use in healthcare. In this article we discuss the potential and compare the performance of two different approaches to generative pretrained transformers: ChatGPT, the most widely used general conversational LLM, and Foresight, a GPT (generative pretrained transformer) based model focused on modelling patients and disorders. The comparison is conducted on the task of forecasting relevant diagnoses based on clinical vignettes. We also discuss important considerations and limitations of transformer-based chatbots for clinical use.

Directory of Open Access Journals

Investigating the association between physical health comorbidities and disability in individuals with severe mental illness

Author: Bean Daniel
Bendayan Rebecca
Chaturvedi Jaya
Das-Munshi Jayati
Dobson Richard
Kraljevic Zeljko
Mascio Aurelie
Mirza Luwaiza
Roberts Angus
Searle Thomas
Shaari Shaweena
Skiada Naoko
Stewart Robert
Wu Honghan
Publication venue: Cambridge University Press
Publication date: 01/01/2021

Background: Research suggests that an increased risk of physical comorbidities might have a key role in the association between severe mental illness (SMI) and disability. We examined the association between physical multimorbidity and disability in individuals with SMI. Methods: Data were extracted from the clinical record interactive search system at South London and Maudsley Biomedical Research Centre. Our sample (n = 13,933) consisted of individuals who had received a primary or secondary SMI diagnosis between 2007 and 2018 and had available data for the Health of Nations Outcome Scale (HoNOS) as a disability measure. Physical comorbidities were defined using Chapters II–XIV of the International Classification of Diseases (ICD-10). Results: More than 60% of the sample had complex multimorbidity. The most common organ systems affected were neurological (34.7%), dermatological (15.4%), and circulatory (14.8%). All specific comorbidities (ICD-10 Chapters) were associated with higher levels of disability (HoNOS total scores). Individuals with musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders were found to have significant difficulties in more than five HoNOS domains, while others had a lower number of domains affected. Conclusions: Individuals with SMI and musculoskeletal, skin/dermatological, respiratory, endocrine, neurological, hematological, or circulatory disorders are at higher risk of disability compared to those who do not have those comorbidities. Individuals with SMI and physical comorbidities are at greater risk of reporting difficulties associated with activities of daily living, hallucinations, and cognitive functioning. Therefore, these should be targeted for prevention and intervention programs.

Enlighten