Search CORE

14 research outputs found

CoPHE: A Count-Preserving Hierarchical Evaluation Metric in Large-Scale Multi-Label Text Classification

Author: Alex Beatrice
Birch Alexandra
Dong Hang
Falis Matúš
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 07/11/2021

Automated Clinical Coding: What, Why, and Where We Are?

Author: Alex Beatrice
Chen Jiaoyan
Dong Hang
Falis Matúš
Ji Shaoxiong
Matterson Joshua
Whiteley William
Wu Honghan
Publication date: 01/01/2022

Clinical coding is the task of transforming medical information in a patient's health records into structured codes so that they can be used for statistical analysis. This is a cognitive and time-consuming task that follows a standard process in order to achieve a high level of consistency. Clinical coding could potentially be supported by an automated system to improve the efficiency and accuracy of the process. We introduce the idea of automated clinical coding and summarise its challenges from the perspective of Artificial Intelligence (AI) and Natural Language Processing (NLP), based on the literature, our project experience over the past two and a half years (late 2019–early 2022), and discussions with clinical coding experts in Scotland and the UK. Our research reveals the gaps between the current deep learning-based approach applied to clinical coding and the need for explainability and consistency in real-world practice. Knowledge-based methods that represent and reason about the standard, explainable process of a task may need to be incorporated into deep learning-based methods for clinical coding. Automated clinical coding is a promising task for AI, despite the technical and organisational challenges. Coders need to be involved in the development process. There is much to achieve to develop and deploy an AI-based automated system to support coding in the next five years and beyond. Comment: accepted for npj Digital Medicine

arXiv.org e-Print Archive

UCL Discovery

PubMed Central

Edinburgh Research Explorer

Aaltodoc Publication Archive

Oxford University Research Archive

Enlighten

Automated clinical coding: What, why, and where we are?

Author: Alex Beatrice
Chen Jiaoyan
Dong Hang
Falis Matúš
Ji Shaoxiong
Matterson Joshua
Whiteley William
Wu Honghan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/10/2022

Funding Information: The work is supported by Wellcome Trust iTPA Awards (PIII009, PIII032), Health Data Research UK National Phenomics and Text Analytics Implementation Projects, and the United Kingdom Research and Innovation (grant EP/S02431X/1), UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics. H.D. and J.C. are supported by the Engineering and Physical Sciences Research Council (EP/V050869/1) on “ConCur: Knowledge Base Construction and Curation”. H.W. was supported by the Medical Research Council and Health Data Research UK (MR/S004149/1, MR/S004149/2); the British Council (UCL-NMU-SEU international collaboration on Artificial Intelligence in Medicine: tackling challenges of low generalisability and health inequality); the National Institute for Health Research (NIHR202639); and the Advanced Care Research Centre at the University of Edinburgh. We thank Murray Bell and Janice Watson in Terminology Service in Public Health Scotland for constructive comments, and Allison Reid in the coding department in NHS Lothian, Paul Mitchell, Nicola Symmers, and Barry Hewit in Edinburgh Cancer Informatics, and staff in Epic Systems Corporation for the information they provided. Thanks to Dr. Emma Davidson for suggestions regarding clinical research. Thanks to Dr. Kristiina Rannikmäe for discussions regarding the research on clinical coding and to Ruohua Han for discussions regarding the social and qualitative aspects of this research. In Fig. 1, the icon of “Clinical Coders” was from Freepik in Flaticon, https://www.flaticon.com/free-icon/user_747376; the icon of “Automated Coding System” was from Free Icon Library, https://icon-library.com/png/272370.html. Publisher Copyright: © 2022, The Author(s).

Clinical coding is the task of transforming medical information in a patient’s health records into structured codes so that they can be used for statistical analysis. This is a cognitive and time-consuming task that follows a standard process in order to achieve a high level of consistency. Clinical coding could potentially be supported by an automated system to improve the efficiency and accuracy of the process. We introduce the idea of automated clinical coding and summarise its challenges from the perspective of Artificial Intelligence (AI) and Natural Language Processing (NLP), based on the literature, our project experience over the past two and a half years (late 2019–early 2022), and discussions with clinical coding experts in Scotland and the UK. Our research reveals the gaps between the current deep learning-based approach applied to clinical coding and the need for explainability and consistency in real-world practice. Knowledge-based methods that represent and reason about the standard, explainable process of a task may need to be incorporated into deep learning-based methods for clinical coding. Automated clinical coding is a promising task for AI, despite the technical and organisational challenges. Coders need to be involved in the development process. There is much to achieve to develop and deploy an AI-based automated system to support coding in the next five years and beyond. Peer reviewed

PubMed Central

Edinburgh Research Explorer

Aaltodoc Publication Archive

Language transfer for early warning of epidemics from social media

Author: Appelgren Mattias
Falis Matúš
Ikeda Satoshi
O'Neil Alison Q
Schrempf Patrick
Publication date: 10/10/2019

Statements on social media can be analysed to identify individuals who are experiencing red flag medical symptoms, allowing early detection of the spread of diseases such as influenza. Since disease does not respect cultural borders and may spread between populations speaking different languages, we would like to build multilingual models. However, the data required to train models for every language may be difficult, expensive and time-consuming to obtain, particularly for low-resource languages. Taking Japanese as our target language, we explore methods by which data in one language might be used to build models for a different language. We evaluate strategies of training on machine translated data and of zero-shot transfer through the use of multilingual models. We find that the choice of source language impacts the performance, with Chinese-Japanese being a better language pair than English-Japanese. Training on machine translated data shows promise, especially when used in conjunction with a small amount of target language data. Postprint. Peer reviewed

arXiv.org e-Print Archive

University of St. Andrews - Pure

St Andrews Research Repository

Can GPT-3.5 Generate and Code Discharge Summaries?

Author: Alex Beatrice
Basetti Siddharth
Birch Alexandra
Daines Luke
Dong Hang
Falis Matúš
Gema Aryo Pradipta
Holder Michael
Penfold Rose S
Publication date: 24/01/2024

Objective: To investigate GPT-3.5 in generating and coding medical documents with ICD-10 codes for data augmentation on low-resource labels. Materials and Methods: Employing GPT-3.5 we generated and coded 9,606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices were employed to determine within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated both on prompt-guided self-generated data and real MIMIC-IV data. Clinical professionals evaluated the clinical acceptability of the generated documents. Results: Augmentation slightly hinders the overall performance of the models but improves performance for the generation candidate codes and their families, including one unseen in the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 can identify ICD-10 codes by the prompted descriptions, but performs poorly on real data. Evaluators note that the generated concepts are correct but that the documents fall short in variety, supporting information, and narrative. Discussion and Conclusion: GPT-3.5 alone is unsuitable for ICD-10 coding. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Discharge summaries generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in narratives. They are unsuitable for clinical practice. Comment: 15 pages; 250 words in abstract; 3,929 words in main body; 2 figures (0 black and white, 2 colour); 4 tables; 34 references
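
The abstract above reports micro- and macro-F1 scores for multi-label coding. As a minimal sketch of how the two averages differ (not the authors' evaluation code), the following computes both from sets of gold and predicted codes per document. The function names, the toy ICD-10 codes, and the choice to macro-average over every label seen in either the gold or predicted sets are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for multi-label predictions.

    Assumption: macro-F1 averages over all labels that occur in gold or pred.
    """
    labels = sorted(set().union(*gold, *pred))
    # Per-label counts stored as [true positives, false positives, false negatives].
    counts: Dict[str, List[int]] = {label: [0, 0, 0] for label in labels}
    for g, p in zip(gold, pred):
        for label in g & p:
            counts[label][0] += 1
        for label in p - g:
            counts[label][1] += 1
        for label in g - p:
            counts[label][2] += 1

    # Micro: pool the counts across labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)

    # Macro: compute F1 per label, then take the unweighted mean.
    macro = (
        sum(f1_from_counts(*c) for c in counts.values()) / len(counts)
        if counts
        else 0.0
    )
    return micro, macro


# Toy example with hypothetical documents and codes.
gold_codes = [{"I10", "E11.9"}, {"I10"}, {"J45.909"}]
pred_codes = [{"I10"}, {"I10", "E11.9"}, set()]
micro, macro = micro_macro_f1(gold_codes, pred_codes)
print(f"micro-F1={micro:.4f} macro-F1={macro:.4f}")
```

In this toy case the frequent code is always predicted correctly while the two rarer codes are never matched, so the macro average falls well below the micro average. That gap is why both numbers are usually reported for long-tailed label spaces.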

arXiv.org e-Print Archive

Paying per-label attention for multi-label extraction from radiology reports

Author: Falis Matúš
Harris-Birtill David
Lisowska Aneta
Mikhael Shadia
Muir Keith W.
O'Neil Alison Q.
Pajak Maciej
Schrempf Patrick
Watson Hannah
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Funding: This work is part of the Industrial Centre for AI Research in digital Diagnostics (iCAIRD) which is funded by Innovate UK on behalf of UK Research and Innovation (UKRI) [project number: 104690].

Training medical image analysis models requires large amounts of expertly annotated data which is time-consuming and expensive to obtain. Images are often accompanied by free-text radiology reports which are a rich source of information. In this paper, we tackle the automated extraction of structured labels from head CT reports for imaging of suspected stroke patients, using deep learning. Firstly, we propose a set of 31 labels which correspond to radiographic findings (e.g. hyperdensity) and clinical impressions (e.g. haemorrhage) related to neurological abnormalities. Secondly, inspired by previous work, we extend existing state-of-the-art neural network models with a label-dependent attention mechanism. Using this mechanism and simple synthetic data augmentation, we are able to robustly extract many labels with a single model, classified according to the radiologist's reporting (positive, uncertain, negative). This approach can be used in further research to effectively extract many labels from medical text. Postprint

arXiv.org e-Print Archive

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Automated clinical coding: what, why, and where we are?

Author: Alex Beatrice
Chen Jiaoyan
Dong Hang
Falis Matúš
Ji Shaoxiong
Matterson Joshua
Whiteley William
Wu Honghan
Publication venue: Nature Research
Publication date: 22/10/2022

Clinical coding is the task of transforming medical information in a patient’s health records into structured codes so that they can be used for statistical analysis. This is a cognitive and time-consuming task that follows a standard process in order to achieve a high level of consistency. Clinical coding could potentially be supported by an automated system to improve the efficiency and accuracy of the process. We introduce the idea of automated clinical coding and summarise its challenges from the perspective of Artificial Intelligence (AI) and Natural Language Processing (NLP), based on the literature, our project experience over the past two and a half years (late 2019–early 2022), and discussions with clinical coding experts in Scotland and the UK. Our research reveals the gaps between the current deep learning-based approach applied to clinical coding and the need for explainability and consistency in real-world practice. Knowledge-based methods that represent and reason about the standard, explainable process of a task may need to be incorporated into deep learning-based methods for clinical coding. Automated clinical coding is a promising task for AI, despite the technical and organisational challenges. Coders need to be involved in the development process. There is much to achieve to develop and deploy an AI-based automated system to support coding in the next five years and beyond.

Enlighten

Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding

Author: Alex Beatrice
Birch Alexandra
Dong Hang
Falis Matúš
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2022

Medical document coding is the process of assigning labels from a structured label space (ontology – e.g., ICD-9) to medical documents. This process is laborious, costly, and error-prone. In recent years, efforts have been made to automate this process with neural models. The label spaces are large (in the order of thousands of labels) and follow a big-head long-tail label distribution, giving rise to few-shot and zero-shot scenarios. Previous efforts tried to address these scenarios within the model, leading to improvements on rare labels, but worse results on frequent ones. We propose data augmentation and synthesis techniques in order to address these scenarios. We further introduce an analysis technique for this setting inspired by confusion matrices. This analysis technique points to the positive impact of data augmentation and synthesis, but also highlights more general issues of confusion within families of codes, and underprediction.

Edinburgh Research Explorer

Oxford University Research Archive

Software Feature Request Detection in Issue Tracking Systems

Author: Bürsner Simone
Falis Matúš
Hubner Paul
Merten Thorsten
Paech Barbara
Quirchmayr Thomas
Publication date: 03/07/2016

Additional figures, tables, experiment data, code, and results. See README.md for more information.

Crossref

ZENODO

pub H-BRS - Publikationsserver der Hochschule Bonn-Rhein-Sieg