Automated clinical coding using off-the-shelf large language models
The task of assigning diagnostic ICD codes to patient hospital admissions is
typically performed by expert human coders. Efforts towards automated ICD
coding are dominated by supervised deep learning models. However, difficulties
in learning to predict the large number of rare codes remain a barrier to
adoption in clinical practice. In this work, we leverage off-the-shelf
pre-trained generative large language models (LLMs) to develop a practical
solution that is suitable for zero-shot and few-shot code assignment, with no
need for further task-specific training. Unsupervised pre-training alone does
not guarantee precise knowledge of the ICD ontology and specialist clinical
coding task, therefore we frame the task as information extraction, providing a
description of each coded concept and asking the model to retrieve related
mentions. For efficiency, rather than iterating over all codes, we leverage the
hierarchical nature of the ICD ontology to sparsely search for relevant codes.
Comment: Accepted to the NeurIPS 2023 workshop Deep Generative Models for Health (DGM4H). 9 pages, 3 figures
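The hierarchical sparse search described above can be sketched roughly as follows. This is a minimal illustration only: the toy two-level tree, the concept descriptions, and the substring-based `is_relevant` stand-in for an LLM relevance call are all invented, not the paper's actual ICD ontology or prompts.

```python
# Sparse search over a code hierarchy: descend into a branch only when the
# relevance check says the note mentions that concept, instead of iterating
# over every code. Tree and descriptions are invented for illustration.

TREE = {
    "A00-B99": ["A09", "B20"],   # infectious diseases chapter (toy)
    "I00-I99": ["I10", "I21"],   # circulatory system chapter (toy)
}
DESCRIPTION = {
    "A00-B99": "infectious disease",
    "I00-I99": "circulatory disease",
    "A09": "infectious gastroenteritis",
    "B20": "HIV disease",
    "I10": "essential hypertension",
    "I21": "acute myocardial infarction",
}

def is_relevant(note: str, code: str) -> bool:
    # Stand-in for asking an LLM whether the note mentions this concept.
    return DESCRIPTION[code] in note

def sparse_search(note: str) -> list[str]:
    assigned = []
    # Prune at the chapter level first, then only expand relevant branches.
    frontier = [c for c in TREE if is_relevant(note, c)]
    while frontier:
        code = frontier.pop()
        children = TREE.get(code, [])
        if not children:           # leaf: candidate code
            assigned.append(code)
        else:
            frontier.extend(ch for ch in children if is_relevant(note, ch))
    return sorted(assigned)

codes = sparse_search("circulatory disease: essential hypertension, on ramipril")
```

With the toy note above, only the circulatory chapter is expanded and only one leaf code is assigned; the infectious-disease branch is never visited.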
Automated Clinical Coding: What, Why, and Where We Are?
Clinical coding is the task of transforming medical information in a
patient's health records into structured codes so that they can be used for
statistical analysis. This is a cognitive and time-consuming task that follows
a standard process in order to achieve a high level of consistency. Clinical
coding could potentially be supported by an automated system to improve the
efficiency and accuracy of the process. We introduce the idea of automated
clinical coding and summarise its challenges from the perspective of Artificial
Intelligence (AI) and Natural Language Processing (NLP), based on the
literature, our project experience over the past two and half years (late 2019
- early 2022), and discussions with clinical coding experts in Scotland and the
UK. Our research reveals the gaps between the current deep learning-based
approach applied to clinical coding and the need for explainability and
consistency in real-world practice. Knowledge-based methods that represent and
reason the standard, explainable process of a task may need to be incorporated
into deep learning-based methods for clinical coding. Automated clinical coding
is a promising task for AI, despite the technical and organisational
challenges. Coders need to be involved in the development process. There
is much to achieve to develop and deploy an AI-based automated system to
support coding in the next five years and beyond.
Comment: accepted for npj Digital Medicine
Automated clinical coding: What, why, and where we are?
Funding Information: The work is supported by Wellcome Trust iTPA Awards (PIII009, PIII032), Health Data Research UK National Phenomics and Text Analytics Implementation Projects, and the United Kingdom Research and Innovation (grant EP/S02431X/1), UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics. H.D. and J.C. are supported by the Engineering and Physical Sciences Research Council (EP/V050869/1) on "ConCur: Knowledge Base Construction and Curation". HW was supported by the Medical Research Council and Health Data Research UK (MR/S004149/1, MR/S004149/2); British Council (UCL-NMU-SEU international collaboration on Artificial Intelligence in Medicine: tackling challenges of low generalisability and health inequality); National Institute for Health Research (NIHR202639); and the Advanced Care Research Centre at the University of Edinburgh. We thank Murray Bell and Janice Watson in the Terminology Service in Public Health Scotland for constructive comments, and Allison Reid in the coding department in NHS Lothian, Paul Mitchell, Nicola Symmers, and Barry Hewit in Edinburgh Cancer Informatics, and staff in Epic Systems Corporation for information provided. Thanks to Dr. Emma Davidson for suggestions regarding clinical research, and to Dr. Kristiina Rannikmäe for discussions on clinical coding research and Ruohua Han for discussions on the social and qualitative aspects of this research. In Fig. 1, the icon of "Clinical Coders" was from Freepik in Flaticon, https://www.flaticon.com/free-icon/user_747376 ; the icon of "Automated Coding System" was from Free Icon Library, https://icon-library.com/png/272370.html .
Publisher Copyright: © 2022, The Author(s).
Experimental Evaluation and Development of a Silver-Standard for the MIMIC-III Clinical Coding Dataset
Clinical coding is currently a labour-intensive, error-prone, but critical
administrative process whereby hospital patient episodes are manually assigned
codes by qualified staff from large, standardised taxonomic hierarchies of
codes. Automating clinical coding has a long history in NLP research and has
recently seen novel developments setting new state of the art results. A
popular dataset used in this task is MIMIC-III, a large intensive care database
that includes clinical free-text notes and associated codes. We argue for
reconsideration of the validity of MIMIC-III's assigned codes, which are often
treated as gold-standard, especially since MIMIC-III has not undergone secondary
validation. This work presents an open-source, reproducible experimental
methodology for assessing the validity of codes derived from EHR discharge
summaries. We exemplify the methodology with MIMIC-III discharge summaries and
show that the most frequently assigned codes in MIMIC-III are under-coded by up to 35%.
Multimodal Machine Learning for Automated ICD Coding
This study presents a multimodal machine learning model to predict ICD-10
diagnostic codes. We developed separate machine learning models that can handle
data from different modalities, including unstructured text, semi-structured
text and structured tabular data. We further employed an ensemble method to
integrate all modality-specific models to generate ICD-10 codes. Key evidence
was also extracted to make our prediction more convincing and explainable. We
used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset
to validate our approach. For ICD code prediction, our best-performing model
(micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other
baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and
Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability,
our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text
data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780
and 0.5002, respectively.
Comment: Machine Learning for Healthcare 201
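The Jaccard Similarity Coefficient used above as an interpretability score is plain set overlap between the evidence a model extracts and the evidence a physician selects. The token sets below are invented for illustration.

```python
# Jaccard Similarity Coefficient (JSC): |A ∩ B| / |A ∪ B| between two sets of
# extracted evidence tokens. Token sets here are invented examples.

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

model_evidence = {"chest", "pain", "troponin", "elevated"}
physician_evidence = {"chest", "pain", "st", "elevation", "troponin"}

jsc = jaccard(model_evidence, physician_evidence)
```

With three shared tokens out of six distinct tokens overall, the toy JSC is 0.5; the paper's reported scores (0.1806 on text vs. 0.2780 for physicians) are computed the same way over real evidence spans.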
Exploring the Consistency, Quality and Challenges in Manual and Automated Coding of Free-text Diagnoses from Hospital Outpatient Letters
Coding of unstructured clinical free-text to produce interoperable structured
data is essential to improve direct care, support clinical communication and to
enable clinical research. However, manual clinical coding is difficult and
time-consuming, which motivates the development and use of natural language
processing for automated coding. This work evaluates the quality and
consistency of both manual and automated clinical coding of diagnoses from
hospital outpatient letters. Using 100 randomly selected letters, two human
clinicians performed coding of diagnosis lists to SNOMED CT. Automated coding
was also performed using IMO's Concept Tagger. A gold standard was constructed
by a panel of clinicians from a subset of the annotated diagnoses. This was
used to evaluate the quality and consistency of both manual and automated
coding via (1) a distance-based metric, treating SNOMED CT as a graph, and (2)
a qualitative metric agreed upon by the panel of clinicians. Correlation
between the two metrics was also evaluated. Comparing human and
computer-generated codes to the gold standard, the results indicate that humans
slightly outperformed automated coding, while both performed notably better
when there was only a single diagnosis contained in the free-text description.
Automated coding was considered acceptable by the panel of clinicians in
approximately 90% of cases.
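The distance-based metric above treats SNOMED CT as a graph and scores a predicted concept by how far it lies from the gold-standard concept. A minimal sketch, assuming shortest-path distance over is-a links; the mini-hierarchy below is invented, not real SNOMED CT content.

```python
# Shortest-path distance between two concepts in a terminology treated as an
# (undirected) is-a graph, via breadth-first search. Toy hierarchy only.
from collections import deque

EDGES = {  # parent -> children in an invented mini-hierarchy
    "disorder": ["heart disease", "lung disease"],
    "heart disease": ["myocardial infarction", "heart failure"],
    "myocardial infarction": ["acute MI"],
}

def neighbours(node: str) -> list[str]:
    # Children plus parents, so distance is symmetric.
    out = list(EDGES.get(node, []))
    out += [p for p, children in EDGES.items() if node in children]
    return out

def graph_distance(a: str, b: str) -> int:
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for n in neighbours(node):
            if n not in seen:
                seen.add(n)
                queue.append((n, d + 1))
    return -1  # disconnected concepts

dist = graph_distance("acute MI", "heart failure")
```

A near-miss such as coding "heart failure" when the gold concept is "acute MI" scores a small distance (3 hops in this toy graph), while an exact match scores 0; a real evaluation would run this over the full SNOMED CT relationship graph.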
Intelligent audit code generation from free text in the context of neurosurgery
Clinical auditing requires codified data for aggregation and analysis of patterns. However, in the medical domain, obtaining structured data can be difficult, as the most natural, expressive and comprehensive way to record a clinical encounter is through natural language. The task of creating structured data from naturally expressed information is known as information extraction. Specialised areas of medicine use their own language and data structures; the translation process has unique challenges and often requires a fresh approach. This research is devoted to creating a novel semi-automated method for generating codified auditing data from clinical notes recorded in a neurosurgical department in an Australian teaching hospital. The method encapsulates specialist knowledge in rules that instantaneously make precise decisions for the majority of the matches, followed by dictionary-based matching of the remaining text.
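The two-stage idea, precise specialist rules first and dictionary-based matching for what the rules leave unresolved, can be sketched as below. The rule patterns, dictionary entries, and audit code names are invented examples, not the study's actual rule set.

```python
# Rule-first coding with a dictionary fallback. Specialist rules make precise
# decisions where they fire; dictionary lookup handles the remaining text.
# All patterns and codes are invented for illustration.
import re

RULES = [  # (pattern, audit code) pairs; every matching rule contributes
    (re.compile(r"\bcraniotomy\b.*\btumou?r\b"), "CRANIOTOMY_TUMOUR"),
    (re.compile(r"\bevd\b|\bexternal ventricular drain\b"), "EVD_INSERTION"),
]
DICTIONARY = {"laminectomy": "LAMINECTOMY", "shunt": "VP_SHUNT"}

def code_note(note: str) -> list[str]:
    text = note.lower()
    codes = [code for pattern, code in RULES if pattern.search(text)]
    if not codes:  # fall back to dictionary matching on the remaining text
        codes = [code for term, code in DICTIONARY.items() if term in text]
    return codes

result = code_note("Craniotomy for excision of tumour")
```

In this sketch a rule match decides immediately, and only rule-free notes fall through to the dictionary; the actual method applies the dictionary to the residual text rather than whole notes.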
Computer-Assisted Coding: Post ICD-10 Implementation
Computer-assisted coding (CAC) has been around since the 1950s and is projected to reach $4.75 billion by 2022. However, it was not on hospitals' priority lists until 2014, just before the implementation of ICD-10 in 2015. Computer-assisted coding is software that helps streamline the coding workflow, reduce backlogs by increasing productivity, and help coders navigate longer, more complex charts more quickly. The technology is a type of artificial intelligence. The idea of computer-assisted coding moved to the forefront with the implementation of electronic health records (EHRs) and the demands of more restrictive reimbursement from payers. Accuracy, consistency, and, most assuredly, productivity have been of great importance to all organizations. Due to advances in technology, computer-assisted coding has improved in its performance. However, the question remains whether it has lived up to the hype that preceded the implementation of ICD-10: increased productivity, accuracy, consistency, improved clinical documentation, etc. This study was conducted using a questionnaire to survey members of the Tennessee Health Information Management (THIMA) community on the effectiveness of computer-assisted coding five years after the implementation of ICD-10. The results of the survey show that some organizations are still not using CAC. Overall, respondents feel that CAC is not a must-have technology for efficient coding; with CAC, the overall coding process is satisfactory but still needs improvement.
- …