Multimodal Machine Learning for Automated ICD Coding
This study presents a multimodal machine learning model to predict ICD-10
diagnostic codes. We developed separate machine learning models that can handle
data from different modalities, including unstructured text, semi-structured
text and structured tabular data. We further employed an ensemble method to
integrate all modality-specific models to generate ICD-10 codes. Key evidence
was also extracted to make our prediction more convincing and explainable. We
used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset
to validate our approach. For ICD code prediction, our best-performing model
(micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms
baseline models, including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and
a Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability,
our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text
data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780
and 0.5002, respectively. Comment: Machine Learning for Healthcare 201
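The two metrics quoted in this abstract can be illustrated with a minimal sketch. It assumes labels and evidence are represented as Python sets; the function names and toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all documents, then compute F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy example (hypothetical codes and evidence tokens).
gold = [{"401.9", "428.0"}, {"250.00"}]
pred = [{"401.9"}, {"250.00", "584.9"}]
print(round(micro_f1(gold, pred), 4))  # 0.6667

model_evidence = {"chest", "pain", "dyspnea"}
physician_evidence = {"chest", "pain", "fever", "cough"}
print(jaccard(model_evidence, physician_evidence))  # 0.4
```

Micro-averaging pools counts across all codes, so frequent codes dominate the score, which is worth keeping in mind when comparing it with the per-document Jaccard values.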
HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding
There are several opportunities for automation in healthcare that can improve
clinician throughput. One such example is assistive tools to document diagnosis
codes when clinicians write notes. We study the automation of medical code
prediction using curriculum learning, which is a training strategy for machine
learning models that gradually increases the difficulty of the learning tasks
from easy to hard. One of the challenges in curriculum learning is the
design of curricula, i.e., the sequential design of tasks that gradually
increase in difficulty. We propose Hierarchical Curriculum Learning (HiCu), an
algorithm that uses graph structure in the space of outputs to design curricula
for multi-label classification. We create curricula for multi-label
classification models that predict ICD diagnosis and procedure codes from
natural language descriptions of patients. By leveraging the hierarchy of ICD
codes, which groups diagnosis codes based on various organ systems in the human
body, we find that our proposed curricula improve the generalization of neural
network-based predictive models across recurrent, convolutional, and
transformer-based architectures. Our code is available at
https://github.com/wren93/HiCu-ICD. Comment: To appear at Machine Learning for Healthcare Conference (MLHC2022)
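One way to picture a hierarchy-based curriculum is to coarsen each document's label set to successively deeper levels of the code tree. The sketch below is not the HiCu implementation: the toy hierarchy, `PARENT`, `coarsen`, and `curriculum` are all hypothetical names, and how a model is trained at each stage is left out.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy: child -> parent (None marks a top-level group).
PARENT: Dict[str, Optional[str]] = {
    "circulatory": None,
    "endocrine": None,
    "428": "circulatory",
    "401": "circulatory",
    "250": "endocrine",
    "428.0": "428",
    "401.9": "401",
    "250.00": "250",
}


def path_from_root(code: str) -> List[str]:
    """Return the chain of nodes from the top-level group down to `code`."""
    path: List[str] = []
    node: Optional[str] = code
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def coarsen(labels: Set[str], depth: int) -> Set[str]:
    """Map each label to its ancestor at `depth` (1 = top level).

    Labels shallower than `depth` map to themselves.
    """
    out: Set[str] = set()
    for code in labels:
        path = path_from_root(code)
        out.add(path[min(depth, len(path)) - 1])
    return out


def curriculum(labels: Set[str], max_depth: int) -> List[Set[str]]:
    """Label sets for each stage, from the coarsest level to the full codes."""
    return [coarsen(labels, d) for d in range(1, max_depth + 1)]


stages = curriculum({"428.0", "250.00"}, max_depth=3)
# stages[0] holds organ-system groups, stages[2] the original codes.
```

Early stages pose an easier task because many fine-grained codes collapse into a few groups; later stages restore the full label space.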
Machine Learning and Clinical Text. Supporting Health Information Flow
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging.
The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development.
Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether, five machine learning applications for three practical cases are described. The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports. The performance of all these applications is promising.
Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method benefits not only processing time but also evaluation diversity and quality.
The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback. Transferred from Doria
Extreme multi-label deep neural classification of Spanish health records according to the International Classification of Diseases
111 p. This work deals with clinical text mining, a field of Natural Language Processing applied to the biomedical domain. The goal is to automate the task of medical coding. Electronic health records (EHRs) are documents that contain clinical information about a patient's health. The diagnoses and medical procedures recorded in the electronic health record are coded according to the International Classification of Diseases (ICD). Indeed, the ICD is the basis for identifying international health statistics and the standard for reporting diseases and health conditions. From a machine learning perspective, the goal is to solve an extreme multi-label text classification problem, since each health record is assigned multiple ICD codes from a set of more than 70,000 diagnostic terms. A significant amount of resources is devoted to medical coding, a laborious task that is currently performed manually. EHRs are lengthy narratives, and medical coders review the records written by physicians and assign the corresponding ICD codes. The texts are technical, since physicians use specialized medical jargon, yet they are rich in abbreviations, acronyms, and spelling errors, because physicians write the records during actual clinical practice. To address the automatic classification of health records, we investigate and develop a set of deep learning text classification techniques
Supporting the Billing Process in Outpatient Medical Care: Automated Medical Coding Through Machine Learning
Reimbursement in medical care implies significant administrative effort for medical staff. To bill the treatments or services provided, diagnosis and treatment codes must be assigned to patient records using standardized healthcare classification systems, which is a time-consuming and error-prone task. In contrast to ICD diagnosis codes used in most countries for inpatient care reimbursement, outpatient medical care often involves different reimbursement schemes. Following the Action Design Research methodology, we developed an NLP-based machine learning artifact in close collaboration with a general practitioner’s office in Germany, leveraging a dataset of over 5,600 patients with more than 63,000 billing codes. For the code prediction of the most problematic treatments as well as a complete code prediction task, we achieved F1-scores of 93.60 % and 78.22 %, respectively. Throughout three iterations, we derived five meta requirements leading to three design principles for an automated coding system to support the reimbursement of outpatient medical care
Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical Records
Coding Electronic Medical Records (EMRs) with diagnosis and procedure codes is an essential task for billing, secondary data analyses, and monitoring health trends. Both speed and accuracy of coding are critical. While coding errors could lead to more patient-side financial burden and misinterpretation of a patient’s well-being, timely coding is also needed to avoid backlogs and additional costs for the healthcare facility. Therefore, it is necessary to develop automated diagnosis and procedure code recommendation methods that can be used by professional medical coders.
The main difficulty with developing automated EMR coding methods is the nature of the label space. The standardized vocabularies used for medical coding contain over 10,000 codes. The label space is large, and the label distribution is extremely unbalanced: most codes occur very infrequently, with a few codes occurring several orders of magnitude more often than others. A few codes never occur in the training dataset at all.
In this work, we present three methods to handle the large unbalanced label space. First, we study how to augment EMR training data with biomedical data (research articles indexed on PubMed) to improve the performance of standard neural networks for text classification. PubMed indexes more than 23 million citations. Many of the indexed articles contain relevant information about diagnosis and procedure codes. Therefore, we present a novel method of incorporating this unstructured data in PubMed using transfer learning. Second, we combine ideas from metric learning with recent advances in neural networks to form a novel neural architecture that better handles infrequent codes. And third, we present new methods to predict codes that have never appeared in the training dataset. Overall, our contributions constitute advances in neural multi-label text classification with potential consequences for improving EMR coding
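The unbalanced label space described above can be made concrete by partitioning codes into frequent, rare, and never-seen groups based on training counts. This is a minimal sketch, not the authors' method; the function name, the `rare_threshold` parameter, and the toy data are assumptions introduced for illustration.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def split_labels_by_frequency(
    train_labels: Iterable[Set[str]],
    label_space: Set[str],
    rare_threshold: int = 5,  # hypothetical cut-off for "infrequent"
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Partition the label space into (frequent, rare, unseen) codes."""
    counts = Counter(code for labels in train_labels for code in labels)
    frequent = {c for c in label_space if counts[c] > rare_threshold}
    rare = {c for c in label_space if 0 < counts[c] <= rare_threshold}
    unseen = {c for c in label_space if counts[c] == 0}
    return frequent, rare, unseen


# Toy training set with hypothetical codes.
train = [{"401.9", "428.0"}, {"401.9"}, {"401.9", "250.00"}]
space = {"401.9", "428.0", "250.00", "584.9"}
frequent, rare, unseen = split_labels_by_frequency(train, space, rare_threshold=1)
# frequent == {"401.9"}, rare == {"428.0", "250.00"}, unseen == {"584.9"}
```

Reporting results separately for each group makes it visible whether a method actually helps on infrequent and unseen codes rather than only on the frequent ones.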
A Label Attention Model for ICD Coding from Clinical Text
ICD coding is a process of assigning the International Classification of
Diseases diagnosis codes to clinical/medical notes documented by health
professionals (e.g. clinicians). This process requires significant human
resources, and thus is costly and prone to error. To handle the problem,
machine learning has been utilized for automatic ICD coding. Previous
state-of-the-art models were based on convolutional neural networks, using a
single/several fixed window sizes. However, the lengths and interdependence
between text fragments related to ICD codes in clinical text vary
significantly, leading to the difficulty of deciding what the best window sizes
are. In this paper, we propose a new label attention model for automatic ICD
coding, which can handle both the various lengths and the interdependence of
the ICD code related text fragments. Furthermore, as the majority of ICD codes
are not frequently used, leading to the extremely imbalanced data issue, we
additionally propose a hierarchical joint learning mechanism extending our
label attention model to handle the issue, using the hierarchical relationships
among the codes. Our label attention model achieves new state-of-the-art
results on three benchmark MIMIC datasets, and the joint learning mechanism
helps improve the performance for infrequent codes. Comment: In Proceedings of IJCAI 2020 (Main Track)
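The general idea of label-wise attention can be sketched as follows: each label has its own query vector, attends over all token states, and receives its own document representation, so no fixed window size is needed. This is a simplified dot-product form written in plain Python, not the paper's exact formulation; all names and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def label_attention(hidden: List[Vector], label_queries: List[Vector]) -> List[Vector]:
    """Return one attention-weighted document vector per label.

    hidden: token representations (e.g. encoder outputs), one vector per token.
    label_queries: one query vector per label (learned in a real model).
    """
    dim = len(hidden[0])
    docs: List[Vector] = []
    for query in label_queries:
        weights = softmax([dot(query, h) for h in hidden])
        docs.append([sum(w * h[i] for w, h in zip(weights, hidden)) for i in range(dim)])
    return docs


# Two tokens, two labels: the first label focuses on token 0,
# the second (zero query) spreads attention uniformly.
hidden = [[1.0, 0.0], [0.0, 1.0]]
queries = [[10.0, 0.0], [0.0, 0.0]]
docs = label_attention(hidden, queries)
```

In a full model, each label-specific vector would feed a per-label binary classifier, and the queries would be trained jointly with the encoder.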
Domain-specific word embeddings for ICD-9-CM classification
In this work we evaluate domain-specific embedding models induced from textual resources
in the medical domain. The International Classification of Diseases (ICD) is a
standard, broadly used classification system that codes a large number of specific diseases,
symptoms, injuries and medical procedures into numerical classes. Assigning a code to a
clinical case means classifying that case into one or more particular discrete classes, hence
allowing further statistical studies and automated calculations. The possibility to have a
discrete code instead of a text in natural language is intuitively a great advantage for data
processing systems. The use of such classification is becoming increasingly important
for, but not limited to, economic and policy-making purposes. Experiments show that
domain-specific word embeddings, instead of general ones, improve classifiers in terms
of frequency similarities between words
Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks
Coding diagnoses and procedures in medical records is a crucial process in
the healthcare industry, which includes the creation of accurate billings,
receiving reimbursements from payers, and creating standardized patient care
records. In the United States, billing- and insurance-related activities cost
around $471 billion in 2012, which constituted about 25% of all U.S. hospital
spending. In this paper, we report the performance of a natural language
processing model that can map clinical notes to medical codes, and predict
final diagnosis from unstructured entries of history of present illness,
symptoms at the time of admission, etc. Previous studies have demonstrated that
deep learning models perform better at such mapping when compared to
conventional machine learning models. Therefore, we employed a state-of-the-art
deep learning method, ULMFiT, on the largest emergency department clinical notes
dataset, MIMIC-III, which has 1.2M clinical notes, to predict the top-10 and
top-50 diagnosis and procedure codes. Our models were able to predict the
top-10 diagnoses and procedures with 80.3% and 80.5% accuracy, whereas the
top-50 ICD-9 codes of diagnosis and procedures are predicted with 70.7% and
63.9% accuracy. Predicting diagnoses and procedures from unstructured
clinical notes can help human coders save time, eliminate errors, and minimize
costs. With promising scores from our present model, the next step would be to
deploy this on a small-scale real-world scenario and compare it with human
coders as the gold standard. We believe that further research of this approach
can create highly accurate predictions that can ease the workflow in a clinical
setting. Comment: This is a shortened version of the Capstone Project that was accepted
by the Faculty of Indiana University, in partial fulfillment of the
requirements for the degree of Master of Science in Health Informatics in Dec
201