442 research outputs found

    Enhancing Drug Overdose Mortality Surveillance through Natural Language Processing and Machine Learning

    Epidemiological surveillance is key to monitoring and assessing the health of populations. Drug overdose surveillance has become an increasingly important part of public health practice as overdose morbidity and mortality have increased, due in large part to the opioid crisis. Monitoring drug overdose mortality relies on death certificate data, which has several limitations, including timeliness and the coding structure used to identify specific substances that caused death. These limitations stem from the need to analyze the free-text cause-of-death sections of the death certificate that are completed by the medical certifier during death investigation. Other fields, including clinical sciences, have utilized natural language processing (NLP) methods to gain insight from free-text data, but thus far, adoption of NLP methods in epidemiological surveillance has been limited. Through a narrative review of NLP methods currently used in public health surveillance and the integration of two NLP tasks, classification and named entity recognition, this dissertation enhances the capabilities of public health practitioners and researchers to perform drug overdose mortality surveillance. This dissertation advances both surveillance science and public health practice by integrating methods from bioinformatics into the surveillance pipeline, providing more timely and higher-quality overdose mortality surveillance, which is essential to guiding an effective public health response to the continuing drug overdose epidemic.

    Data Mining Pipeline for Performing Decision Tree Analysis On Mortality Dataset With ICD-10 Codes

    Modernization of the healthcare sector has led to the introduction of wider and newer varieties of medical devices in hospitals. Consequently, there are increasing numbers of infectious complications related to medical devices. However, managing and monitoring the risk of medical devices is difficult and costly. Hospitals and healthcare device service providers require effective means of managing device maintenance in order to provide better patient care. To address this issue, we propose a data mining pipeline to classify medical devices based on mortality rates and ICD-10 codes. We utilize the decision tree grouping method to build a connection between the mortality dataset and ICD-10 codes. We anticipate that the results of this study will help healthcare providers identify risks associated with medical devices based on how many deaths are caused by the improper use of medical instruments, or the use of faulty ones, during treatment.

    Multimodal Machine Learning for Automated ICD Coding

    This study presents a multimodal machine learning model to predict ICD-10 diagnostic codes. We developed separate machine learning models that can handle data from different modalities, including unstructured text, semi-structured text and structured tabular data. We further employed an ensemble method to integrate all modality-specific models to generate ICD-10 codes. Key evidence was also extracted to make our prediction more convincing and explainable. We used the Medical Information Mart for Intensive Care III (MIMIC-III) dataset to validate our approach. For ICD code prediction, our best-performing model (micro-F1 = 0.7633, micro-AUC = 0.9541) significantly outperforms other baseline models including TF-IDF (micro-F1 = 0.6721, micro-AUC = 0.7879) and Text-CNN model (micro-F1 = 0.6569, micro-AUC = 0.9235). For interpretability, our approach achieves a Jaccard Similarity Coefficient (JSC) of 0.1806 on text data and 0.3105 on tabular data, where well-trained physicians achieve 0.2780 and 0.5002 respectively. Comment: Machine Learning for Healthcare 201
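The two metrics quoted in this abstract can be illustrated with a minimal sketch. It assumes multi-label predictions represented as sets of code strings and evidence represented as sets of tokens. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def micro_f1_multilabel(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all records, then compute one F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # codes predicted and correct
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # codes predicted but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # codes missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data (made up for illustration): two records with hypothetical code labels.
gold_codes = [{"CODE_A", "CODE_B"}, {"CODE_C"}]
pred_codes = [{"CODE_A"}, {"CODE_C", "CODE_A"}]
print(round(micro_f1_multilabel(gold_codes, pred_codes), 4))  # 0.6667

# Overlap between model-highlighted evidence and annotator-highlighted evidence.
model_evidence = {"chest", "pain", "fever"}
human_evidence = {"fever", "cough"}
print(jaccard(model_evidence, human_evidence))  # 0.25
```

Because micro-averaging pools counts across all records and codes, frequent codes dominate the score.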

    Enhancing Automatic ICD-9-CM Code Assignment for Medical Texts with PubMed

    Assigning a standard ICD-9-CM code to disease symptoms in medical texts is an important task in the medical domain. Automating this process could greatly reduce costs. However, the effectiveness of an automatic ICD-9-CM code classifier is seriously limited by unbalanced training data. Frequent diseases often have more training data, so their classification performs better than that of infrequent diseases. However, a disease’s frequency does not necessarily reflect its importance. To resolve this training data shortage problem, we propose to strategically draw data from PubMed to enrich the training data when needed. We validate our method on the CMC dataset, and the evaluation results indicate that our method can significantly improve the code assignment classifiers' performance at the macro-averaging level.
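To show why the macro-averaging level matters for unbalanced data, here is a small sketch that compares micro- and macro-averaged F1 on a toy single-label problem. The labels "common" and "rare", the data, and the function names are hypothetical. This is not the paper's evaluation code.

```python
from collections import Counter
from typing import List, Tuple


def per_class_counts(gold: List[str], pred: List[str]) -> Tuple[Counter, Counter, Counter]:
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p wrongly
            fn[g] += 1  # missed the true label g
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold: List[str], pred: List[str]) -> float:
    """Pool counts over all labels, so frequent labels dominate."""
    tp, fp, fn = per_class_counts(gold, pred)
    return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Average per-label F1, so every label counts equally."""
    tp, fp, fn = per_class_counts(gold, pred)
    labels = sorted(set(gold) | set(pred))
    return sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)


# Toy data: nine frequent cases, one rare case, and a classifier that
# always predicts the frequent label.
gold_labels = ["common"] * 9 + ["rare"]
pred_labels = ["common"] * 10
print(round(micro_f1(gold_labels, pred_labels), 4))  # 0.9
print(round(macro_f1(gold_labels, pred_labels), 4))  # 0.4737
```

The classifier never identifies the rare label, which the micro average hides and the macro average exposes.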

    Nordic Cancer Registries - an overview of their procedures and data comparability

    Background: The Nordic Cancer Registries are among the oldest population-based registries in the world, with more than 60 years of complete coverage of what is now a combined population of 26 million. However, despite being the source of a substantial number of studies, there is no published paper comparing the different registries. Therefore, we did a systematic review to identify similarities and dissimilarities of the Nordic Cancer Registries, which could possibly explain some of the differences in cancer incidence rates across these countries. Methods: We describe and compare here the core characteristics of each of the Nordic Cancer Registries: (i) data sources; (ii) registered disease entities and deviations from IARC multiple cancer coding rules; (iii) variables and related coding systems. Major changes over time are described and discussed. Results: All Nordic Cancer Registries represent a high quality standard in terms of completeness and accuracy of the registered data. Conclusions: Even though the information in the Nordic Cancer Registries in general can be considered more similar than any other collection of data from five different countries, there are numerous differences in registration routines, classification systems and inclusion of some tumors. These differences are important to be aware of when comparing time trends in the Nordic countries. Peer reviewed

    Distributed knowledge based clinical auto-coding system

    Codification of free-text clinical narratives has long been recognised as beneficial for secondary uses such as funding, insurance claim processing and research. In recent years, many researchers have studied the use of Natural Language Processing (NLP) and related Machine Learning (ML) methods and techniques to resolve the problem of manual coding of clinical narratives. Most of these studies focus on classification systems relevant to the U.S., and there is a scarcity of studies relevant to Australian classification systems such as ICD-10-AM and ACHI. Therefore, we aim to develop a knowledge-based clinical auto-coding system that utilises appropriate NLP and ML techniques to assign ICD-10-AM and ACHI codes to clinical records, while adhering to both local coding standards (Australian Coding Standard) and international guidelines that are updated and validated continuously.

    Classification of Cancer-related Death Certificates using Machine Learning

    Background: Cancer monitoring and prevention rely on timely notification of cancer cases. However, the abstraction and classification of cancer from the free text of pathology reports and other relevant documents, such as death certificates, are complex and time-consuming activities. Aims: In this paper, approaches for the automatic detection of notifiable cancer cases as the cause of death from free-text death certificates supplied to Cancer Registries are investigated. Method: A number of machine learning classifiers were studied. Features were extracted using natural language techniques and the Medtex toolkit. The features encompassed stemmed words, bi-grams, and concepts from the SNOMED CT medical terminology. The baseline consisted of a keyword spotter using keywords extracted from the long description of ICD-10 cancer related codes. Results: Death certificates with notifiable cancer listed as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved the best performance with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounts for the first 18 of the top 40 evaluated runs, and is the most robust classifier with a variance of 0.001141, half the variance of the other classifiers. Conclusion: The selection of features had the most significant influence on the performance of the classifiers, although the type of classifier employed also affects performance. In contrast, the feature weighting schema had a negligible effect on performance. Specifically, it is found that stemmed tokens with or without SNOMED CT concepts form the most effective feature set when combined with an SVM classifier.
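The keyword-spotter baseline mentioned in this abstract can be sketched roughly as follows. This is an assumption-laden illustration. The stopword list, the description strings, and the function names are all hypothetical, and the paper's actual keyword extraction is not described in the abstract.

```python
import re
from typing import Iterable, Set

# Hypothetical stopword list; the paper does not specify one.
STOPWORDS = {"of", "the", "and", "or", "in", "other", "part", "unspecified"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_keywords(descriptions: Iterable[str]) -> Set[str]:
    """Collect non-stopword tokens from code descriptions as keywords."""
    keywords: Set[str] = set()
    for description in descriptions:
        keywords |= tokenize(description) - STOPWORDS
    return keywords


def is_cancer_related(certificate_text: str, keywords: Set[str]) -> bool:
    """Flag a certificate if it shares at least one token with the keyword set."""
    return bool(tokenize(certificate_text) & keywords)


# Illustrative description strings in the style of cancer code descriptions.
example_descriptions = [
    "Malignant neoplasm of bronchus or lung, unspecified",
    "Malignant neoplasm of breast, unspecified",
]
cancer_keywords = build_keywords(example_descriptions)

print(is_cancer_related("Metastatic carcinoma of the lung", cancer_keywords))  # True
print(is_cancer_related("Acute myocardial infarction", cancer_keywords))  # False
```

A spotter like this also flags non-cancer text that mentions an organ keyword such as "lung", which is one reason a learned classifier may outperform it.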

    Supporting the Billing Process in Outpatient Medical Care: Automated Medical Coding Through Machine Learning

    Reimbursement in medical care entails significant administrative effort for medical staff. To bill the treatments or services provided, diagnosis and treatment codes must be assigned to patient records using standardized healthcare classification systems, which is a time-consuming and error-prone task. In contrast to ICD diagnosis codes used in most countries for inpatient care reimbursement, outpatient medical care often involves different reimbursement schemes. Following the Action Design Research methodology, we developed an NLP-based machine learning artifact in close collaboration with a general practitioner’s office in Germany, leveraging a dataset of over 5,600 patients with more than 63,000 billing codes. For the code prediction of the most problematic treatments as well as a complete code prediction task, we achieved F1-scores of 93.60 % and 78.22 %, respectively. Throughout three iterations, we derived five meta requirements leading to three design principles for an automated coding system to support the reimbursement of outpatient medical care.