    Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical Records

    Coding Electronic Medical Records (EMRs) with diagnosis and procedure codes is an essential task for billing, secondary data analyses, and monitoring health trends. Both speed and accuracy of coding are critical: coding errors can increase patients' financial burden and misrepresent their well-being, while slow coding creates backlogs and additional costs for the healthcare facility. It is therefore necessary to develop automated diagnosis and procedure code recommendation methods that professional medical coders can use. The main difficulty in developing automated EMR coding methods is the nature of the label space. The standardized vocabularies used for medical coding contain over 10,000 codes. The label space is large, and the label distribution is extremely unbalanced: most codes occur very infrequently, a few occur several orders of magnitude more frequently than others, and some never appear in the training dataset at all. In this work, we present three methods to handle the large, unbalanced label space. First, we study how to augment EMR training data with biomedical data (research articles indexed on PubMed) to improve the performance of standard neural networks for text classification. PubMed indexes more than 23 million citations, many of which contain information relevant to diagnosis and procedure codes; we therefore present a novel transfer learning method for incorporating this unstructured PubMed data. Second, we combine ideas from metric learning with recent advances in neural networks to form a novel architecture that better handles infrequent codes. Third, we present new methods to predict codes that have never appeared in the training dataset. Overall, our contributions constitute advances in neural multi-label text classification with potential consequences for improving EMR coding.
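    As background for the multi-label setting this abstract describes, the following is a minimal sketch, not the thesis's architecture: a neural text classifier with one sigmoid output per code, trained with binary cross-entropy so that each code is an independent yes/no decision. All names and sizes (vocabulary, embedding dimension, label count) are illustrative assumptions.

```python
# Minimal multi-label text classifier sketch: a mean-pooled bag of token
# embeddings feeding one logit per code. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultiLabelCoder(nn.Module):
    def __init__(self, vocab_size=50_000, embed_dim=128, num_codes=10_000):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # mean-pools tokens
        self.classifier = nn.Linear(embed_dim, num_codes)    # one logit per code

    def forward(self, token_ids, offsets):
        return self.classifier(self.embed(token_ids, offsets))

model = MultiLabelCoder()
loss_fn = nn.BCEWithLogitsLoss()  # an independent sigmoid per label

# Toy batch: two documents packed into one flat id tensor plus offsets.
token_ids = torch.randint(0, 50_000, (12,))
offsets = torch.tensor([0, 7])             # document boundaries
targets = torch.zeros(2, 10_000)
targets[0, [3, 42]] = 1.0                  # document 0 carries codes 3 and 42
loss = loss_fn(model(token_ids, offsets), targets)
loss.backward()
```

    Because each label gets its own sigmoid rather than competing in a softmax, a document can carry any number of codes, which is what makes the unbalanced, multi-label regime tractable in the first place.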

    A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

    Electronic Health Records (EHRs) are a valuable asset for facilitating clinical research and point-of-care applications; however, many challenges, such as data privacy concerns, impede their optimal utilization. Deep generative models, particularly Generative Adversarial Networks (GANs), show great promise in generating synthetic EHR data by learning the underlying data distributions while achieving excellent performance and addressing these challenges. This work reviews the major developments in various applications of GANs for EHRs and provides an overview of the proposed methodologies. For this purpose, we combine perspectives from healthcare applications and machine learning techniques in terms of source datasets and the fidelity and privacy evaluation of the generated synthetic datasets. We also compile a list of the metrics and datasets used by the reviewed works, which can serve as benchmarks for future research in the field. We conclude by discussing challenges in the development of GANs for EHRs and proposing recommended practices. We hope that this work motivates novel research directions at the intersection of healthcare and machine learning.
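    To make the adversarial setup concrete, here is a minimal, illustrative GAN sketch for structured EHR data, here a binary vector of diagnosis codes per patient. It reflects the general technique the review covers, not any specific reviewed model; the sizes, learning rates, and random stand-in for real records are assumptions.

```python
# Toy GAN for structured EHR data: the generator maps noise to a per-code
# presence probability vector; the discriminator scores real vs. synthetic.
import torch
import torch.nn as nn

NOISE_DIM, NUM_CODES = 64, 256

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, NUM_CODES), nn.Sigmoid(),   # per-code presence probability
)
discriminator = nn.Sequential(
    nn.Linear(NUM_CODES, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),                         # real/synthetic logit
)

bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

# Stand-in for real records: sparse binary code vectors (~5% codes present).
real_batch = (torch.rand(32, NUM_CODES) < 0.05).float()

for step in range(200):
    # Discriminator step: distinguish real records from generated ones.
    fake = generator(torch.randn(32, NOISE_DIM)).detach()
    d_loss = (bce(discriminator(real_batch), torch.ones(32, 1))
              + bce(discriminator(fake), torch.zeros(32, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: produce records the discriminator labels as real.
    fake = generator(torch.randn(32, NOISE_DIM))
    g_loss = bce(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

    The fidelity and privacy evaluations the review compiles operate on the output of such a generator: synthetic records are compared with real ones in distribution, and checked for memorization of individual patients.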

    Deep learning for ICD coding: Looking for medical concepts in clinical documents in English and in French

    © Springer Nature Switzerland AG 2018. Medical Concept Coding (MCC) is a crucial task in biomedical information extraction. Recent advances in neural network modeling have demonstrated their usefulness for natural language processing tasks. The sequence-to-sequence learning framework, initially developed for recurrent neural networks, has been shown to provide powerful solutions to tasks such as Named Entity Recognition and Medical Concept Coding. We address the identification of clinical concepts from the International Classification of Diseases, version 10 (ICD-10) in two benchmark datasets of death certificates provided for Task 1 of the CLEF eHealth 2017 shared task. The proposed architecture combines ideas from recurrent neural networks and traditional text retrieval term weighting schemes. Our models reach F-measures of 75% and 86% on the CépiDc corpus of French texts and the CDC corpus of English texts, respectively. The proposed models can be employed for coding electronic medical records with ICD codes, including diagnosis and procedure codes.
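    The term-weighting half of such an architecture can be illustrated with a small sketch: tf-idf vectors match a death-certificate line against ICD-10 code descriptions by cosine similarity. This is a simplified stand-in, not the paper's model; the three code descriptions and the nearest-neighbour decision rule are illustrative assumptions, and the RNN component is omitted.

```python
# Tf-idf nearest-code retrieval: map free text to the ICD-10 code whose
# description is closest in tf-idf space. Descriptions are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

code_descriptions = {
    "I21": "acute myocardial infarction",
    "J18": "pneumonia unspecified organism",
    "C34": "malignant neoplasm of bronchus and lung",
}

codes = list(code_descriptions)
vectorizer = TfidfVectorizer()
code_matrix = vectorizer.fit_transform(code_descriptions.values())

def nearest_code(text: str) -> str:
    """Return the ICD-10 code whose description is closest in tf-idf space."""
    sims = cosine_similarity(vectorizer.transform([text]), code_matrix)
    return codes[sims.argmax()]

print(nearest_code("death caused by lung cancer"))  # -> C34 (shares "lung")
```

    In the paper's hybrid design, such retrieval-style term weighting complements the recurrent encoder, which captures word order that bag-of-words weighting cannot.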

    Using machine learning for automated de-identification and clinical coding of free text data in electronic medical records

    The widespread adoption of Electronic Medical Records (EMRs) in hospitals continues to increase the amount of patient data that are digitally stored. Although the primary use of the EMR is to support patient care by making all relevant information accessible, governments and health organisations are looking for ways to unleash the potential of these data for secondary purposes, including clinical research, disease surveillance and automation of healthcare processes and workflows. EMRs include large quantities of free text documents that contain valuable information. The greatest challenges in using the free text data in EMRs are the removal of personally identifiable information and the extraction of relevant information for specific tasks such as clinical coding. Machine learning-based automated approaches can potentially address these challenges. This thesis aims to explore and improve the performance of machine learning models for automated de-identification and clinical coding of free text data in EMRs, as captured in hospital discharge summaries, and to facilitate the application of these approaches in real-world use cases. It does so by 1) implementing an end-to-end de-identification framework using an ensemble of deep learning models; 2) developing a web-based system for de-identification of free text (DEFT) with an interactive learning loop; 3) proposing and implementing a hierarchical label-wise attention transformer model (HiLAT) for explainable International Classification of Diseases (ICD) coding; and 4) investigating the use of extreme multi-label long-text transformer-based models for automated ICD coding. The key findings include: 1) An end-to-end framework using an ensemble of deep learning base models achieved excellent performance on the de-identification task. 2) A new web-based de-identification software system (DEFT) can be readily adopted by data custodians and researchers to de-identify free text in EMRs. 3) A novel domain-specific transformer-based model (HiLAT) achieved state-of-the-art (SOTA) results for predicting ICD codes on a Medical Information Mart for Intensive Care (MIMIC-III) dataset comprising discharge summaries (n=12,808) coded with at least one of the 50 most frequent diagnosis and procedure codes. In addition, the label-wise attention scores for the tokens in a discharge summary offer a potential explainability tool for checking the face validity of ICD code predictions. 4) An optimised transformer-based model, PLM-ICD, achieved the latest SOTA results for ICD coding on all discharge summaries of the MIMIC-III dataset (n=59,652). The segmentation method, which splits the long text consecutively into multiple small chunks (sketched below), addresses the problem of applying transformer-based models to long-text datasets; however, applying transformer-based models to extremely large label sets needs further research. These findings demonstrate that de-identification and clinical coding can benefit from machine learning approaches, present practical tools for implementing these approaches, and highlight priorities for further research.
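    A minimal sketch of that segmentation step, under simplifying assumptions (a whitespace tokenizer and a 512-token limit; real systems use subword tokenizers, and the pooling and classification layers are omitted): each long discharge summary is split consecutively into fixed-length chunks that individually fit a transformer's input limit.

```python
# Consecutive fixed-length segmentation of a long document so each chunk
# fits a transformer's input limit. Chunk length is an assumption.
from typing import List

def split_into_chunks(tokens: List[str], chunk_len: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_len tokens."""
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]

summary = ("patient admitted with chest pain " * 400).split()  # 2,000 toy tokens
chunks = split_into_chunks(summary)
print([len(c) for c in chunks])  # [512, 512, 512, 464]: each chunk fits the model
```

    Each chunk is then encoded separately, and the chunk representations are pooled, or attended over per label as in label-wise attention, before the final ICD classification layer.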

    The adoption of ICT in Malaysian public hospitals: the interoperability of electronic health records and health information systems

    A number of studies have investigated ICT adoption in Malaysian healthcare. Given the small number of hospitals that use ICT in their daily clinical and administrative operations, enabling data exchange across the 131 public hospitals in Malaysia remains a distant goal. Building on those studies, this research was framed around six objectives: to critically review the existing literature on the subject, identify barriers to ICT adoption in Malaysia, understand the administrative context before and after ICT adoption, and recommend possible solutions to the Ministry of Health of Malaysia (MoHM) in its efforts to implement interoperable electronic health records (EHR) and health information systems (HTIS). Specifically, this research aimed to identify the factors that significantly affected the process of implementing interoperable EHR and HTIS at the MoHM, and to propose the relevant actors who should be involved in the implementation phases. These factors and actors were used to develop a model for implementing interoperable EHR and HTIS in Malaysia. To gather the needed data, a series of interviews was conducted with three groups of participants: ICT administrators of the MoHM, ICT and medical record administrators of three hospitals, and physicians of three hospitals. To ensure the interview feedback represented the context of EHR and HTIS implementation in Malaysia, two hospital categories were selected: hospitals with HTIS and non-HTIS hospitals. Government documents were then used to triangulate the feedback to ensure the dependability, credibility, transferability and confirmability of the findings. Two techniques were used to analyse the data: thematic analysis and theme matching, both modified from the original pattern matching method. The originality of this research lies in its findings and in the methods used to transform them into solutions and recommendations for the MoHM. In general, the results showed that technological factors contributed less to the success of implementing interoperable EHR and HTIS than managerial and administrative factors. Four main practical and social contributions were identified: synchronisation of managerial elements, political determination and change management transformation, optimisation of the existing legacy system (the Patient Management System), and the roles of actors. Nevertheless, the findings would be more dependable and transferable if more participants had been willing to take part, especially among physicians and those who managed ICT adoption under the MoHM.

    Applied Deep Learning: Case Studies in Computer Vision and Natural Language Processing

    Deep learning has proved successful for many computer vision and natural language processing applications. In this dissertation, three studies demonstrate the efficacy of deep learning models for computer vision and natural language processing. In the first study, an efficient deep learning model was proposed for seagrass scar detection in multispectral images, producing robust, accurate scar mappings. In the second study, an arithmetic deep learning model was developed to fuse multispectral images collected at different times and resolutions into high-resolution images for downstream tasks, including change detection, object detection, and land cover classification; in addition, a super-resolution deep learning model was implemented to further enhance remote sensing images. In the third study, a deep learning-based framework was proposed for fact-checking on social media to spot fake scientific news. The framework leveraged deep learning, information retrieval, and natural language processing techniques to retrieve scholarly papers pertinent to a given piece of scientific news and evaluate its credibility.