346 research outputs found

    Automated clinical coding: What, why, and where we are?

    Funding Information: The work is supported by Wellcome Trust iTPA Awards (PIII009, PIII032), Health Data Research UK National Phenomics and Text Analytics Implementation Projects, and the United Kingdom Research and Innovation (grant EP/S02431X/1), UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics. H.D. and J.C. are supported by the Engineering and Physical Sciences Research Council (EP/V050869/1) on “ConCur: Knowledge Base Construction and Curation”. H.W. was supported by Medical Research Council and Health Data Research UK (MR/S004149/1, MR/S004149/2); British Council (UCL-NMU-SEU international collaboration on Artificial Intelligence in Medicine: tackling challenges of low generalisability and health inequality); National Institute for Health Research (NIHR202639); Advanced Care Research Centre at the University of Edinburgh. We thank Murray Bell and Janice Watson in Terminology Service in Public Health Scotland for their constructive comments, and Allison Reid in the coding department in NHS Lothian, Paul Mitchell, Nicola Symmers, and Barry Hewit in Edinburgh Cancer Informatics, and staff in Epic Systems Corporation for the information they provided. We thank Dr. Emma Davidson for suggestions regarding clinical research, Dr. Kristiina Rannikmäe for discussions regarding the research on clinical coding, and Ruohua Han for discussions regarding the social and qualitative aspects of this research. In Fig. 1, the icon of “Clinical Coders” was from Freepik in Flaticon, https://www.flaticon.com/free-icon/user_747376 ; the icon of “Automated Coding System” was from Free Icon Library, https://icon-library.com/png/272370.html . Publisher Copyright: © 2022, The Author(s).
Clinical coding is the task of transforming medical information in a patient’s health records into structured codes so that they can be used for statistical analysis. This is a cognitively demanding and time-consuming task that follows a standard process in order to achieve a high level of consistency. Clinical coding could potentially be supported by an automated system to improve the efficiency and accuracy of the process. We introduce the idea of automated clinical coding and summarise its challenges from the perspective of Artificial Intelligence (AI) and Natural Language Processing (NLP), based on the literature, our project experience over the past two and a half years (late 2019–early 2022), and discussions with clinical coding experts in Scotland and the UK. Our research reveals the gaps between the current deep learning-based approach applied to clinical coding and the need for explainability and consistency in real-world practice. Knowledge-based methods that represent and reason about the standard, explainable process of a task may need to be incorporated into deep learning-based methods for clinical coding. Automated clinical coding is a promising task for AI, despite the technical and organisational challenges. Coders need to be involved in the development process. There is much to achieve to develop and deploy an AI-based automated system to support coding in the next five years and beyond. Peer reviewed.

    HEALTH OUTCOME PATHWAY PREDICTION. A GRAPH-BASED FRAMEWORK

    This dissertation is part of the FrailCare.AI project, which aims to detect frailty in the elderly Portuguese population in order to optimize the SNS24 (telemonitoring) service and to suggest health pathways that reduce patients' frailty. Frailty can be defined as the condition of being weak and delicate; it normally increases with age and is the consequence of several health and non-health related factors. A patient's health journey is recorded in Electronic Health Records (EHRs), which are rich but sparse, noisy and multi-modal sources of truth. These can be used to train models that predict future health states, of which frailty is just one. In this work, due to lack of data access, we pivoted our focus to phenotype prediction, that is, predicting diagnoses. We tackle the problems of data insufficiency and class imbalance (e.g. rare diseases and other infrequent occurrences in the training data) by integrating standardized healthcare ontologies within graph neural networks. We study the broad task of phenotype prediction, multi-task scenarios, and few-shot scenarios, in which a class rarely occurs in the training set. Furthermore, during the development of this work we detected some reproducibility issues in related literature, which we detail, and we open-source all of our implementations, introducing a framework to aid the development of similar systems.
Resumo (translated from Portuguese): This dissertation is part of the FrailCare.AI project, which aims to detect frailty in the elderly Portuguese population in order to optimize the telemonitoring service of the Portuguese National Health Service (SNS24) and to suggest actions to reduce patients' frailty. Frailty is a risk condition made up of multiple factors. Nowadays, much of each patient's clinical history is recorded digitally. These vast and diverse data can be used to train predictive models whose goal is to predict future health states, of which frailty is only one. Due to lack of access to data, we changed the main task of this work to diagnosis prediction, where we explore the problem of data insufficiency and class imbalance (for example, rare diseases and other infrequent occurrences in the training data) by integrating ontologies of medical concepts through graph neural networks. We also explore other tasks and the impact they have on one another. In addition, during the development of this dissertation we identified reproducibility issues in the literature studied, which we detail, implementing the missing concepts. With reproducibility in mind, we release our code, introducing a library that supports the development of systems similar to ours.

    HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding

    There are several opportunities for automation in healthcare that can improve clinician throughput. One such example is assistive tools to document diagnosis codes when clinicians write notes. We study the automation of medical code prediction using curriculum learning, a training strategy for machine learning models that gradually increases the difficulty of the learning tasks. One of the challenges in curriculum learning is the design of curricula, i.e., the sequential design of tasks that gradually increase in difficulty. We propose Hierarchical Curriculum Learning (HiCu), an algorithm that uses graph structure in the space of outputs to design curricula for multi-label classification. We create curricula for multi-label classification models that predict ICD diagnosis and procedure codes from natural language descriptions of patients. By leveraging the hierarchy of ICD codes, which groups diagnosis codes based on various organ systems in the human body, we find that our proposed curricula improve the generalization of neural network-based predictive models across recurrent, convolutional, and transformer-based architectures. Our code is available at https://github.com/wren93/HiCu-ICD. Comment: To appear at Machine Learning for Healthcare Conference (MLHC2022)
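As a rough illustration of how a code hierarchy can yield a curriculum, the sketch below coarsens ICD-9-style labels to a given depth, so that earlier stages predict broad categories and later stages predict full codes. It covers only the label side of such a curriculum, not model training, and it is not taken from the HiCu repository: the function names, the three-level depth schedule, and the rule that truncating a code gives its parent are assumptions made for illustration.

```python
def truncate_code(code: str, depth: int) -> str:
    """Map an ICD-9-style code to its ancestor at the given depth.

    Assumption for illustration: depth 1 keeps the 3-character category
    ("250"), depth 2 adds one digit after the dot ("250.0"), and depth 3
    keeps up to two digits after the dot ("250.01").
    """
    category, _, detail = code.partition(".")
    if depth == 1 or not detail:
        return category
    return f"{category}.{detail[: depth - 1]}"


def curriculum_labels(label_sets, max_depth=3):
    """Yield (depth, coarsened label sets), from coarsest to finest stage."""
    for depth in range(1, max_depth + 1):
        yield depth, [sorted({truncate_code(c, depth) for c in labels})
                      for labels in label_sets]


# Two hypothetical notes with their full code sets.
notes = [["250.01", "250.02", "401.9"], ["428.0"]]
stages = list(curriculum_labels(notes))
for depth, labels in stages:
    print(depth, labels)
```

At the coarsest stage the two diabetes codes collapse into a single category label, which is what makes the early task easier than the final one.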

    Natural Language Processing and Graph Representation Learning for Clinical Data

    The past decade has witnessed remarkable progress in biomedical informatics and its related fields: the development of high-throughput technologies in genomics, the mass adoption of electronic health records systems, and the AI renaissance largely catalyzed by deep learning. Deep learning has played an undeniably important role in our attempts to reduce the gap between the exponentially growing amount of biomedical data and our ability to make sense of them. In particular, the two main pillars of this dissertation---natural language processing and graph representation learning---have improved our capacity to learn useful representations of language and structured data to an extent previously considered unattainable in such a short time frame. In the context of clinical data, characterized by its notorious heterogeneity and complexity, natural language processing and graph representation learning have begun to enrich our toolkits for making sense and making use of the wealth of biomedical data beyond rule-based systems or traditional regression techniques. This dissertation comes at the cusp of such a paradigm shift, detailing my journey across the fields of biomedical and clinical informatics through the lens of natural language processing and graph representation learning. The takeaway is quite optimistic: despite the many layers of inefficiencies and challenges in the healthcare ecosystem, AI for healthcare is gearing up to transform the world in new and exciting ways

    Hybrid Query Expansion on Ontology Graph in Biomedical Information Retrieval

    Nowadays, biomedical researchers publish thousands of papers and journals every day. Searching through biomedical literature to keep up with the state of the art is a task of increasing difficulty for many individual researchers. The continuously increasing amount of biomedical text data has resulted in high demand for an efficient and effective biomedical information retrieval (BIR) system. Though many existing information retrieval techniques can be directly applied in BIR, BIR distinguishes itself by its extensive use of biomedical terms and abbreviations, which are highly ambiguous. First of all, we studied a fundamental yet simpler problem: word semantic similarity. We proposed a novel semantic word similarity algorithm and related tools called Weighted Edge Similarity Tools (WEST). WEST was motivated by our discovery that humans are more sensitive to semantic differences due to categorization than to those due to generalization/specification. Unlike most existing methods, which model the semantic similarity of words based on either the depth of their Lowest Common Ancestor (LCA) or the traversal distance between the word pair in WordNet, WEST also considers the joint contribution of the weighted distance between two words and the weighted depth of their LCA in WordNet. Experiments show that the weighted-edge-based word similarity method achieves 83.5% accuracy against human judgments. The query expansion problem can be viewed as selecting the top k words that have the maximum accumulated similarity to a given word set. It has proved to be an effective method in BIR and has been studied for over two decades. However, most previous research focuses on only one controlled vocabulary: MeSH. In addition, early studies find that applying an ontology does not necessarily improve search performance.
In this dissertation, we propose a novel graph-based query expansion approach that takes advantage of the global information from multiple controlled vocabularies by building a biomedical ontology graph from selected vocabularies in Metathesaurus. We apply the Personalized PageRank algorithm on the ontology graph to rank and identify top terms that are highly relevant to the original user query yet not present in that query. Those new terms are reordered by a weighting scheme to prioritize specialized concepts. We multiply the final selected terms by a scaling factor to prevent query drift and append them to the original query in the search. Experiments show that our approach achieves a 17.7% improvement in 11-point average precision and recall over Lucene's default indexing and searching strategy, and a 24.8% improvement over all the other strategies on average. Furthermore, we observe that expanding with specialized concepts rather than generalized concepts can substantially improve the recall-precision performance. We have also successfully transferred WEST from the underlying WordNet graph to a biomedical ontology graph constructed from multiple controlled vocabularies in Metathesaurus. Experiments indicate that WEST further improves the recall-precision performance. Finally, we have developed a Graph-based Biomedical Search Engine (G-Bean) for retrieving and visualizing information from literature using our proposed query expansion algorithm. G-Bean accepts any medical user query and processes it with an expanded medical query to search the MEDLINE database.
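The expansion step described above can be sketched as a power-iteration Personalized PageRank over a small term graph, followed by appending down-weighted new terms to the query. This is a minimal sketch and not the dissertation's implementation: the toy graph, the function names, the damping value of 0.85, and the scaling factor of 0.5 are assumptions, and the reordering step that favours specialized concepts is left out.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to a list of neighbours.
    seeds: set of query terms; the random walk restarts only at these nodes.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


def expand_query(graph, query_terms, k=2, scale=0.5):
    """Return original terms (weight 1.0) plus the top-k new terms (weight scale)."""
    rank = personalized_pagerank(graph, set(query_terms))
    candidates = sorted((n for n in graph if n not in query_terms),
                        key=lambda n: rank[n], reverse=True)[:k]
    weighted = {t: 1.0 for t in query_terms}
    weighted.update({t: scale for t in candidates})
    return weighted


# Hypothetical ontology fragment; edges are listed in both directions.
toy_graph = {
    "myocardial infarction": ["heart attack", "heart disease"],
    "heart attack": ["myocardial infarction"],
    "heart disease": ["myocardial infarction", "cardiomyopathy"],
    "cardiomyopathy": ["heart disease"],
    "fracture": ["bone"],
    "bone": ["fracture"],
}
expanded = expand_query(toy_graph, ["heart attack"])
print(expanded)
```

Because the walk restarts only at the query terms, nodes in the unrelated "fracture" component receive no rank and are never selected. Down-weighting the added terms keeps the original query dominant, which is the role the scaling factor plays against query drift.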

    Biomedical entities recognition in Spanish combining word embeddings

    Named Entity Recognition (NER) is an important task in the field of Natural Language Processing that is used to extract meaningful knowledge from textual documents. The goal of NER is to identify text fragments that refer to specific entities. In this thesis we aim to address the task of NER in the Spanish biomedical domain. In this domain, entities can refer to drug, symptom and disease names and offer valuable knowledge to health experts. For this purpose, we propose a model based on neural networks and employ a combination of word embeddings. In addition, we generate new domain- and language-specific embeddings to test their effectiveness. Finally, we show that the combination of different word embeddings as input to the neural network improves the state-of-the-art results in the applied scenarios. Thesis, Univ. Jaén, Departamento de Informática. Defended on 22 April 2021.
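One common way to combine several word embeddings as network input is to concatenate, per token, the vectors from each embedding table. The abstract does not say which combination the thesis uses, so the sketch below is only an assumption; the tables, dimensions, and function name are hypothetical.

```python
def combine_embeddings(token, tables, dims):
    """Concatenate a token's vectors from several embedding tables.

    A token missing from a table contributes a zero vector of that table's
    dimension, so the combined length is always sum(dims).
    """
    combined = []
    for table, dim in zip(tables, dims):
        combined.extend(table.get(token.lower(), [0.0] * dim))
    return combined


# Hypothetical tiny tables: a general-domain one and a biomedical one.
general = {"dolor": [0.1, 0.3], "paracetamol": [0.2, 0.2]}
biomedical = {"paracetamol": [0.9, 0.1, 0.4]}

vec = combine_embeddings("Paracetamol", [general, biomedical], [2, 3])
oov = combine_embeddings("dolor", [general, biomedical], [2, 3])
print(vec, oov)
```

Padding with zeros keeps the input size fixed when a word is covered by only one of the tables, which matters most for domain-specific vocabulary.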

    Deep learning for precision medicine

    As a result of the recent trend towards digitization, an increasing amount of information is recorded in clinics and hospitals, and this increasingly overwhelms the human decision maker. This issue is one of the main reasons why Machine Learning (ML) is gaining attention in the medical domain, since ML algorithms can make use of all the available information to predict the most likely future events that will occur to each individual patient. Physicians can include these predictions in their decision processes, which can lead to improved outcomes. Eventually ML can also be the basis for a decision support system that provides personalized recommendations for each individual patient. It is also worth noting that medical datasets are becoming both longer (i.e. we have more samples collected through time) and wider (i.e. we store more variables). Therefore we need to use ML algorithms capable of modelling complex relationships among a large number of time-evolving variables. One kind of model that can capture very complex relationships is the Deep Neural Network, which has proven successful in other areas of ML, such as Language Modelling, a use case that has some similarities with the medical use case. However, the medical domain has a set of characteristics that make it an almost unique scenario: multiple events can occur at the same time, there are multiple sequences (i.e. multiple patients), each sequence has an associated set of static variables, both inputs and outputs can be a combination of different data types, etc. For these reasons we need to develop approaches specifically designed for the medical use case. In this work we design and develop different kinds of models based on Neural Networks that are suitable for modelling medical datasets. Besides, we tackle different medical tasks and datasets, showing which models work best in each case.
The first dataset we use was collected from patients who suffered from kidney failure. The data was collected in the Charité hospital in Berlin and it is the largest data collection of its kind in Europe. Once the kidney has failed, patients face lifelong treatment and periodic visits to the clinic for the rest of their lives. Until the hospital finds a new kidney for the patient, he or she must attend the clinic multiple times per week in order to receive dialysis, which is a treatment that replaces many of the functions of the kidney. After the transplant has been performed, the patient receives immunosuppressive therapy to avoid the rejection of the transplanted kidney. Patients must be periodically monitored to check the status of the kidney, adjust the treatment and take care of associated diseases, such as those that arise due to the immunosuppressive therapy. This dataset started being recorded more than 30 years ago and comprises more than 4000 patients who underwent a renal transplantation or are waiting for one. The database has been the basis for many studies in the past. Our first goal with the nephrology dataset is to develop a system to predict the next events that will be recorded in the electronic medical record of each patient, and thus to develop the basis for a future clinical decision support system. Specifically, we model three aspects of the patient evolution: medication prescriptions, laboratory tests ordered and laboratory test results. Besides, there is a set of endpoints that can happen after a transplantation, and it would be very valuable for physicians to know beforehand when one of these is going to happen. Specifically, we also predict whether the patient will die, the transplant will be rejected, or the transplant will be lost. For each visit that a patient makes to the clinic, we anticipate which of those three events (if any) will occur both within 6 months and 12 months after the visit.
The second dataset that we use in this thesis is the one collected by the MEmind Wellness Tracker, which contains information related to psychiatric patients. Suicide is the second leading cause of death in the 15-29 years age group, and its prevention is one of the top public health priorities. Traditionally, psychiatric patients have been assessed by self-reports, but these suffer from recall bias. To improve data quantity and quality, the MEmind Wellness Tracker provides a mobile application that enables patients to send daily reports about their status. Thus, this application enables physicians to get information about patients in their natural environments. This dataset therefore contains sequential information generated by the MEmind application, sequential information generated during medical visits and static information about each patient. Our goal with this dataset is to predict the suicidal ideation value that each patient will report next. In order to model both datasets, we have developed a set of predictive Machine Learning models based on Neural Networks capable of integrating multiple sequences of data with the background information of each patient. We compare the performance achieved by these approaches with that obtained with classical ML algorithms. For the task of predicting the next events that will be observed in the nephrology dataset, we obtained the best performance with a Feedforward Neural Network containing a representation layer. On the other hand, for the tasks of endpoint prediction in nephrology patients and suicidal ideation prediction, we obtained the best performance with a model that combines a Feedforward Neural Network with one or multiple Recurrent Neural Networks (RNNs) using Gated Recurrent Units. We hypothesize that models of this kind, which include RNNs, provide the best performance when the dataset contains long-term dependencies.
To our knowledge, our work is the first to develop this kind of deep network, which combines static information with several sources of dynamic information. These models can be useful in many other medical datasets and even in datasets within other domains. We show some examples where our approach is successfully applied to non-medical datasets that also present multiple variables evolving in time. Besides, we installed the endpoints prediction model as a standalone system in the Charité hospital in Berlin. For this purpose, we developed a web-based user interface that the physicians can use, and an API that can be used to connect our predictive system with other IT systems in the hospital. These systems can be seen as recommender systems; however, they do not necessarily generate valid prescriptions. For example, for a certain patient, a system can predict very high probabilities for all antibiotics in the dataset. Obviously, this patient should not take all antibiotics, but only one of them. Therefore, we need a human decision maker on top of our recommender system. In order to model this decision process, we used an architecture based on a Generative Adversarial Network (GAN). GANs are systems based on Neural Networks that make better generative models than regular Neural Networks. Thus we trained one GAN that works on top of a regular Neural Network and show how the quality of the prescriptions improves. We run this experiment with a synthetic dataset that we created for this purpose. The architectures that we developed are specially designed for modelling medical data, but they can also be useful in other use cases. We run experiments showing how we train them to model the readings of a sensor network and to train a movie recommendation engine.
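The combination of static patient information with several event sequences can be sketched as one GRU summary per sequence, concatenated with the static features and passed to a sigmoid output layer. This is a toy forward pass with random, untrained weights, written in plain Python; the class and function names, the layer sizes, and the feature values are assumptions and do not reproduce the thesis architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU cell; biases are omitted to keep the sketch short."""

    def __init__(self, input_size, hidden_size, rng):
        size = input_size + hidden_size
        self.hidden_size = hidden_size
        self.w_update = random_matrix(hidden_size, size, rng)
        self.w_reset = random_matrix(hidden_size, size, rng)
        self.w_candidate = random_matrix(hidden_size, size, rng)

    def step(self, x, h):
        z = [sigmoid(a) for a in matvec(self.w_update, x + h)]   # update gate
        r = [sigmoid(a) for a in matvec(self.w_reset, x + h)]    # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(self.w_candidate, x + gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, sequence):
        """Run the whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h


def predict(static, sequences, cells, w_out):
    """Concatenate static features with one GRU summary per sequence,
    then apply a sigmoid output layer (one probability per endpoint)."""
    features = list(static)
    for cell, seq in zip(cells, sequences):
        features += cell.encode(seq)
    return [sigmoid(a) for a in matvec(w_out, features)]


rng = random.Random(0)
cells = [GRUCell(2, 3, rng), GRUCell(1, 3, rng)]  # two hypothetical event streams
w_out = random_matrix(3, 2 + 3 + 3, rng)          # three endpoints
static = [0.4, 1.0]                               # hypothetical static features
lab_results = [[0.1, 0.2], [0.0, 0.5]]            # sequence of length 2
prescriptions = [[1.0], [0.0], [1.0]]             # sequence of length 3
probs = predict(static, [lab_results, prescriptions], cells, w_out)
print(probs)
```

Giving each stream its own recurrent encoder lets sequences of different lengths and sampling rates be summarized separately before they meet the static variables.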