179 research outputs found

    Enabling Privacy-Preserving Prediction for Length of Stay in ICU - A Multimodal Federated-Learning-based Approach

    While the proliferation of data-driven machine learning approaches has created new opportunities for precision healthcare, several challenges stand in the way of fully utilizing medical data, in part because of the heterogeneity of data modalities in electronic health records. Moreover, medical data often sits in data silos due to various regulatory, privacy, ethical, and legal considerations, which complicates efforts to fully utilize machine learning. Motivated by these challenges, we focus on one clinical care task, length of stay prediction, and propose a Multimodal Federated Learning approach. The approach is designed to leverage both privacy-preserving federated learning and multimodal data to facilitate length of stay prediction. By applying it to a real-world medical dataset, we demonstrate its predictive power as well as how it can address the challenges discussed above. The findings also suggest the potential of the proposed multimodal federated learning approach for other similar healthcare settings.

    Temporal-spatial Correlation Attention Network for Clinical Data Analysis in Intensive Care Unit

    In recent years, medical information technology has made it possible for electronic health records (EHRs) to store fairly complete clinical data, bringing health care into the era of "big data". However, medical data are often sparse and strongly correlated, which makes it hard to solve medical problems effectively. The rapid development of deep learning in recent years has provided opportunities for the use of big data in healthcare. In this paper, we propose a temporal-spatial correlation attention network (TSCAN) to handle several clinical characteristic prediction problems, such as predicting death, predicting length of stay, detecting physiologic decline, and classifying phenotypes. Based on the design of the attention mechanism model, our approach can effectively remove irrelevant items in clinical data and irrelevant nodes in time according to different tasks, so as to obtain more accurate prediction results. Our method can also find key clinical indicators of important outcomes that can be used to improve treatment options. Our experiments use information from the Medical Information Mart for Intensive Care (MIMIC-IV) database, which is open to the public. Finally, we achieve significant performance gains of 2.0% (metric) compared to other SOTA prediction methods, reaching 90.7% on mortality rate and 45.1% on length of stay. The source code can be found at https://github.com/yuyuheintju/TSCAN

    An explainable machine learning framework for lung cancer hospital length of stay prediction

    This work introduces a predictive Length of Stay (LOS) framework for lung cancer patients using machine learning (ML) models. The framework is proposed to deal with imbalanced datasets for classification-based approaches using electronic healthcare records (EHR). We have utilized supervised ML methods to predict lung cancer inpatients' LOS during ICU hospitalization using the MIMIC-III dataset. The Random Forest (RF) model outperformed the other models and produced the predicted results in all three framework phases. With clinical significance feature selection, the over-sampling methods (SMOTE and ADASYN) achieved the highest AUC results (98%, 95% CI: 95.3–100%, and 100%, respectively). The combinations of over-sampling and under-sampling achieved the second-highest AUC results (98%, 95% CI: 95.3–100%, for SMOTE-Tomek and 97%, 95% CI: 93.7–100%, for SMOTE-ENN). The under-sampling methods reported the lowest AUC results (50%, 95% CI: 40.2–59.8%) for both ENN and Tomek Links. Using an explainable ML technique called SHAP, we explained the outcome of the predictive model (RF) with the SMOTE class balancing technique to understand the most significant clinical features that contributed to predicting lung cancer LOS with the RF model. Our promising framework allows us to employ ML techniques in hospital clinical information systems to predict lung cancer admissions into the ICU.
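
    The over-sampling step named in this abstract (SMOTE) can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the function name `smote_sketch`, the toy data, and the parameters `k` and `seed` are assumptions made for illustration. The sketch shows only the core idea of SMOTE, which is to create a synthetic minority-class sample on the line segment between a minority sample and one of its nearest minority-class neighbours.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point is a random interpolation between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (excluding `base` itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy minority class with two features (illustrative values only).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6)
```

    Because every synthetic point is a convex combination of two real minority samples, it always lies inside the bounding box of the minority class. In practice a maintained library implementation would normally be preferred over a hand-written one.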

    Length of Stay prediction for Hospital Management using Domain Adaptation

    Inpatient length of stay (LoS) is an important managerial metric which, if known in advance, can be used to efficiently plan admissions, allocate resources and improve care. Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models cannot be used for patient discharge in lieu of unit heads but are of utmost necessity for hospital management systems in charge of effective hospital planning. Therefore, the design of the prediction system should be adapted to work in a true hospital setting. In this study, we predict early hospital LoS at the granular level of admission units by applying domain adaptation to leverage information learned from a potential source domain. Time-varying data from 110,079 and 60,492 patient stays in 8 and 9 intensive care units were extracted from eICU-CRD and MIMIC-IV, respectively. These were fed into a Long Short-Term Memory network and a fully connected network to train a source domain model, the weights of which were transferred either partially or fully to initiate training in target domains. Shapley Additive exPlanations (SHAP) algorithms were used to study the effect of weight transfer on model explainability. Compared to the benchmark, the proposed weight transfer model showed statistically significant gains in prediction accuracy (between 1% and 5%) as well as computation time (up to 2 hours) for some target domains. The proposed method thus provides an adapted clinical decision support system for hospital management that can ease the processes of data access via ethics committees, computation infrastructure and time.
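
    The partial versus full weight transfer described in this abstract can be sketched with plain dictionaries standing in for model parameters. This is an illustration under stated assumptions, not the authors' code: the helper `transfer_weights`, the layer names `lstm` and `fc`, and all numeric values are hypothetical, and the abstract does not say which layers were copied in the partial case.

```python
import copy


def transfer_weights(source, target, layers=None):
    """Return a copy of `target` with the chosen layers initialised from `source`.

    `layers=None` means full transfer (every layer present in both models);
    a list of layer names means partial transfer. Hypothetical helper.
    """
    if layers is None:
        layers = [name for name in target if name in source]
    initialised = copy.deepcopy(target)
    for name in layers:
        initialised[name] = copy.deepcopy(source[name])
    return initialised


# Toy parameters: a source-domain model and a freshly initialised target model.
source_model = {"lstm": [0.4, -0.1, 0.7], "fc": [0.2, 0.9]}
target_init = {"lstm": [0.0, 0.0, 0.0], "fc": [0.0, 0.0]}

full = transfer_weights(source_model, target_init)                      # all layers copied
partial = transfer_weights(source_model, target_init, layers=["lstm"])  # only the LSTM copied
```

    Training on the target domain would then start from `full` or `partial` instead of from `target_init`.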

    Continuous patient state attention models

    Irregular time-series (ITS) are prevalent in electronic health records (EHR) because the data is recorded according to clinical guidelines and requirements rather than for research, and because recording also depends on the patient's health status. ITS present challenges in the training of machine learning algorithms, which are mostly built on the assumption of a coherent fixed-dimensional feature space. In this paper, we propose a computationally efficient variant of the transformer based on the idea of cross-attention, called Perceiver, for time-series in healthcare. We further develop continuous patient state attention models, using the Perceiver and the transformer to deal with ITS in EHR. The continuous patient state models utilise neural ordinary differential equations to learn the patient health dynamics, i.e., the patient health trajectory from the observed irregular time-steps, which enables them to sample any number of time-steps at any time. The performance of the proposed models is evaluated on the in-hospital mortality prediction task using the Physionet-2012 challenge and MIMIC-III datasets. The Perceiver model significantly outperforms the baselines and reduces the computational complexity, as compared with the transformer model, without significant loss of performance. The carefully designed experiments to study irregularity in healthcare also show that the continuous patient state models outperform the baselines. The code is publicly released and verified at https://codeocean.com/capsule/4587224

    Codificação médica ICD-9-CM automatizada de relatórios clínicos de pacientes diabéticos (Automated ICD-9-CM medical coding of clinical reports of diabetic patients)

    The assignment of ICD-9-CM codes to patients' clinical reports is a costly and tiring process performed manually by medical personnel, estimated to cost about $25 billion per year in the United States. Developing a system that automates this process has long been an ambition of researchers, but it remains an unsolved problem due to the inherent difficulties in processing unstructured clinical text. The problem is formulated here as a multi-label supervised learning task in which the independent variable is the report's text and the dependent variable is the set of assigned ICD-9-CM labels. Different variations of two neural-network-based models, the Bag-of-Tricks and the Convolutional Neural Network (CNN), are investigated. The models are trained on the diabetic patient subset of the freely available MIMIC-III dataset. The results show that a CNN with three parallel convolutional layers achieves F1 scores of 44.51% for five-digit codes and 51.73% for three-digit, rolled-up codes. Additionally, it is shown that joining several binary classifiers with the binary relevance method produces an improvement of almost 7% over its multi-label equivalent in a restricted classification task covering only the eleven most common labels in the dataset.
    Master's in Computer and Telematics Engineering
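
    The binary relevance method mentioned in this abstract splits one multi-label problem into one independent yes/no problem per label and then merges the per-label answers. The sketch below shows only that decomposition and recombination; the helper names, the three ICD-9-CM codes, and the toy label sets are assumptions chosen for illustration, and no classifier is actually trained.

```python
def to_binary_problems(label_sets, labels):
    """Binary relevance: build one 0/1 target vector per label (hypothetical helper)."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def combine_predictions(per_label_preds, n_docs):
    """Merge per-label 0/1 predictions back into one label set per document."""
    return [
        {lab for lab, preds in per_label_preds.items() if preds[i] == 1}
        for i in range(n_docs)
    ]


# Toy data: the set of codes assigned to each of three reports (illustrative only).
label_sets = [{"250.00", "401.9"}, {"250.00"}, {"428.0", "401.9"}]
labels = ["250.00", "401.9", "428.0"]

targets = to_binary_problems(label_sets, labels)
# A real system would train one binary classifier per entry of `targets`.
# Here the targets themselves stand in for perfect per-label predictions.
recovered = combine_predictions(targets, len(label_sets))
```

    Each per-label target vector can be fed to any binary classifier, which is what makes the method easy to combine with existing models.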