
    Graph Interpolation via Fast Fused-Gromovization

    Graph data augmentation has proven to be effective in enhancing the generalizability and robustness of graph neural networks (GNNs) for graph-level classification. However, existing methods mainly focus on augmenting the graph signal space and the graph structure space independently, overlooking their joint interaction. This paper addresses this limitation by formulating the problem as an optimal transport problem that aims to find an optimal strategy for matching nodes between graphs, considering the interactions between graph structures and signals. To tackle this problem, we propose a novel graph mixup algorithm dubbed FGWMixup, which leverages the Fused Gromov-Wasserstein (FGW) metric space to identify a "midpoint" of the source graphs. To improve the scalability of our approach, we introduce a relaxed FGW solver that accelerates FGWMixup by improving the convergence rate from O(t^{-1}) to O(t^{-2}). Extensive experiments conducted on five datasets, utilizing both classic (MPNNs) and advanced (Graphormers) GNN backbones, demonstrate the effectiveness of FGWMixup in improving the generalizability and robustness of GNNs.
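    The core operation can be illustrated with a toy sketch: given a soft node coupling between two graphs, the mixup "midpoint" blends both the structure (adjacency) and the signal (node features) through that coupling. The sketch below substitutes a uniform coupling for the optimal transport plan that FGWMixup actually solves for, so it shows only the interpolation step, not the FGW solver; all names are illustrative.

```python
import numpy as np

def mixup_midpoint(A1, X1, A2, X2, lam=0.5):
    """Blend two graphs through a node coupling (illustrative sketch).

    A uniform coupling pi[i, j] = 1/(n1*n2) stands in for the optimal
    transport plan that the real method would obtain from a Fused
    Gromov-Wasserstein solver.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    pi = np.full((n1, n2), 1.0 / (n1 * n2))   # surrogate coupling
    w = pi.sum(axis=0)                        # marginal over graph-2 nodes
    # Structure space: carry A1 into graph-2 coordinates, then blend.
    A1_t = (pi.T @ A1 @ pi) / np.outer(w, w)
    A_mix = lam * A1_t + (1.0 - lam) * A2
    # Signal space: carry node features over the same coupling, then blend.
    X1_t = (pi.T @ X1) / w[:, None]
    X_mix = lam * X1_t + (1.0 - lam) * X2
    return A_mix, X_mix
```

    Because structure and signal are blended through the same coupling, their interaction is preserved, which is the point the abstract makes against augmenting the two spaces independently.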

    AdaCare: Explainable Clinical Health Status Representation Learning via Scale-Adaptive Feature Extraction and Recalibration

    Deep learning-based health status representation learning and clinical prediction have raised much research interest in recent years. Existing models have shown superior performance, but there are still several major issues that have not been fully taken into consideration. First, the historical variation pattern of a biomarker at diverse time scales plays an important role in indicating the health status, but it has not been explicitly extracted by existing works. Second, the key factors that strongly indicate the health risk differ among patients. It is still challenging to adaptively make use of the features for patients in diverse conditions. Third, using the prediction model as a black box limits its reliability in clinical practice. However, none of the existing works can provide satisfying interpretability and meanwhile achieve high prediction performance. In this work, we develop a general health status representation learning model, named AdaCare. It can capture the long- and short-term variations of biomarkers as clinical features to depict the health status at multiple time scales. It also models the correlation between clinical features to enhance the ones which strongly indicate the health status, and thus can maintain state-of-the-art prediction accuracy while providing qualitative interpretability. We conduct health risk prediction experiments on two real-world datasets. Experimental results indicate that AdaCare outperforms state-of-the-art approaches and provides effective interpretability which is verifiable by clinical experts.
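    The recalibration idea can be sketched as a small gating network that rescales each extracted clinical feature by a learned importance in (0, 1); the weight vector itself then serves as the interpretable attribution the abstract describes. This is a minimal squeeze-and-excitation-style sketch with hypothetical shapes, not the paper's exact module.

```python
import numpy as np

def recalibrate(features, W1, W2):
    """Scale-adaptive feature recalibration (illustrative sketch).

    features: (n_features,) extracted clinical features for one visit.
    W1, W2: weights of a small two-layer gate (hypothetical shapes:
    W1 maps n_features -> hidden, W2 maps hidden -> n_features).
    Returns the features rescaled by per-feature importances, which
    double as an interpretable attribution vector.
    """
    h = np.maximum(W1 @ features, 0.0)        # squeeze: low-dim summary
    scores = W2 @ h                           # excite: one score per feature
    weights = 1.0 / (1.0 + np.exp(-scores))   # sigmoid gate in (0, 1)
    return features * weights, weights
```

    Features with weights near 1 are the ones the gate judges to strongly indicate the health status for this particular patient, which is how the adaptivity across patients arises.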

    ConCare: Personalized Clinical Feature Embedding via Capturing the Healthcare Context

    Predicting the patient’s clinical outcome from historical electronic medical records (EMR) is a fundamental research problem in medical informatics. Most deep learning-based solutions for EMR analysis concentrate on learning the clinical visit embedding and exploring the relations between visits. Although those works have shown superior performance in healthcare prediction, they fail to thoroughly explore the personal characteristics during the clinical visits. Moreover, existing work usually assumes that a more recent record has a larger weight in the prediction, but this assumption is not true for certain clinical features. In this paper, we propose ConCare to handle the irregular EMR data and extract feature interrelationships to perform individualized healthcare prediction. Our solution can embed the feature sequences separately by modeling the time-aware distribution. ConCare further improves the multi-head self-attention via cross-head decorrelation, so that the inter-dependencies among dynamic features and static baseline information can be diversely captured to form the personal health context. Experimental results on two real-world EMR datasets demonstrate the effectiveness of ConCare. More importantly, ConCare is able to extract medical findings which can be confirmed by human experts and medical literature.
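    The time-aware idea, that recency should matter more for some features than for others, can be sketched as a per-feature decay over elapsed time; in the actual model the decay rate would be learned per feature rather than fixed. A minimal sketch (decay rates and time deltas below are made up):

```python
import numpy as np

def time_aware_weights(time_deltas, decay_rate):
    """Attention weights over past visits for one feature (sketch).

    time_deltas: (n_visits,) time elapsed from each visit to prediction.
    decay_rate: scalar, learned per feature in the real model; a small
    rate lets old values of slow-moving features keep influence, while
    a large rate makes fast-moving vitals rely on recent visits.
    """
    scores = -decay_rate * np.asarray(time_deltas, dtype=float)
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()

# A fast-decaying feature concentrates weight on the most recent visit,
# while a slow-decaying one spreads weight almost uniformly.
deltas = [10.0, 5.0, 1.0]
fast = time_aware_weights(deltas, decay_rate=2.0)
slow = time_aware_weights(deltas, decay_rate=0.01)
```

    Embedding each feature sequence separately with its own decay is what lets the model escape the one-size-fits-all recency assumption the abstract criticizes.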

    A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care

    The COVID-19 pandemic has posed a heavy burden on the healthcare system worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed to conduct clinical predictive tasks such as mortality prediction for COVID-19 patients in intensive care units using Electronic Health Record (EHR) data. Despite their initial success in certain clinical applications, there is currently a lack of benchmarking results to achieve a fair comparison so that we can select the optimal model for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks, outcome-specific length-of-stay prediction and early mortality prediction, for COVID-19 patients in intensive care units. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate the clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models, and 6 deep learning predictive models specifically designed for EHR data. We provide fair, reproducible benchmarking results using data from two real-world COVID-19 EHR datasets; one dataset is publicly available without needing any inquiry, and the other can be accessed on request. We deploy all experiment results and models on an online platform, where clinicians and researchers can also upload their data and get quick prediction results using our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling. (Comment: Junyi Gao, Yinghao Zhu and Wenqing Wang contributed equally.)
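    To make the task adaptation concrete, here is one hypothetical way the two labels could be derived for a single ICU stay; the benchmark's actual horizons and definitions may differ, so treat the threshold and pairing as assumptions.

```python
def make_labels(outcome_dead, los_days, horizon_days=7.0):
    """Construct benchmark-style labels for one ICU stay (sketch).

    Returns (early_mortality, outcome_specific_los):
    - early_mortality: 1 iff the patient died within `horizon_days`
      of admission (hypothetical horizon).
    - outcome_specific_los: length of stay paired with the final
      outcome, so a model is scored on predicting LOS jointly with
      how the stay ends rather than LOS in isolation.
    """
    early = 1 if (outcome_dead and los_days <= horizon_days) else 0
    return early, (los_days, int(outcome_dead))
```

    Pairing LOS with the outcome is what distinguishes the adapted tasks from the naive formulations: a short stay ending in death and a short stay ending in discharge are clinically very different targets.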

    Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration to Mitigate EHR Data Sparsity

    Electronic Health Record (EHR) data frequently exhibits sparse characteristics, posing challenges for predictive modeling. Current direct imputation approaches, such as matrix imputation, hinge on referencing analogous rows or columns to complete raw missing data and do not differentiate between imputed and actual values. As a result, models may inadvertently incorporate irrelevant or deceptive information with respect to the prediction objective, thereby compromising downstream performance. While some methods strive to recalibrate or augment EHR embeddings after direct imputation, they often mistakenly prioritize imputed features. This misprioritization can introduce biases or inaccuracies into the model. To tackle these issues, our work resorts to indirect imputation, where we leverage prototype representations from similar patients to obtain a denser embedding. Recognizing the limitation that missing features are typically treated the same as present ones when measuring patient similarity, our approach designs a feature confidence learner module. This module is sensitive to the missing-feature status, enabling the model to better judge the reliability of each feature. Moreover, we propose a novel patient similarity metric that takes feature confidence into account, ensuring that evaluations are not based merely on potentially inaccurate imputed values. Consequently, our work captures dense prototype patient representations with a feature-missing-aware calibration process. Comprehensive experiments demonstrate that the designed model surpasses established EHR-focused models with a statistically significant improvement on the MIMIC-III and MIMIC-IV in-hospital mortality prediction task. The code is publicly available at https://github.com/yhzhu99/SparseEHR to ensure reproducibility.
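    The similarity idea can be sketched as a confidence-weighted cosine score, where each feature pair contributes only as much as both patients' confidence in it; observed values could carry confidence 1 and imputed values the confidence learner's estimate. This is an illustration of the idea, not the paper's exact metric.

```python
import numpy as np

def confidence_similarity(x_a, x_b, conf_a, conf_b):
    """Patient similarity that discounts imputed features (sketch).

    x_a, x_b: (d,) feature vectors after imputation.
    conf_a, conf_b: (d,) per-feature confidences in (0, 1]; observed
    values might get 1.0, imputed values the confidence learner's
    estimate. A feature comparison only counts as much as the lower
    of the two confidences allows.
    """
    w = conf_a * conf_b                       # pairwise trust per feature
    num = np.sum(w * x_a * x_b)
    den = np.sqrt(np.sum(w * x_a ** 2)) * np.sqrt(np.sum(w * x_b ** 2))
    return num / den if den > 0 else 0.0
```

    With such a metric, two patients who disagree only on low-confidence (likely imputed) features still rank as close neighbors, so the prototype built from them stays faithful to the observed data.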

    A comprehensive benchmark for COVID-19 predictive modeling using electronic health records in intensive care

    The COVID-19 pandemic highlighted the need for predictive deep-learning models in health care. However, practical prediction task design, fair comparison, and model selection for clinical applications remain a challenge. To address this, we introduce and evaluate two new prediction tasks, outcome-specific length-of-stay and early-mortality prediction for COVID-19 patients in intensive care, which better reflect clinical realities. We developed evaluation metrics, model adaptation designs, and open-source data preprocessing pipelines for these tasks while also evaluating 18 predictive models, including clinical scoring methods and traditional machine-learning, basic deep-learning, and advanced deep-learning models, tailored for electronic health record (EHR) data. Benchmarking results from two real-world COVID-19 EHR datasets are provided, and all results and trained models have been released on an online platform for use by clinicians and researchers. Our efforts contribute to the advancement of deep-learning and machine-learning research in pandemic predictive modeling.

    Mortality Prediction with Adaptive Feature Importance Recalibration for Peritoneal Dialysis Patients

    The study aims to develop AICare, an interpretable mortality prediction model, using Electronic Medical Records (EMR) from follow-up visits of End-Stage Renal Disease (ESRD) patients. AICare includes a multi-channel feature extraction module and an adaptive feature importance recalibration module. It integrates dynamic records and static features to perform personalized health context representation learning. The dataset encompasses 13,091 visits and demographic data of 656 peritoneal dialysis (PD) patients spanning 12 years. An additional public dataset of 4,789 visits from 1,363 hemodialysis (HD) patients is also considered. AICare outperforms traditional deep learning models in mortality prediction while retaining interpretability. It uncovers mortality-feature relationships and variations in feature importance, and provides reference values. An AI-Doctor interaction system is developed for visualizing patients’ health trajectories and risk indicators.

    Long-term nitrogen fertilizer management for enhancing use efficiency and sustainable cotton (Gossypium hirsutum L.)

    Optimal management of nitrogen fertilizer profoundly impacts sustainable development by influencing nitrogen use efficiency (NUE) and seed cotton yield. However, the effect of long-term gradient nitrogen application on sandy loam soil is unclear. Therefore, we conducted an 8-year field study (2014–2021) using six nitrogen levels: 0 kg/hm2 (N0), 75 kg/hm2 (N1), 150 kg/hm2 (N2), 225 kg/hm2 (N3), 300 kg/hm2 (N4), and 375 kg/hm2 (N5). The experiment showed that: 1) although nitrogen application did not significantly affect basic soil fertility, the soil total nitrogen (STN) content decreased by 5.71%–19.67%, 6.67%–16.98%, and 13.64%–21.74% in the 0–20 cm, 20–40 cm, and 40–60 cm soil layers, respectively; 2) the reproductive organs of N3 plants showed the highest nitrogen accumulation and dry matter accumulation in both years, and increasing the nitrogen application rate gradually decreased the dry matter allocation ratio to the reproductive organs; 3) the boll number per unit area of N3 was the largest among all treatments in both years. On sandy loam, the optimal nitrogen rate was 190–270 kg/hm2 for high seed cotton yield with minimal nitrogen loss and reduced soil environmental pollution.
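    One standard NUE index consistent with this kind of trial design is agronomic efficiency: the extra seed cotton yield gained per kilogram of nitrogen applied, computed against the unfertilized N0 control. A minimal sketch (the example yields below are made up; the study may report additional indices such as recovery or physiological efficiency):

```python
def agronomic_nue(yield_fertilized, yield_control, n_rate):
    """Agronomic nitrogen use efficiency (kg yield per kg N).

    yield_fertilized: seed cotton yield of a fertilized plot (kg/hm2).
    yield_control: yield of the unfertilized N0 control (kg/hm2).
    n_rate: nitrogen application rate (kg/hm2).
    """
    return (yield_fertilized - yield_control) / n_rate

# Hypothetical yields for N0 and the N3 rate (225 kg/hm2):
ae_n3 = agronomic_nue(4500.0, 3200.0, 225.0)
```

    Comparing this index across the six rates is one way a 190–270 kg/hm2 optimum could be identified: efficiency typically falls as rates rise past the point where yield stops responding.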