
    Time Series Modeling of Irregularly Sampled Multivariate Clinical Data

    Building an accurate predictive model of clinical time series for a patient is critical for understanding the patient's condition, its dynamics, and optimal patient management. Unfortunately, this process is challenging because of: (1) multivariate behaviors: real-world dynamics are multivariate and are better described by multivariate time series (MTS); (2) irregular samples: sequential observations are collected at different times, and the time elapsed between two consecutive observations may vary; and (3) patient variability: clinical MTS vary from patient to patient, and an individual patient may exhibit short-term variability reflecting the different events affecting care and patient state. In this dissertation, we investigate different ways of developing and refining forecasting models from irregularly sampled clinical MTS data. First, we focus on refinements of a popular model for MTS analysis, the linear dynamical system (LDS, a.k.a. the Kalman filter), and its application to MTS forecasting. We propose (1) a regularized LDS learning framework that automatically shuts down the LDS's spurious and unnecessary dimensions and consequently prevents overfitting given a small amount of data; and (2) a generalized LDS learning framework via matrix factorization, which allows various constraints to be easily incorporated to guide the learning process. Second, we study ways of modeling irregularly sampled univariate clinical time series. We develop a new two-layer hierarchical dynamical system model for irregularly sampled clinical time series prediction. We demonstrate that our new system adapts better to irregular samples and supports more accurate predictions. Finally, we propose, develop and experiment with two personalized forecasting frameworks for modeling and predicting the clinical MTS of an individual patient. The first approach relies on model adaptation techniques. It calibrates the population-based model's predictions with patient-specific residual models, which are learned from the differences between the patient's observations and the population-based model's predictions. The second framework relies on adaptive model selection strategies to combine the advantages of population-based, patient-specific and short-term individualized predictive models. We demonstrate the benefits and advantages of the aforementioned frameworks on synthetic data sets, public time series data sets and clinical data extracted from EHRs.
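    The residual-calibration idea of the first personalized framework can be illustrated with a minimal sketch. Everything below is hypothetical: a fixed linear trend stands in for the population model (which in the dissertation is an LDS/Kalman filter), and the patient-specific residual model is a least-squares line fit to the patient's past forecast errors.

```python
import numpy as np

# Hypothetical sketch: a population model's forecast is corrected by a
# patient-specific residual model fit to that patient's past errors.

rng = np.random.default_rng(0)
t = np.arange(10, dtype=float)

# Stand-in "population" model: a fixed linear trend (the dissertation
# uses an LDS / Kalman filter here).
def population_forecast(t):
    return 2.0 + 0.5 * t

# One patient deviates systematically from the population trend.
y_patient = population_forecast(t) + 1.5 + 0.1 * t + rng.normal(0, 0.05, t.size)

# Patient-specific residual model: least-squares line on past residuals.
residuals = y_patient - population_forecast(t)
A = np.vstack([np.ones_like(t), t]).T
coef, *_ = np.linalg.lstsq(A, residuals, rcond=None)

def calibrated_forecast(t_new):
    # population prediction + predicted patient-specific residual
    return population_forecast(t_new) + coef[0] + coef[1] * t_new

t_new = 12.0
true_value = 2.0 + 0.5 * t_new + 1.5 + 0.1 * t_new
pop_err = abs(population_forecast(t_new) - true_value)
cal_err = abs(calibrated_forecast(t_new) - true_value)
```

    On this toy patient the calibrated forecast error (`cal_err`) falls far below the population model's (`pop_err`), which is the effect the framework exploits.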

    A model-based multithreshold method for subgroup identification

    The thresholding variable plays a crucial role in subgroup identification for personalized medicine. Most existing partitioning methods split the sample based on one predictor variable. In this paper, we consider setting the splitting rule from a combination of multivariate predictors, such as latent factors, principal components, and weighted sums of predictors. Such a subgrouping method may lead to more meaningful partitioning of the population than using a single variable. In addition, our method is based on a change point regression model and thus yields straightforward model-based prediction results. After choosing a particular thresholding variable form, we apply a two-stage multiple change point detection method to determine the subgroups and estimate the regression parameters. We show that our approach can produce two or more subgroups from the multiple change points and identify the true grouping with high probability. In addition, our estimation results enjoy oracle properties. We design a simulation study to compare the performance of our proposed and existing methods and apply them to analyze data sets from a scleroderma trial and a breast cancer study.
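    A minimal sketch of the splitting idea, assuming a single change point and a known weight vector for the combined thresholding variable (the paper's two-stage estimator handles multiple change points and estimates these quantities):

```python
import numpy as np

# Toy data: the outcome jumps when the combined variable z = X @ w crosses 0.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
w = np.array([0.6, 0.3, 0.1])          # assumed weights, for illustration
z = X @ w                              # combined thresholding variable
y = np.where(z > 0.0, 2.0, -1.0) + rng.normal(0, 0.2, n)

# Scan candidate split points of the sorted z, minimizing total
# within-group squared error (a one-change-point special case).
order = np.argsort(z)
z_sorted, y_sorted = z[order], y[order]

def split_sse(k):
    left, right = y_sorted[:k], y_sorted[k:]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

best_k = min(range(10, n - 10), key=split_sse)   # avoid tiny subgroups
threshold = z_sorted[best_k]
mean_low, mean_high = y_sorted[:best_k].mean(), y_sorted[best_k:].mean()
```

    With the jump placed at z = 0, the recovered `threshold` lands near zero and the two subgroup means separate cleanly, giving a model-based prediction for each subgroup.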

    Personalized Event Prediction for Electronic Health Records

    Clinical event sequences consist of hundreds of clinical events that represent records of patient care in time. Developing accurate predictive models of such sequences is of great importance for supporting a variety of models for interpreting or classifying the current patient condition and predicting adverse clinical events and outcomes, all aimed at improving patient care. One important challenge in learning predictive models of clinical sequences is their patient-specific variability. Depending on the underlying clinical conditions, each patient's sequence may consist of different sets of clinical events (observations, lab results, medications, procedures). Hence, simple population-wide models learned from the event sequences of many different patients may not accurately predict patient-specific dynamics of event sequences and their differences. To address this problem, we propose and investigate multiple new event sequence prediction models and methods that let us better adjust the prediction for individual patients and their specific conditions. The methods developed in this work pursue refinement of population-wide models to subpopulations, self-adaptation, and meta-level model switching that adaptively selects the model with the best chance of supporting the immediate prediction. We analyze and test the performance of these models on clinical event sequences of patients in the MIMIC-III database.
    Comment: arXiv admin note: text overlap with arXiv:2104.0178
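    The meta-level model switching idea can be sketched as follows. The two candidate "models" here are toy stand-ins (a population-level constant and a last-value predictor), chosen only to show the mechanism of picking the model with the best recent performance before each prediction:

```python
import numpy as np

# Toy sequence and two hypothetical candidate predictors.
rng = np.random.default_rng(2)
T = 100
y = np.sin(np.arange(T) / 5.0)

def pop_model(hist):
    return 0.0                  # "population" constant prediction

def last_value(hist):
    return hist[-1]             # short-term individualized prediction

models = [pop_model, last_value]
window = 5                      # recent-error window for switching
errors = [[], []]
preds = []
for t in range(1, T):
    hist = y[:t]
    # pick the model with the lowest recent mean absolute error
    recent = [np.mean(e[-window:]) if e else 0.0 for e in errors]
    best = int(np.argmin(recent))
    preds.append(models[best](hist))
    for i, m in enumerate(models):
        errors[i].append(abs(m(hist) - y[t]))

switch_mae = float(np.mean(np.abs(np.array(preds) - y[1:])))
pop_mae = float(np.mean(errors[0]))
```

    On this sequence the switcher quickly settles on whichever predictor has been tracking the signal, so its error stays below the population model's.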

    Statistical Methods for Establishing Personalized Treatment Rules in Oncology


    Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

    This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, and with imbalance in the classes of interest, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and an expectation-maximization imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as real data in health applications, and show that our multilevel SVM-based method produces faster, more accurate and more robust classification results.
    Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
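    The iterated-regression imputation component can be sketched as below. This is a simplified single-level scheme on synthetic data, not the authors' multilevel cost-sensitive SVM framework: missing entries start at column means and are repeatedly re-predicted from the other columns.

```python
import numpy as np

# Toy data: column 2 is a (noisy) linear function of columns 0 and 1,
# so regression-based imputation can beat column-mean imputation.
rng = np.random.default_rng(3)
n = 300
X_true = rng.normal(size=(n, 3))
X_true[:, 2] = 0.7 * X_true[:, 0] - 0.4 * X_true[:, 1] + rng.normal(0, 0.1, n)

X = X_true.copy()
mask = rng.random((n, 3)) < 0.15       # 15% of entries missing at random
X[mask] = np.nan

# Initialize missing entries with column means, then iterate:
# re-predict each column's missing entries from the other columns.
col_means = np.nanmean(X, axis=0)
X_imp = np.where(np.isnan(X), col_means, X)

for _ in range(10):                    # fixed number of sweeps
    for j in range(X.shape[1]):
        miss = mask[:, j]
        if not miss.any():
            continue
        others = [k for k in range(X.shape[1]) if k != j]
        A = np.column_stack([np.ones(n), X_imp[:, others]])
        coef, *_ = np.linalg.lstsq(A[~miss], X_imp[~miss, j], rcond=None)
        X_imp[miss, j] = A[miss] @ coef

imput_err = np.abs(X_imp[mask] - X_true[mask]).mean()
naive_err = np.abs(np.broadcast_to(col_means, X.shape)[mask] - X_true[mask]).mean()
```

    Because the columns are correlated, the iterated regressions recover the missing entries far better than the column-mean baseline; the imputed matrix would then feed the downstream cost-sensitive classifier.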

    EHRs Connect Research and Practice: Where Predictive Modeling, Artificial Intelligence, and Clinical Decision Support Intersect

    Objectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data - the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information as well, and, as such, can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. Results: Of the 423 patients, 101 deteriorated, 223 improved, and in 99 there was no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Level - Adult (CARLA) baseline score was most significant in predicting outcome over time (odds ratio 4.1 ± 2.27). Other variables with a consistently significant impact on outcome included payer, diagnostic category, location and provision of case management services. Conclusions: This approach represents a promising avenue toward reducing the current gap between research and practice across healthcare, developing data-driven clinical decision support based on real-world populations, and serving as a component of embedded clinical artificial intelligences that "learn" over time.
    Comment: Keywords: Data Mining; Decision Support Systems, Clinical; Electronic Health Records; Implementation; Evidence-Based Medicine; Data Warehouse. (2012). EHRs Connect Research and Practice: Where Predictive Modeling, Artificial Intelligence, and Clinical Decision Support Intersect. Health Policy and Technology. arXiv admin note: substantial text overlap with arXiv:1112.166

    Multimodal Data Fusion and Quantitative Analysis for Medical Applications

    Medical big data is not only enormous in size but also heterogeneous and complex in structure, which makes it difficult for conventional systems or algorithms to process. These heterogeneous medical data include imaging data (e.g., Positron Emission Tomography (PET), Computerized Tomography (CT), and Magnetic Resonance Imaging (MRI)) and non-imaging data (e.g., laboratory biomarkers, electronic medical records, and hand-written doctor notes). Multimodal data fusion is an emerging vital field that addresses this urgent challenge, aiming to process and analyze complex, diverse and heterogeneous multimodal data. Fusion algorithms bring great potential to medical data analysis by 1) taking advantage of complementary information from different sources (such as the functional-structural complementarity of PET/CT images) and 2) exploiting consensus information that reflects the intrinsic essence (such as the genetic essence underlying medical imaging and clinical symptoms). Thus, multimodal data fusion benefits a wide range of quantitative medical applications, including personalized patient care, more optimal medical operation plans, and preventive public health. Though there has been extensive research on computational approaches for multimodal fusion, three major challenges of multimodal data fusion remain in quantitative medical applications, summarized as feature-level fusion, information-level fusion and knowledge-level fusion:
    • Feature-level fusion. The first challenge is to mine multimodal biomarkers from high-dimensional small-sample multimodal medical datasets, which hinder the effective discovery of informative multimodal biomarkers. Specifically, efficient dimension reduction algorithms are required to alleviate the "curse of dimensionality" problem and meet the criteria for discovering interpretable, relevant, non-redundant and generalizable multimodal biomarkers.
    • Information-level fusion. The second challenge is to exploit and interpret inter-modal and intra-modal information for precise clinical decisions. Although radiomics and multi-branch deep learning have been used for implicit information fusion guided by label supervision, there is a lack of methods to explicitly explore inter-modal relationships in medical applications. Unsupervised multimodal learning can mine inter-modal relationships, reduce the need for labor-intensive labeled data and explore potentially undiscovered biomarkers; however, mining discriminative information without label supervision remains an open challenge. Furthermore, the interpretation of complex non-linear cross-modal associations, especially in deep multimodal learning, is another critical challenge in information-level fusion, which hinders the exploration of multimodal interaction in disease mechanisms.
    • Knowledge-level fusion. The third challenge is quantitative knowledge distillation from multi-focus regions on medical imaging. Although characterizing imaging features from single lesions using either feature engineering or deep learning methods has been investigated in recent years, both methods neglect the importance of inter-region spatial relationships. Thus, a topological profiling tool for multi-focus regions is in high demand, yet is missing from current feature engineering and deep learning methods. Furthermore, incorporating domain knowledge with the knowledge distilled from multi-focus regions is another challenge in knowledge-level fusion.
    To address these three challenges in multimodal data fusion, this thesis provides a multi-level fusion framework for multimodal biomarker mining, multimodal deep learning, and knowledge distillation from multi-focus regions. Specifically, our major contributions include:
    • To address the challenges in feature-level fusion, we propose an Integrative Multimodal Biomarker Mining framework to select interpretable, relevant, non-redundant and generalizable multimodal biomarkers from high-dimensional small-sample imaging and non-imaging data for diagnostic and prognostic applications. The feature selection criteria, including representativeness, robustness, discriminability, and non-redundancy, are addressed by consensus clustering, a Wilcoxon filter, sequential forward selection, and correlation analysis, respectively. The SHapley Additive exPlanations (SHAP) method and nomograms are employed to further enhance feature interpretability in machine learning models.
    • To address the challenges in information-level fusion, we propose an Interpretable Deep Correlational Fusion framework, based on canonical correlation analysis (CCA), for 1) cohesive multimodal fusion of medical imaging and non-imaging data, and 2) interpretation of complex non-linear cross-modal associations. Specifically, two novel loss functions are proposed to optimize the discovery of informative multimodal representations in both supervised and unsupervised deep learning by jointly learning inter-modal consensus and intra-modal discriminative information. An interpretation module is proposed to decipher the complex non-linear cross-modal associations by leveraging interpretation methods from both deep learning and multimodal consensus learning.
    • To address the challenges in knowledge-level fusion, we propose a Dynamic Topological Analysis (DTA) framework, based on persistent homology, for knowledge distillation from inter-connected multi-focus regions in medical imaging and for the incorporation of domain knowledge. Different from conventional feature engineering and deep learning, our DTA framework explicitly quantifies inter-region topological relationships, including global-level geometric structure and community-level clusters. A K-simplex Community Graph is proposed to construct the dynamic community graph representing community-level multi-scale graph structure. The constructed dynamic graph is subsequently tracked with a novel Decomposed Persistence algorithm. Domain knowledge is incorporated into the Adaptive Community Profile, which summarizes the tracked multi-scale community topology with additional customizable, clinically important factors.
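    At its simplest, the persistent homology underlying such topological analysis tracks when connected components of a growing distance graph merge. The sketch below computes only this H0 persistence with a union-find over a sorted edge filtration on a toy point cloud; the thesis's K-simplex Community Graph and Decomposed Persistence algorithm are far richer:

```python
import numpy as np

# Toy point cloud: two well-separated clusters of "regions".
rng = np.random.default_rng(4)
pts = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))])
n = len(pts)

# The filtration: all pairwise edges sorted by length.
edges = sorted(
    (float(np.linalg.norm(pts[i] - pts[j])), i, j)
    for i in range(n) for j in range(i + 1, n)
)

# Union-find over the growing graph; record the threshold at which
# two components merge (a "death" in the H0 persistence diagram).
parent = list(range(n))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

deaths = []
for d, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj
        deaths.append(d)

deaths.sort()
# The single long-lived component reflects the inter-cluster gap.
```

    Here the short-lived deaths come from intra-cluster merges and the one large death from the gap between the clusters; this kind of scale separation is what topological profiles of multi-focus regions quantify.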