    Modeling and Prediction in Diabetes Physiology

    Diabetes is a group of metabolic diseases characterized by the inability of the organism to autonomously regulate blood glucose levels. It requires continuing medical care to prevent acute complications and to reduce the risk of long-term complications. Inadequate glucose control is associated with damage, dysfunction and failure of various organs. Managing the disease is nontrivial and demanding. Under current standards of diabetes care, good glucose regulation needs constant attention and decision-making by the individuals with diabetes. Empowering patients with a decision support system would, therefore, improve their quality of life without adding burden or replacing human expertise. This thesis investigates the use of data-driven techniques for glucose metabolism modeling and short-term blood glucose prediction in Type I Diabetes Mellitus (T1DM). The goal was to use models and predictors in an advisory tool able to produce personalized short-term blood glucose predictions and to support on-the-spot decision making concerning the most adequate choice of insulin delivery, meal intake and exercise, to help diabetic subjects maintain glycemia as close to normal as possible. The approaches taken to describe the glucose metabolism were discrete-time and continuous-time models in input-output form and state-space form, while the short-term blood glucose predictors, i.e., up to 120 minutes ahead, used ARX-, ARMAX- and subspace-based prediction

    Predicting diabetes-related hospitalizations based on electronic health records

    OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested. In addition, a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate the positive samples from the negative ones (non-hospitalized). The convergence of the new method was established, and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. The new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters that can help interpret the classification results, thus increasing the trust of physicians in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and a lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant, as it has been estimated that in the USA alone about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.

    Linear Modeling and Prediction in Diabetes Physiology

    Diabetes Mellitus is a chronic disease characterized by the inability of the organism to autonomously regulate the blood glucose level due to insulin deficiency or resistance, leading to serious health damage. The therapy is essentially based on insulin injections and depends strongly on the patient's daily decisions, which rest mainly on empirical experience and rules of thumb. The development of a prediction engine capable of personalized on-the-spot decision making concerning the most adequate choice of insulin delivery, meal intake and exercise would therefore be a valuable step towards improved management of the disease. This thesis presents work on data-driven glucose metabolism modeling and short-term (that is, up to 120 minutes ahead) blood glucose prediction in Type 1 Diabetes Mellitus (T1DM) subjects. In order to address model-based control for blood glucose regulation, low-order, individualized, data-driven, stable, physiologically relevant models were identified from the data of a population of 9 T1DM patients. Model structures include autoregressive moving average with exogenous inputs (ARMAX) models and state-space models. ARMAX multi-step-ahead predictors were estimated by means of least-squares estimation; next, regularization of the autoregressive coefficients was introduced. ARMAX-based predictors and the zero-order hold were computed to allow comparison. Finally, preliminary results on subspace-based multi-step-ahead multivariate predictors are presented
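To make the comparison between a least-squares multi-step-ahead predictor and the zero-order hold concrete, here is a minimal sketch. It is not the thesis's method: it fits a direct ARX-type predictor (no moving-average term) rather than an ARMAX one, and the data are synthetic. The model order, the horizon of 6 samples, the simulated dynamics and all function names are assumptions made for illustration only.

```python
import numpy as np

def build_regressors(y, u, order, horizon):
    """Pair past outputs/inputs at time t with the output `horizon` steps ahead."""
    rows, targets = [], []
    for t in range(order - 1, len(y) - horizon):
        past_y = y[t - order + 1 : t + 1][::-1]   # y[t], y[t-1], ...
        past_u = u[t - order + 1 : t + 1][::-1]   # u[t], u[t-1], ...
        rows.append(np.concatenate([past_y, past_u, [1.0]]))  # 1.0 = intercept
        targets.append(y[t + horizon])
    return np.array(rows), np.array(targets)

def fit_direct_predictor(y, u, order, horizon):
    """Least-squares estimate of the direct `horizon`-step-ahead predictor."""
    Phi, target = build_regressors(y, u, order, horizon)
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return theta

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic, stable second-order system driven by sparse input pulses
# (hypothetical stand-in for a glucose signal; not patient data).
rng = np.random.default_rng(0)
n = 600
u = (rng.random(n) < 0.05).astype(float)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 1.6 * y[t - 1] - 0.7 * y[t - 2] + 0.5 * u[t - 1] \
           + 0.05 * rng.standard_normal()

order, horizon = 2, 6
theta = fit_direct_predictor(y[:400], u[:400], order, horizon)   # train
Phi_test, target_test = build_regressors(y[400:], u[400:], order, horizon)

rmse_ls = rmse(Phi_test @ theta, target_test)      # least-squares predictor
rmse_zoh = rmse(Phi_test[:, 0], target_test)       # zero-order hold: y[t+h] = y[t]
print(f"least squares RMSE: {rmse_ls:.4f}   zero-order hold RMSE: {rmse_zoh:.4f}")
```

The zero-order hold simply repeats the latest measurement, so it serves as a baseline that any useful predictor should beat on held-out data.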

    Multi-step-ahead Multivariate Predictors: A Comparative Analysis

    The focus of this article is a comparative analysis of multi-step-ahead linear multivariate predictors. The estimation approach is based on geometrically reliable linear algebra tools, resorting to subspace identification methods. A crucial issue is the quantification of both the bias error and the variance affecting the prediction estimate for increasing values of the look-ahead when only a small number of samples is available. No complete theory is available so far, nor is there sufficient numerical experience. Therefore, the analysis in this paper aims at shedding some light on the topic, providing some insights and helping to develop some intuition

    Diabetes Mellitus Glucose Prediction by Linear and Bayesian Ensemble Modeling

    Diabetes Mellitus is a chronic disease of impaired blood glucose control due to degraded or absent production, or utilization, of the body's own insulin. For those affected, this in many cases implies relying on insulin injections and blood glucose measurements in order to keep the blood glucose level within acceptable limits. The risks of developing short- and long-term complications, due to both too high and too low blood glucose concentrations, are manifold, and the glucose dynamics are generally not easy for the affected individual to fully comprehend, resulting in poor glucose control. To reduce the burden this implies for the patient and society, in terms of physiological and monetary costs, different technical solutions based on closed or semi-closed loop blood glucose control have been suggested. To this end, this thesis investigates simplified linear and merged models of glucose dynamics for the purpose of short-term prediction, developed within the EU FP7 DIAdvisor project. These models could, e.g., be used in a decision support system to alert the user of future low and high glucose levels and, when implemented in a control framework, to suggest proactive actions. The simplified models were evaluated on 47 patient data records from the first DIAdvisor trial. Qualitatively physiologically correct responses were imposed, and model-based prediction up to two hours ahead, specifically for low blood glucose detection, was evaluated. The glucose-raising and glucose-lowering effects of meals and insulin were estimated, together with the clinically relevant carbohydrate-to-insulin ratio. The model was further expanded to include the blood-to-interstitial lag, and tested on one patient data set. Finally, a novel algorithm for merging multiple prediction models was developed and validated on both artificial data and 12 datasets from the second DIAdvisor trial

    A comprehensive medical decision–support framework based on a heterogeneous ensemble classifier for diabetes prediction


    Sparse group sufficient dimension reduction and covariance cumulative slicing estimation

    This dissertation contains two main parts. In Part One, for regression problems with grouped covariates, we adapt the idea of the sparse group lasso (Friedman et al., 2010) to the framework of sufficient dimension reduction. We propose a method called sparse group sufficient dimension reduction (sgSDR) to conduct group and within-group variable selection simultaneously without assuming a specific model structure on the regression function. Simulation studies show that our method is comparable to the sparse group lasso under the regular linear model setting, and outperforms the sparse group lasso, with higher true positive rates and substantially lower false positive rates, when the regression function is nonlinear and/or the error distributions are non-Gaussian. One immediate application of our method is gene pathway data analysis, where genes naturally fall into groups (pathways). An analysis of a glioblastoma microarray data set is included to illustrate our method. In Part Two, for many-valued or continuous Y, the standard practice of replacing the response Y by a discrete version of Y usually results in a loss of power because intra-slice information is ignored. Most of the existing slicing methods rely heavily on the selection of the number of slices h. Zhu et al. (2010) proposed a method called cumulative slicing estimation (CUME), which avoids the otherwise subjective selection of h. In this dissertation, we revisit CUME from a different perspective to gain more insight, and then refine its performance by incorporating the intra-slice covariances. The resulting new method, which we call covariance cumulative slicing estimation (COCUM), is comparable to CUME when the predictors are normally distributed, and outperforms CUME when the predictors are non-Gaussian, especially in the presence of outliers. The asymptotic results for COCUM are also established.
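For orientation, the sparse group lasso that Part One builds on can be written, in the ordinary linear-model setting, roughly as the following penalized least-squares problem. This is a sketch of one common form only: the symbols (L groups, design blocks X_l, coefficient blocks \beta_l, tuning parameters \lambda_1 and \lambda_2) are notation assumed here, group weights and scaling conventions vary between references, and the sgSDR criterion itself is not reproduced.

```latex
\min_{\beta}\;
  \frac{1}{2}\Bigl\| y - \sum_{l=1}^{L} X_l \beta_l \Bigr\|_2^2
  \;+\; \lambda_1 \sum_{l=1}^{L} \|\beta_l\|_2   % group-level sparsity
  \;+\; \lambda_2 \|\beta\|_1                     % within-group sparsity
```

The group-wise Euclidean norm can zero out entire groups, while the l1 term zeroes individual coefficients inside the groups that remain.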

    Development and assessment of linear regression techniques for modeling multisensor data for non-invasive continuous glucose monitoring

    Solianis Monitoring AG (Zurich, Switzerland) has recently proposed a non-invasive multisensor for continuous glucose monitoring, based on a combination of dielectric and optical sensors. The aim of the research project, carried out in collaboration with Solianis Monitoring AG, is to develop and evaluate a model for estimating blood glucose from the multisensor data. In this thesis, three different methods for estimating a multivariate linear regression model are evaluated and compared: Ordinary Least Squares (OLS), Partial Least Squares (PLS) and the Least Absolute Shrinkage and Selection Operator (LASSO). First, the three methods are described from a methodological and algorithmic point of view. Next, the three methods are applied to a database of 32 experiments in which multisensor measurements and reference blood glucose values were acquired in parallel. Finally, some methods for further improving the estimates of the glycemic profiles are proposed.

    Simultaneous Modeling of Disease Screening and Severity Prediction: A Multi-task and Sparse Regularization Approach

    Disease prediction is one of the central problems in biostatistical research. Some biomarkers are not only helpful in diagnosing and screening diseases but are also associated with the severity of the diseases. A prediction model that can estimate severity at the diagnosis or screening stage would be helpful for purposes such as treatment prioritization. We focus on solving the combined tasks of screening and severity prediction, considering a combined response variable such as {healthy, mild, intermediate, severe}. This type of response variable is ordinal, but since the two tasks do not necessarily share the same statistical structure, the conventional cumulative logit model (CLM) may not be suitable. To handle the composite ordinal response, we propose the Multi-task Cumulative Logit Model (MtCLM) with structural sparse regularization. This model is sufficiently flexible to fit the different structures of the two tasks while capturing the structure they share. In addition, MtCLM is valid as a stochastic model in the entire predictor space, unlike another conventional and flexible model, the non-parallel cumulative logit model (NPCLM). We conduct simulation experiments and real data analysis to illustrate the prediction performance and interpretability
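As background for the CLM/NPCLM contrast above, the two conventional models can be sketched in their textbook form as follows. The notation (K ordered categories, thresholds \alpha_k, slope vectors \beta and \beta_k, and the sign convention) is assumed here and may differ from the paper's; the MtCLM formulation itself is not reproduced.

```latex
% Cumulative logit model (parallel / proportional odds): one slope vector
\operatorname{logit} P(Y \le k \mid x) = \alpha_k - x^\top \beta,
\qquad k = 1, \dots, K-1, \qquad \alpha_1 \le \dots \le \alpha_{K-1}.

% Non-parallel cumulative logit model: a separate slope vector per threshold
\operatorname{logit} P(Y \le k \mid x) = \alpha_k - x^\top \beta_k,
\qquad k = 1, \dots, K-1.
```

With category-specific slopes, the cumulative probabilities are no longer guaranteed to be non-decreasing in k for every x, so the implied category probabilities can turn negative in parts of the predictor space. This is presumably the sense in which the NPCLM is not valid as a stochastic model everywhere.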

    Non-Invasive Continuous Glucose Monitoring: Identification of Models for Multi-Sensor Systems

    Diabetes is a disease that undermines the normal regulation of glucose levels in the blood. In people with diabetes, the body does not secrete insulin (Type 1 diabetes) or derangements occur in both insulin secretion and action (Type 2 diabetes). In spite of the therapy, which is mainly based on controlled regimens of insulin and drug administration, diet, and physical exercise, tuned according to self-monitoring of blood glucose (SMBG) levels 3-4 times a day, blood glucose concentration often falls outside the normal range of 70-180 mg/dL. While hyperglycaemia mostly leads to long-term complications (such as neuropathy, retinopathy, cardiovascular and heart diseases), hypoglycaemia can be very dangerous in the short term and, in the worst-case scenario, may bring the patient into hypoglycaemic coma. New scenarios in diabetes treatment have opened up in the last 15 years, when continuous glucose monitoring (CGM) sensors, able to monitor glucose concentration continuously (i.e. with a reading every 1 to 5 min) over several days, entered clinical research. CGM sensors can be used both retrospectively, e.g., to optimize metabolic control, and in real-time applications, e.g., in "smart" CGM sensors, able to generate alerts when glucose concentrations are predicted to exceed the normal range thresholds, or in the so-called "artificial pancreas". Most CGM sensors exploit needles and are thus invasive, although minimally. In order to improve patient comfort, Non-Invasive Continuous Glucose Monitoring (NI-CGM) technologies have been widely investigated in recent years, and their ability to monitor glucose changes in the human body has been demonstrated under highly controlled (e.g. in-clinic) conditions. As soon as these conditions become less favourable (e.g. in daily-life use), several problems arise that can be associated with physiological and environmental perturbations.
To tackle this issue, the multisensor concept has received greater attention in the last few years. A multisensor embeds sensors of different nature within the same device, allowing the measurement of endogenous (glucose, skin perfusion, sweating, movement, etc.) as well as exogenous (temperature, humidity, etc.) factors. The main glucose-related signals and those measuring specific detrimental processes have to be combined through a suitable mathematical model with the final goal of estimating glucose non-invasively. White-box models, where differential equations are used to describe the internal behavior of the system, can rarely be used to combine multisensor measurements because a physical/mechanistic model linking multisensor data to glucose is not easily available. A more viable approach considers black-box models, which do not describe the internal mechanisms of the system under study, but rather depict how the inputs (channels from the non-invasive device) determine the output (estimated glucose values) through a transfer function (which we restrict to the class of multivariate linear models). Unfortunately, numerical problems usually arise in the identification of the model parameters, because the multisensor channels are highly correlated (especially for spectroscopy-based devices) and because of the potentially high dimension of the measurement space. The aim of the thesis is to investigate and evaluate different techniques for identifying the parameters of the multivariate linear regression models linking multisensor data and glucose. In particular, the following methods are considered: Ordinary Least Squares (OLS); Partial Least Squares (PLS); the Least Absolute Shrinkage and Selection Operator (LASSO), based on l1 norm regularization; Ridge regression, based on l2 norm regularization; and Elastic Net (EN), based on the combination of the two previous norms.
As a case study, we consider data from the Multisensor device, mainly based on dielectric and optical sensors, developed by Solianis Monitoring AG (Zurich, Switzerland), which partially sponsored the PhD scholarship. The Solianis Monitoring AG IP portfolio is now held by Biovotion AG (Zurich, Switzerland). We consider forty-five recording sessions provided by Solianis Monitoring AG, collected from 6 subjects with diabetes undergoing hypo- and hyperglycaemic protocols at the University Hospital Zurich. The models identified with the aforementioned techniques using a data subset are then assessed against an independent test data subset. Results show that methods controlling complexity outperform OLS during model test. In general, regularization techniques outperform PLS, especially those embedding the l1 norm (LASSO and EN), because they set many channel weights to zero and are therefore more robust to occasional spikes occurring in the Multisensor channels. In particular, the EN model performs best, combining the sparseness and the grouping effect induced by the l1 and l2 norms, respectively. Overall, results indicate that, although the performance in terms of overall accuracy is not yet comparable with that of SMBG enzyme-based needle sensors, the Multisensor platform combined with the Elastic Net (EN) models is a valid tool for the real-time monitoring of glycaemic trends. An effective application is to complement sparse SMBG measurements with glucose trend information, within the recently developed concept of dynamic risk, for the correct assessment of dangerous events such as hypoglycaemia.
The body of the thesis is organized into three main parts. Part I (Chapters 1 to 4) first gives an introduction to diabetes and to the current technologies for NI-CGM (including the Multisensor device by Solianis) and then states the aims of the thesis. Part II (Chapters 5 to 9) first describes some of the issues to be faced in high-dimensional regression problems, and then presents OLS, PLS, LASSO, Ridge and EN, using a tutorial example to highlight their advantages and drawbacks. Finally, Part III (Chapters 10 to 12) presents the case study with the data set and results. Some concluding remarks and possible future developments end the thesis. In particular, a Monte Carlo procedure to evaluate the robustness of the calibration procedure for the Solianis Multisensor device is proposed, together with a new cost function to be used for identifying models
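The effect of regularization on highly correlated channels can be illustrated with a small numerical sketch. This is not the thesis's implementation: the data are synthetic stand-ins for correlated multisensor channels, the ridge estimate uses the textbook closed form, and the LASSO and Elastic Net estimates use a plain coordinate-descent loop on one common parameterization of the objective. The function names, the penalty values and the channel count are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha = 1 gives the LASSO penalty, alpha = 0 a ridge-type penalty."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.astype(float).copy()          # equals y - X @ beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]      # remove channel j's contribution
            z = X[:, j] @ residual / n
            beta[j] = soft_threshold(z, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            residual -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

# Synthetic, strongly correlated "channels" (hypothetical; not device data).
rng = np.random.default_rng(0)
n, p = 60, 20
latent = rng.standard_normal(n)
X = latent[:, None] + 0.1 * rng.standard_normal((n, p))
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(n)
y -= y.mean()

# Ordinary least squares and closed-form ridge regression.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
ridge_lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + ridge_lam * np.eye(p), X.T @ y)

# lam_max: smallest LASSO penalty (alpha = 1) for which all weights are zero.
lam_max = np.max(np.abs(X.T @ y)) / n
beta_lasso = elastic_net(X, y, lam=0.1 * lam_max, alpha=1.0)
beta_en = elastic_net(X, y, lam=0.1 * lam_max, alpha=0.5)

for name, b in [("OLS", beta_ols), ("Ridge", beta_ridge),
                ("LASSO", beta_lasso), ("EN", beta_en)]:
    print(f"{name:6s} nonzero={np.count_nonzero(b):2d}  norm={np.linalg.norm(b):.3f}")
```

Printing the number of nonzero weights and the coefficient norm for each estimator shows how the l2 penalty shrinks the weights and how the l1 penalty can set some of them exactly to zero. In practice the penalty values would be chosen by validation on held-out data rather than fixed as here.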