8 research outputs found

    Prediction modelling with many correlated and zero-inflated predictors: assessing a nonnegative garrote approach

    Building prediction models from mass-spectrometry data is challenging due to the abundance of correlated features with varying degrees of zero-inflation, leading to a common interest in reducing the features to a concise predictor set with good predictive performance. In this study, we formally established and examined regularized regression approaches designed to address zero-inflated and correlated predictors. In particular, we described a novel two-stage regularized regression approach (ridge-garrote) that explicitly models zero-inflated predictors using two component variables, applying a ridge estimator in the first stage and a nonnegative garrote estimator in the second stage. We contrasted ridge-garrote with one-stage methods (ridge, lasso) and other two-stage regularized regression approaches (lasso-ridge, ridge-lasso) for zero-inflated predictors. We assessed the predictive performance and predictor selection properties of these methods in a comparative simulation study and in a real-data case study predicting kidney function from peptidomic features derived from mass spectrometry. In the simulation study, the predictive performance of all assessed approaches was comparable, yet the ridge-garrote approach consistently selected more parsimonious models than its competitors in most scenarios. While lasso-ridge achieved higher predictive accuracy than its competitors, it exhibited high variability in the number of selected predictors. Ridge-lasso exhibited slightly superior predictive accuracy to ridge-garrote, but at the expense of selecting more noise predictors. Overall, ridge emerged as a favourable option when variable selection is not a primary concern, while ridge-garrote demonstrated notable practical utility in selecting a parsimonious set of predictors at only a minimal cost in predictive accuracy.
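    As an illustration of the two-stage idea, the minimal sketch below fits a ridge regression first and then rescales its coefficients with nonnegative garrote shrinkage factors, obtained here as a nonnegative lasso on the ridge-scaled design. It is a simplified reading of the approach on simulated data: it omits the paper's explicit two-component coding of zero-inflated predictors, and all data, penalty values and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

# simulated stand-in data: n samples, p correlated-ish features, sparse true signal
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.array([2.0, -1.5, 1.0, 0.5, -0.5]) + rng.normal(size=n)

X = StandardScaler().fit_transform(X)

# Stage 1: ridge estimator to stabilise coefficients of correlated predictors
ridge = Ridge(alpha=10.0).fit(X, y)
beta_ridge = ridge.coef_

# Stage 2: nonnegative garrote -- shrink each ridge coefficient by a factor c_j >= 0.
# With an L1 penalty on c, this amounts to a nonnegative lasso on the rescaled design Z.
Z = X * beta_ridge                       # column j becomes x_j * beta_ridge_j
garrote = Lasso(alpha=0.05, positive=True).fit(Z, y)
c = garrote.coef_                        # shrinkage factors; c_j = 0 drops predictor j

beta_garrote = c * beta_ridge
selected = np.flatnonzero(c > 0)
print(f"{selected.size} predictors retained:", selected)
```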

    Development and Validation of a Prediction Model for Future Estimated Glomerular Filtration Rate in People With Type 2 Diabetes and Chronic Kidney Disease

    Importance: Type 2 diabetes increases the risk of progressive diabetic kidney disease, but reliable prediction tools that can be used in clinical practice and aid in patients' understanding of disease progression are currently lacking. Objective: To develop and externally validate a model to predict future trajectories in estimated glomerular filtration rate (eGFR) in adults with type 2 diabetes and chronic kidney disease using data from 3 European multinational cohorts. Design, Setting, and Participants: This prognostic study used baseline and follow-up information collected between February 2010 and December 2019 from 3 prospective multinational cohort studies: PROVALID (Prospective Cohort Study in Patients with Type 2 Diabetes Mellitus for Validation of Biomarkers), GCKD (German Chronic Kidney Disease), and DIACORE (Diabetes Cohorte). A total of 4637 adult participants (aged 18-75 years) with type 2 diabetes and mildly to moderately impaired kidney function (baseline eGFR of ≥30 mL/min/1.73 m²) were included. Data were analyzed between June 30, 2021, and January 31, 2023. Main Outcomes and Measures: Thirteen variables readily available from routine clinical care visits (age, sex, body mass index, smoking status, hemoglobin A1c [mmol/mol and percentage], hemoglobin and serum cholesterol levels, mean arterial pressure, urinary albumin-creatinine ratio, and intake of glucose-lowering, blood pressure-lowering, or lipid-lowering medication) were selected as predictors. Repeated eGFR measurements at baseline and follow-up visits were used as the outcome. A linear mixed-effects model for repeated eGFR measurements from study entry up to the last recorded follow-up visit (up to 5 years after baseline) was fit and externally validated. Results: Among 4637 adults with type 2 diabetes and chronic kidney disease (mean [SD] age at baseline, 63.5 [9.1] years; 2680 men [57.8%]; all of White race), 3323 participants from the PROVALID and GCKD studies (mean [SD] age at baseline, 63.2 [9.3] years; 1864 men [56.1%]) were included in the model development cohort, and 1314 participants from the DIACORE study (mean [SD] age at baseline, 64.5 [8.3] years; 816 men [62.1%]) were included in the external validation cohort, with a mean (SD) follow-up of 5.0 (0.6) years. Updating the random coefficient estimates with baseline eGFR values yielded improved predictive performance, which was particularly evident in the visual inspection of the calibration curve (calibration slope at 5 years: 1.09; 95% CI, 1.04-1.15). The prediction model had good discrimination in the validation cohort, with the lowest C statistic at 5 years after baseline (0.79; 95% CI, 0.77-0.80). The model's predictive accuracy (R²) ranged from 0.70 (95% CI, 0.63-0.76) at year 1 to 0.58 (95% CI, 0.53-0.63) at year 5. Conclusions and Relevance: In this prognostic study, a reliable prediction model was developed and externally validated; the robust model was well calibrated and capable of predicting kidney function decline up to 5 years after baseline. The results and prediction model are publicly available in an accompanying web-based application, which may open the way for improved prediction of individual eGFR trajectories and disease progression.
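    A minimal sketch of the kind of linear mixed-effects model for repeated eGFR measurements described above, using statsmodels on hypothetical long-format data (one row per measurement). The file name, column names and abbreviated predictor set are assumptions for illustration, and the abstract's additional step of updating the random coefficients with baseline eGFR is not shown here.

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format data: one row per eGFR measurement per patient,
# with time since baseline (years) and baseline predictors repeated on each row
df = pd.read_csv("egfr_long.csv")  # assumed file and column names

# random intercept and random slope for time within each patient;
# fixed effects: baseline predictors and their interactions with time
model = smf.mixedlm(
    "eGFR ~ time * (age + sex + bmi + hba1c + uacr + map)",
    data=df,
    groups=df["patient_id"],
    re_formula="~time",
)
fit = model.fit(reml=True)
print(fit.summary())
```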

    A prediction model for the decline in renal function in people with type 2 diabetes mellitus: study protocol

    Background: Chronic kidney disease (CKD) is a well-established complication in people with diabetes mellitus. Roughly one quarter of prevalent patients with diabetes exhibit CKD stage 3 or higher, and the individual course of progression is highly variable. There is therefore a clear need to identify patients at high risk of fast progression and to implement preventative strategies. Existing prediction models of renal function decline, however, artificially group patients prior to model building into risk strata defined by categorizing the least-squares slope through the longitudinally fluctuating eGFR values, resulting in a loss of predictive precision and accuracy. Methods: This study protocol describes the development and validation of a prediction model for the longitudinal progression of renal function decline in Caucasian patients with type 2 diabetes mellitus (DM2). For development and internal-external validation, two prospective multicenter observational studies will be used (PROVALID and GCKD). The estimated glomerular filtration rate (eGFR) obtained at baseline and at all planned follow-up visits will be the longitudinal outcome. Demographics, clinical information and laboratory measurements available at the baseline visit will be used as predictors, in addition to random country-specific intercepts to account for the clustered data. A multivariable mixed-effects model including the main effects of the clinical variables and their interactions with time will be fitted. In application, this model can be used to obtain personalized predictions of an eGFR trajectory conditional on baseline eGFR values. The final model will then undergo external validation using a third prospective cohort (DIACORE). The final prediction model will be made publicly available through an R Shiny web application. Discussion: In contrast to previous models, our proposed state-of-the-art methodology will be developed using multiple multicentre study cohorts of people with DM2 at various CKD stages at baseline who have received modern therapeutic treatment strategies for diabetic kidney disease. Hence, we anticipate that the multivariable prediction model will serve as an additional informative tool for determining patient-specific progression of renal function and provide a useful guide for the early identification of individuals with DM2 at high risk of rapid progression.
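    To make the "prediction conditional on baseline eGFR values" step concrete, the sketch below shows the standard empirical-Bayes (BLUP) update of a new patient's random effects from a single baseline measurement, followed by the predicted trajectory. All numerical values and the tiny fixed-effect design are placeholders, not the protocol's actual estimates.

```python
import numpy as np

# Empirical-Bayes update for a fitted mixed model y = X beta + Z b + eps,
# b ~ N(0, D), eps ~ N(0, sigma2 I); beta, D, sigma2 would come from the developed model.
beta = np.array([85.0, -2.0, -0.3])          # intercept, time slope, age effect (illustrative)
D = np.array([[120.0, -4.0], [-4.0, 3.0]])   # covariance of random intercept and slope
sigma2 = 25.0

# observed baseline visit (time = 0) for one new patient, age 63
X_obs = np.array([[1.0, 0.0, 63.0]])         # fixed-effects design row
Z_obs = np.array([[1.0, 0.0]])               # random-effects design row
y_obs = np.array([72.0])                     # baseline eGFR

V = Z_obs @ D @ Z_obs.T + sigma2 * np.eye(1)
b_hat = D @ Z_obs.T @ np.linalg.solve(V, y_obs - X_obs @ beta)   # BLUP of random effects

# predicted eGFR trajectory over the next 5 years, conditional on the baseline value
times = np.arange(0, 6, dtype=float)
X_new = np.column_stack([np.ones_like(times), times, np.full_like(times, 63.0)])
Z_new = np.column_stack([np.ones_like(times), times])
pred = X_new @ beta + Z_new @ b_hat
print(np.round(pred, 1))
```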

    A comparison of methods for causal inference with a rare binary outcome

    Causal inference from observational studies faces a wide variety of challenges, particularly in studies with rare outcome events and multiple confounders. Propensity scores shift the focus towards modelling the relationship between treatment assignment and the covariates, rather than their association with the outcome, in order to adjust for the effects of selection bias. The propensity score methodology will be comprehensively elaborated within this thesis, and the appropriate steps in conducting propensity score analyses will be discussed. Matching and weighting the observational data on propensity scores will be compared with an approach based on traditional outcome regression models, as well as with its combination with propensity score weighting. The different causal inference techniques will then be applied to estimate the marginal causal treatment effect of a preoperative computed tomography examination, in patients undergoing coronary artery bypass grafting (CABG), on the risk of stroke during and after surgery. The preceding simulation study will provide the rationale behind the methodological decisions made for the analysis of the case study; more specifically, it will focus on examining how the four reviewed approaches to causal inference compare in accurately estimating treatment effects under various conditions.
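    A minimal sketch of the propensity-score workflow compared in the thesis: estimate the propensity score with logistic regression, form inverse-probability-of-treatment weights, and compute a weighted marginal risk difference. The file name, confounder list and outcome coding are assumptions for illustration; matching, the outcome-regression comparators and variance estimation are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# hypothetical columns: 'treated' (preoperative CT yes/no), confounders, 'stroke' outcome
df = pd.read_csv("cabg_cohort.csv")
confounders = ["age", "sex", "diabetes", "prior_stroke", "ejection_fraction"]

# propensity score: probability of receiving the treatment given the confounders
ps_model = LogisticRegression(max_iter=1000).fit(df[confounders], df["treated"])
ps = ps_model.predict_proba(df[confounders])[:, 1]

# inverse probability of treatment weights targeting the average treatment effect
w = np.where(df["treated"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# weighted marginal risk difference between treated and untreated patients
treated = (df["treated"] == 1).to_numpy()
risk_t = np.average(df.loc[treated, "stroke"], weights=w[treated])
risk_c = np.average(df.loc[~treated, "stroke"], weights=w[~treated])
print("IPTW marginal risk difference:", risk_t - risk_c)
```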

    Regression with Highly Correlated Predictors: Variable Omission Is Not the Solution

    Regression models have been in use for decades to explore and quantify the association between a dependent response and several independent variables in environmental sciences, epidemiology and public health. However, researchers often encounter situations in which some independent variables exhibit high bivariate correlation, or may even be collinear. Improper statistical handling of this situation will almost certainly generate models of little or no practical use and misleading interpretations. By means of two example studies, we demonstrate how diagnostic tools for collinearity or near-collinearity may fail to guide the analyst. Instead, the most appropriate way of handling collinearity should be driven by the research question at hand and, in particular, by the distinction between predictive and explanatory aims.
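    As a small illustration of the kind of collinearity diagnostic the paper warns against relying on blindly, the sketch below computes variance inflation factors for a simulated design with two near-collinear predictors. The data and cut-offs are invented; a large VIF flags the problem but, as the paper argues, does not by itself say whether omitting a variable is appropriate for the research question.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

# simulated data with two near-collinear exposures x1 and x2
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)   # almost a copy of x1
x3 = rng.normal(size=500)
X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF per column: large values for x1 and x2 indicate near-collinearity
vif = {col: variance_inflation_factor(X.values, i) for i, col in enumerate(X.columns)}
print(vif)
```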

    Individual-specific networks for prediction modelling – A scoping review of methods

    Background: Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative forms of covariate representation for the features derived from these modern data modalities should be considered that can utilize their intrinsic interconnection. The connectivity information between these features can be represented as an individual-specific network, defined by a set of nodes and edges whose strength can vary from individual to individual. Global or local graph-theoretical features describing the network may constitute potential prognostic biomarkers, instead of or in addition to traditional covariates, and may replace the often unsuccessful search for individual biomarkers in a high-dimensional predictor space. Methods: We conducted a scoping review to identify, collate and critically appraise the state of the art in the use of individual-specific networks for prediction modelling in medicine and applied health research, published during 2000-2020 and indexed in the electronic databases PubMed, Scopus and Embase. Results: Our scoping review revealed the main application areas, namely neurology and pathopsychology, followed by cancer research, cardiology and pathology (N = 148). Network construction was mainly based on Pearson correlation coefficients of repeated measurements, but alternative approaches (e.g. partial correlation, visibility graphs) were also found. For covariates measured only once per individual, network construction was mostly based on quantifying an individual's contribution to the overall group-level structure. Despite the multitude of identified methodological approaches for individual-specific network inference, the number of studies intended to enable the prediction of clinical outcomes for future individuals was quite limited, and most of the models served as proof of concept that network characteristics can in principle be useful for prediction. Conclusion: The current body of research clearly demonstrates the value of individual-specific network analysis for prediction modelling, but it has not yet been considered as a general tool outside the current areas of application. More methodological research is still needed on well-founded strategies for network inference, especially on adequate network sparsification and outcome-guided graph-theoretical feature extraction and selection, and on how networks can be exploited efficiently for prediction modelling.
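    A minimal sketch of one construction strategy mentioned above: build an individual-specific network from Pearson correlations of an individual's repeated measurements, sparsify it with a simple threshold, and extract global graph-theoretical features that could serve as covariates in a prediction model. The data, threshold and feature choices are illustrative assumptions, not recommendations from the review.

```python
import numpy as np
import networkx as nx

# hypothetical repeated measurements for one individual: T time points x p features
rng = np.random.default_rng(2)
T, p = 50, 10
data = rng.normal(size=(T, p))

# individual-specific network from pairwise Pearson correlations across time points
corr = np.corrcoef(data, rowvar=False)
adj = (np.abs(corr) > 0.3).astype(float)   # simple threshold sparsification (assumed cut-off)
np.fill_diagonal(adj, 0.0)

G = nx.from_numpy_array(adj)

# global graph-theoretical features that could enter a prediction model as covariates
features = {
    "density": nx.density(G),
    "transitivity": nx.transitivity(G),
    "avg_clustering": nx.average_clustering(G),
}
print(features)
```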

    Routine preoperative aortic computed tomography angiography is associated with reduced risk of stroke in coronary artery bypass grafting: a propensity-matched analysis.

    OBJECTIVES: The aim of this study was to determine stroke rates in patients who did or did not undergo routine computed tomography angiography (CTA) aortic imaging before isolated coronary artery bypass grafting (CABG). METHODS: We conducted a retrospective analysis of a prospectively maintained single-centre registry. Between 2009 and 2016, a total of 2320 consecutive patients who underwent isolated CABG at our institution were identified. Propensity score matching was used to create a paired cohort of patients with similar baseline characteristics who did (CTA cohort) or did not (non-CTA cohort) undergo preoperative aortic CTA. The primary end point of the analysis was in-hospital stroke. RESULTS: In 435 propensity score-matched pairs, stroke occurred in 4 patients (0.92%) in the CTA cohort and in 14 patients (3.22%) in the non-CTA cohort (P = 0.017). Routine preoperative aortic CTA was associated with a significantly reduced risk of in-hospital stroke [relative risk 0.29, 95% confidence interval (CI) 0.09-0.86; P = 0.026; absolute risk reduction 2.3%, 95% CI 0.4-4.2; P = 0.017; number needed to treat = 44, 95% CI 24-242]. CONCLUSIONS: Preoperative screening for atheromatous aortic disease using CTA is associated with a reduced risk of stroke after CABG. Routine preoperative aortic CTA could be used so that surgical manipulation of the ascending aorta can be selectively reduced or avoided in patients with atheromatous aortic disease.
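    The reported effect measures follow directly from the matched 2x2 counts; the short computation below reproduces the point estimates (the confidence intervals require additional variance calculations not shown here).

```python
# counts reported in the abstract for the 435 propensity score-matched pairs
events_cta, n_cta = 4, 435
events_no_cta, n_no_cta = 14, 435

risk_cta = events_cta / n_cta            # 0.0092 -> 0.92%
risk_no_cta = events_no_cta / n_no_cta   # 0.0322 -> 3.22%

rr = risk_cta / risk_no_cta              # relative risk, about 0.29
arr = risk_no_cta - risk_cta             # absolute risk reduction, about 0.023 (2.3%)
nnt = 1 / arr                            # number needed to treat, about 43.5 -> 44
print(round(rr, 2), round(arr, 3), round(nnt))
```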

    Different roles of protein biomarkers predicting eGFR trajectories in people with chronic kidney disease and diabetes mellitus : a nationwide retrospective cohort study

    BACKGROUND: Chronic kidney disease (CKD) is a common comorbidity in people with diabetes mellitus and a key risk factor for further life-threatening conditions such as cardiovascular disease. The early prediction of CKD progression is therefore an important clinical goal, but remains difficult due to the multifaceted nature of the condition. We validated a set of established protein biomarkers for the prediction of trajectories of estimated glomerular filtration rate (eGFR) in people with moderately advanced chronic kidney disease and diabetes mellitus. Our aim was to discern which biomarkers are associated with baseline eGFR and which are important for the prediction of the future eGFR trajectory. METHODS: We used Bayesian linear mixed models with weakly informative and shrinkage priors for clinical predictors (n = 12) and protein biomarkers (n = 19) to model eGFR trajectories in a retrospective cohort study of people with diabetes mellitus (n = 838) from the nationwide German Chronic Kidney Disease study. We used baseline eGFR to update the models' predictions, thereby assessing the importance of the predictors and improving predictive accuracy, computed using repeated cross-validation. RESULTS: The model combining clinical and protein predictors had higher predictive performance than the clinical-only model, with an R² of 0.44 (95% credible interval 0.37-0.50) before, and 0.59 (95% credible interval 0.51-0.65) after, updating by baseline eGFR. Only a few predictors were sufficient to obtain performance comparable to the main model, with markers such as Tumor Necrosis Factor Receptor 1 and Receptor for Advanced Glycation Endproducts being associated with baseline eGFR, while Kidney Injury Molecule 1 and the urine albumin-creatinine ratio were predictive of future eGFR decline. CONCLUSIONS: Protein biomarkers only modestly improve predictive accuracy compared with clinical predictors alone. The different protein markers serve different roles in the prediction of longitudinal eGFR trajectories, potentially reflecting their roles in the disease pathway.
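    A minimal sketch, in PyMC, of a Bayesian linear mixed model with weakly informative priors on clinical coefficients and horseshoe-style shrinkage priors on biomarker coefficients, as one reading of the modelling strategy described above. The simulated data, prior scales and random-intercept-only structure are illustrative assumptions and not the authors' exact specification.

```python
import numpy as np
import pymc as pm

# simulated stand-in data in long format: one row per eGFR measurement
rng = np.random.default_rng(3)
n_pat, n_obs = 100, 400
idx = rng.integers(0, n_pat, n_obs)                 # patient index per row
t = rng.uniform(0, 5, n_obs)                        # years since baseline
C = rng.normal(size=(n_obs, 3))                     # clinical predictors (standardised)
B = rng.normal(size=(n_obs, 5))                     # protein biomarkers (standardised)
y = 70 - 2 * t + rng.normal(scale=5, size=n_obs)    # simulated eGFR

with pm.Model():
    beta_clin = pm.Normal("beta_clin", 0.0, 10.0, shape=C.shape[1])  # weakly informative
    tau = pm.HalfNormal("tau", 1.0)                                  # global shrinkage
    lam = pm.HalfCauchy("lam", 1.0, shape=B.shape[1])                # local shrinkage
    beta_prot = pm.Normal("beta_prot", 0.0, tau * lam, shape=B.shape[1])

    intercept = pm.Normal("intercept", 70.0, 20.0)
    slope = pm.Normal("slope", 0.0, 5.0)
    sigma_u = pm.HalfNormal("sigma_u", 10.0)
    u = pm.Normal("u", 0.0, sigma_u, shape=n_pat)    # patient-specific random intercepts

    mu = (intercept + slope * t + pm.math.dot(C, beta_clin)
          + pm.math.dot(B, beta_prot) + u[idx])
    sigma = pm.HalfNormal("sigma", 10.0)
    pm.Normal("eGFR", mu, sigma, observed=y)

    idata = pm.sample(500, tune=500, target_accept=0.9)
```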