Parathyroid hormone is a plausible mediator for the metabolic syndrome in the morbidly obese: a cross-sectional study
Background: The biological mechanisms in the association between the metabolic syndrome (MS) and various biomarkers, such as 25-hydroxyvitamin D (vit D) and magnesium, are not fully understood. Several of the proposed predictors of MS are also possible predictors of parathyroid hormone (PTH). We aimed to explore whether PTH is a possible mediator between MS and various possible explanatory variables in morbidly obese patients.
Methods: Fasting serum levels of PTH, vit D and magnesium were assessed in a cross-sectional study of 1,017 consecutive morbidly obese patients (68% women). Dependencies between MS and a total of seven possible explanatory variables as suggested in the literature, including PTH, vit D and magnesium, were specified in a path diagram, including both direct and indirect effects. Possible gender differences were also included. Effects were estimated using Bayesian path analysis, a multivariable regression technique, and expressed using standardized regression coefficients.
Results: Sixty-eight percent of the patients had MS. In addition to type 2 diabetes and age, both PTH and serum phosphate had significant direct effects on MS: 0.36 (95% Credibility Interval (CrI) [0.15, 0.57]) and 0.28 (95% CrI [0.10, 0.47]), respectively. However, due to significant gender differences, an increase in either PTH or phosphate corresponded to an increased OR for MS in women only. All proposed predictors of MS had significant direct effects on PTH, with vit D and phosphate the strongest: -0.27 (95% CrI [-0.33, -0.21]) and -0.26 (95% CrI [-0.32, -0.20]), respectively. Though neither vit D nor magnesium had significant direct effects on MS, for women they both affected MS indirectly, owing to the strong direct effect of PTH on MS. For phosphate, the indirect effect on MS, mediated through serum calcium and PTH, had the opposite sign to the direct effect, so the total effect on MS was somewhat attenuated compared with the direct effect alone.
Conclusion: Our results indicate that for women PTH is a plausible mediator in the association between MS and a range of explanatory variables, including vit D, magnesium and phosphate.
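In a path model like the one described, an indirect effect is the product of the coefficients along the mediating path, and the total effect is the sum of the direct and indirect effects; an indirect effect with the opposite sign therefore attenuates the total effect. A minimal sketch of that arithmetic, using made-up standardized coefficients rather than the study's estimates:

```python
# Path model X -> M1 -> M2 -> Y with an additional direct path X -> Y.
# All coefficients are hypothetical, chosen only to illustrate the arithmetic.
direct_x_y = 0.28        # direct effect of X on Y
path_x_m1 = 0.25         # X -> first mediator
path_m1_m2 = -0.30       # first mediator -> second mediator
path_m2_y = 0.36         # second mediator -> Y

indirect_x_y = path_x_m1 * path_m1_m2 * path_m2_y   # product along the mediating path
total_x_y = direct_x_y + indirect_x_y                # total effect = direct + indirect

print(f"direct = {direct_x_y:+.3f}, indirect = {indirect_x_y:+.3f}, total = {total_x_y:+.3f}")
# The indirect effect has the opposite sign here, so the total effect (+0.253)
# is attenuated relative to the direct effect (+0.280).
```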
Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests
Many decisions in medicine involve trade-offs, such as the trade-off between correctly diagnosing patients with disease and subjecting healthy patients to unnecessary additional testing. Net benefit is an increasingly reported decision analytic measure that puts benefits and harms on the same scale. This is achieved by specifying an exchange rate, a clinical judgment of the relative value of benefits (such as detecting a cancer) and harms (such as an unnecessary biopsy) associated with models, markers, and tests. The exchange rate can be derived by asking simple questions, such as the maximum number of patients a doctor would recommend for biopsy to find one cancer. As the answers to these sorts of questions are subjective, it is possible to plot net benefit for a range of reasonable exchange rates in a "decision curve." For clinical prediction models, the exchange rate is related to the probability threshold that determines whether a patient is classified as positive or negative for a disease. Net benefit is useful for determining whether basing clinical decisions on a model, marker, or test would do more good than harm. This is in contrast to traditional measures such as sensitivity, specificity, or area under the curve, which are statistical abstractions not directly informative about clinical value. Recent years have seen an increase in practical applications of net benefit analysis to research data. This is a welcome development, since decision analytic techniques are of particular value when the purpose of a model, marker, or test is to help doctors make better clinical decisions.
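A minimal sketch of the standard net benefit calculation at a chosen probability threshold, assuming predicted risks and binary outcomes are available (the data below are simulated for illustration):

```python
import numpy as np

def net_benefit(y_true, risk, threshold):
    """Net benefit of treating patients whose predicted risk exceeds `threshold`.

    Uses the standard decision-analytic formula
        NB = TP/n - FP/n * threshold / (1 - threshold),
    where the odds of the threshold act as the exchange rate between
    true positives (benefit) and false positives (harm).
    """
    y_true = np.asarray(y_true)
    treat = np.asarray(risk) >= threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1))
    fp = np.sum(treat & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

# Sweep thresholds to build a simple decision curve, comparing the model
# against the "treat all" default strategy.
rng = np.random.default_rng(0)
risk = rng.uniform(0, 1, 500)                       # illustrative predicted risks
y = rng.binomial(1, risk)                           # outcomes consistent with those risks
for t in (0.05, 0.10, 0.20):
    nb_model = net_benefit(y, risk, t)
    nb_all = net_benefit(y, np.ones_like(risk), t)  # treat everyone
    print(f"threshold={t:.2f}  model NB={nb_model:.3f}  treat-all NB={nb_all:.3f}")
```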
Understanding overfitting in random forest for probability estimation: a visualization and simulation study
Background: Random forests have become popular for clinical risk prediction modeling. In a case study on predicting ovarian malignancy, we observed training AUCs close to 1. Although this suggests overfitting, performance was competitive on test data. We aimed to understand the behavior of random forests for probability estimation by (1) visualizing data space in three real-world case studies and (2) a simulation study. Methods: For the case studies, multinomial risk estimates were visualized using heatmaps in a 2-dimensional subspace. The simulation study included 48 logistic data-generating mechanisms (DGM), varying the predictor distribution, the number of predictors, the correlation between predictors, the true AUC, and the strength of true predictors. For each DGM, 1000 training datasets of size 200 or 4000 with binary outcomes were simulated, and random forest models were trained with minimum node size 2 or 20 using the ranger R package, resulting in 192 scenarios in total. Model performance was evaluated on large test datasets (N = 100,000). Results: The visualizations suggested that the model learned “spikes of probability” around events in the training set. A cluster of events created a bigger peak or plateau (signal), whereas isolated events created local peaks (noise). In the simulation study, median training AUCs were between 0.97 and 1 unless there were 4 binary predictors or 16 binary predictors with a minimum node size of 20. The median discrimination loss, i.e., the difference between the median test AUC and the true AUC, was 0.025 (range 0.00 to 0.13). Median training AUCs had Spearman correlations of around 0.70 with discrimination loss. Median test AUCs were higher with higher events per variable, higher minimum node size, and binary predictors. Median training calibration slopes were always above 1 and were not correlated with median test slopes across scenarios (Spearman correlation −0.11). Median test slopes were higher with higher true AUC, higher minimum node size, and higher sample size. Conclusions: Random forests learn local probability peaks that often yield near-perfect training AUCs without strongly affecting AUCs on test data. When the aim is probability estimation, the simulation results go against the common recommendation to use fully grown trees in random forest models.
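A small Python analogue of the phenomenon described (the paper itself used the ranger R package): training AUC is near 1 for fully grown trees, while test AUC stays close to what a larger minimum node size achieves. The data-generating mechanism and settings below are illustrative, not the paper's design.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def simulate(n, n_pred=8, beta=0.5):
    """Simple logistic data-generating mechanism (illustrative only)."""
    X = rng.normal(size=(n, n_pred))
    lp = X @ np.full(n_pred, beta) - 1.0
    y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    return X, y

X_train, y_train = simulate(200)
X_test, y_test = simulate(100_000)

# Fully grown trees (leaf size 1) versus a larger minimum node size.
for min_leaf in (1, 20):
    rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=min_leaf, random_state=1)
    rf.fit(X_train, y_train)
    auc_train = roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1])
    auc_test = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
    print(f"min_samples_leaf={min_leaf:>2}  train AUC={auc_train:.3f}  test AUC={auc_test:.3f}")
```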
Impact of predictor measurement heterogeneity across settings on performance of prediction models: a measurement error perspective
It is widely acknowledged that the predictive performance of clinical prediction models should be studied in patients who were not part of the data in which the model was derived. Out-of-sample performance can be hampered when predictors are measured differently at derivation and external validation. This may occur, for instance, when predictors are measured using different measurement protocols or when tests are produced by different manufacturers. Although such heterogeneity in predictor measurement between derivation and validation data is common, its impact on out-of-sample performance is not well studied. Using analytical and simulation approaches, we examined the out-of-sample performance of prediction models under various scenarios of heterogeneous predictor measurement. These scenarios were defined and clarified using an established taxonomy of measurement error models. The results of our simulations indicate that predictor measurement heterogeneity can induce miscalibration of predictions and affect discrimination and overall predictive accuracy, to the extent that the prediction model may no longer be considered clinically useful. The measurement error taxonomy was found to be helpful in identifying and predicting the effects of heterogeneous predictor measurements between settings of prediction model derivation and validation. Our work indicates that homogeneity of measurement strategies across settings is of paramount importance in prediction research.
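A minimal sketch of one such scenario (a systematic shift plus classical random error in the predictor at validation, which is only one cell of the taxonomy referred to above), showing how discrimination and calibration can degrade even though the model itself is unchanged. All settings are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

def simulate(n, error_sd=0.0, shift=0.0):
    """The outcome depends on the true predictor value; the observed value is
    measured either as at derivation (no extra error) or under a different
    protocol at validation (shift plus classical random error)."""
    x_true = rng.normal(size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-(x_true - 0.5))))
    x_obs = x_true + shift + rng.normal(scale=error_sd, size=n)
    return x_obs.reshape(-1, 1), y

# Derive the model where the predictor is measured as intended.
X_dev, y_dev = simulate(5_000)
model = LogisticRegression(C=1e6).fit(X_dev, y_dev)   # essentially unpenalized

# Validate where the measurement protocol differs (shifted, noisier).
X_val, y_val = simulate(50_000, error_sd=1.0, shift=0.3)
p_val = model.predict_proba(X_val)[:, 1]
lp_val = model.decision_function(X_val)               # linear predictor

# Calibration slope: logistic recalibration of the outcome on the linear predictor.
recal = LogisticRegression(C=1e6).fit(lp_val.reshape(-1, 1), y_val)
print(f"validation AUC      = {roc_auc_score(y_val, p_val):.3f}")
print(f"calibration slope   = {recal.coef_[0][0]:.2f}  (below 1: predictions too extreme)")
print(f"mean predicted risk = {p_val.mean():.3f} vs observed event rate = {y_val.mean():.3f}")
```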
Three myths about risk thresholds for prediction models
Acknowledgments: This work was developed as part of the international initiative of strengthening analytical thinking for observational studies (STRATOS). The objective of STRATOS is to provide accessible and accurate guidance in the design and analysis of observational studies (http://stratos-initiative.org/). Members of the STRATOS Topic Group ‘Evaluating diagnostic tests and prediction models’ are Gary Collins, Carl Moons, Ewout Steyerberg, Patrick Bossuyt, Petra Macaskill, David McLernon, Ben van Calster, and Andrew Vickers.
Funding: The study is supported by the Research Foundation-Flanders (FWO) project G0B4716N and Internal Funds KU Leuven (project C24/15/037). Laure Wynants is a post-doctoral fellow of the Research Foundation-Flanders (FWO). The funding bodies had no role in the design of the study, the collection, analysis, or interpretation of data, or the writing of the manuscript.
Contributions: LW and BVC conceived the original idea of the manuscript, to which ES, MVS and DML then contributed. DT acquired the data. LW analyzed the data, interpreted the results and wrote the first draft. All authors revised the work, approved the submitted version, and are accountable for the integrity and accuracy of the work.
The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression
OBJECTIVE: Methods to correct class imbalance (imbalance between the frequency of outcome events and nonevents) are receiving increasing interest for developing prediction models. We examined the effect of imbalance correction on the performance of logistic regression models. MATERIAL AND METHODS: Prediction models were developed using standard and penalized (ridge) logistic regression under 4 methods to address class imbalance: no correction, random undersampling, random oversampling, and SMOTE. Model performance was evaluated in terms of discrimination, calibration, and classification. Using Monte Carlo simulations, we studied the impact of training set size, number of predictors, and the outcome event fraction. A case study on prediction modeling for ovarian cancer diagnosis is presented. RESULTS: The use of random undersampling, random oversampling, or SMOTE yielded poorly calibrated models: the probability of belonging to the minority class was strongly overestimated. These methods did not result in higher areas under the ROC curve when compared with models developed without correction for class imbalance. Although imbalance correction improved the balance between sensitivity and specificity, similar results were obtained by shifting the probability threshold instead. DISCUSSION: Imbalance correction led to models with strong miscalibration without better ability to distinguish between patients with and without the outcome event. The inaccurate probability estimates reduce the clinical utility of the model, because decisions about treatment are ill-informed. CONCLUSION: Outcome imbalance is not a problem in itself; imbalance correction may even worsen model performance.
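A minimal sketch of the core finding, using random oversampling implemented by hand with numpy (the paper also studied undersampling and SMOTE; the data-generating mechanism and sizes below are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)

def simulate(n, n_pred=4):
    """Illustrative imbalanced data with an event fraction around 10-15%."""
    X = rng.normal(size=(n, n_pred))
    lp = X @ np.full(n_pred, 0.8) - 2.5
    y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    return X, y

X_train, y_train = simulate(1_000)
X_test, y_test = simulate(100_000)

# Random oversampling: resample the minority (event) class up to the majority size.
idx_min = np.where(y_train == 1)[0]
idx_maj = np.where(y_train == 0)[0]
idx_os = np.concatenate([idx_maj, rng.choice(idx_min, size=len(idx_maj), replace=True)])
X_os, y_os = X_train[idx_os], y_train[idx_os]

for label, (X, y) in {"no correction": (X_train, y_train), "oversampled": (X_os, y_os)}.items():
    m = LogisticRegression(C=1e6).fit(X, y)          # essentially unpenalized
    p = m.predict_proba(X_test)[:, 1]
    print(f"{label:13s}  test AUC={roc_auc_score(y_test, p):.3f}  "
          f"mean predicted risk={p.mean():.3f}  observed event rate={y_test.mean():.3f}")
```

On data like these, the oversampled model typically has a similar AUC but a mean predicted risk far above the observed event rate, illustrating the miscalibration the abstract describes.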
Minimum sample size for developing a multivariable prediction model using multinomial logistic regression
Aims
Multinomial logistic regression models allow one to predict the risk of a categorical outcome with more than two categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (E_k) and the number of predictor parameters (p_k) for each outcome category k. We propose three criteria to determine the minimum n required, in light of existing criteria developed for binary outcomes.
Proposed criteria
The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R2. The third aims to ensure the overall risk is estimated precisely. For criterion (i), we show that the sample size must be based on the anticipated Cox-Snell R2 of the distinct ‘one-to-one’ logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R2 of the multinomial logistic regression.
Evaluation of criteria
We tested the performance of the proposed criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) were natural extensions of previously proposed criteria for binary outcomes and did not require evaluation through simulation.
Summary
We illustrated how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.
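For orientation, criterion (i) builds on the published binary-outcome shrinkage criterion of Riley et al. A rough sketch of that calculation applied to hypothetical ‘one-to-one’ sub-models is below; the predictor counts and anticipated Cox-Snell R2 values are invented, and the paper's full multinomial procedure involves additional steps, so this is not a substitute for the pmsampsize implementation.

```python
import math

def min_n_shrinkage(p, r2_cs, shrinkage=0.9):
    """Minimum n so that the expected uniform shrinkage factor is >= `shrinkage`
    for a logistic model with p predictor parameters and anticipated
    Cox-Snell R-squared r2_cs (binary-outcome criterion (i))."""
    return p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))

# Hypothetical 'one-to-one' sub-models of a 3-category multinomial model
# (category pairs 1 vs 2, 1 vs 3, 2 vs 3), each with 10 predictor parameters
# and an anticipated Cox-Snell R-squared of 0.15 -- illustrative values only.
submodels = {"1 vs 2": (10, 0.15), "1 vs 3": (10, 0.15), "2 vs 3": (10, 0.15)}
for name, (p, r2) in submodels.items():
    print(f"sub-model {name}: minimum n = {math.ceil(min_n_shrinkage(p, r2))}")
```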
Risk prediction models for discrete ordinal outcomes: Calibration and the impact of the proportional odds assumption
Calibration is a vital aspect of the performance of risk prediction models, but research in the context of ordinal outcomes is scarce. This study compared calibration measures for risk models predicting a discrete ordinal outcome, and investigated the impact of the proportional odds assumption on calibration and overfitting. We studied the multinomial, cumulative, adjacent category, continuation ratio, and stereotype logit/logistic models. To assess calibration, we investigated calibration intercepts and slopes, calibration plots, and the estimated calibration index. Using large sample simulations, we studied the performance of models for risk estimation under various conditions, assuming that the true model has either a multinomial logistic form or a cumulative logit proportional odds form. Small sample simulations were used to compare the tendency for overfitting between models. As a case study, we developed models to diagnose the degree of coronary artery disease (five categories) in symptomatic patients. When the true model was multinomial logistic, proportional odds models often yielded poor risk estimates, with calibration slopes deviating considerably from unity even on large model development datasets. The stereotype logistic model improved the calibration slope, but still provided biased risk estimates for individual patients. When the true model had a cumulative logit proportional odds form, multinomial logistic regression provided biased risk estimates, although these biases were modest. Nonproportional odds models require more parameters to be estimated from the data, and hence suffered more from overfitting. Despite larger sample size requirements, we generally recommend multinomial logistic regression for risk prediction modeling of discrete ordinal outcomes.
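To make the proportional odds assumption concrete: a cumulative logit proportional odds model uses one common slope across all cumulative splits of the ordinal outcome, whereas a multinomial logistic model allows a separate slope per category, which is why the former can yield biased risks when the truth is multinomial. A minimal sketch with made-up coefficients (not fitted models from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Cumulative logit (proportional odds) model for a 4-category ordinal outcome:
# logit P(Y <= k | x) = alpha_k - beta * x, with ONE common slope beta.
alphas = np.array([-1.0, 0.5, 2.0])   # increasing cut-points (made up)
beta = 1.2                             # common slope (made up)

def po_risks(x):
    cum = sigmoid(alphas - beta * x)           # P(Y <= 0), P(Y <= 1), P(Y <= 2)
    cum = np.concatenate([cum, [1.0]])         # P(Y <= 3) = 1
    return np.diff(cum, prepend=0.0)           # category probabilities P(Y = k)

# A multinomial logistic model instead allows category-specific slopes
# relative to the reference category 0 (again, made-up numbers).
intercepts_mnl = np.array([0.2, -0.5, -1.5])
betas_mnl = np.array([0.4, 1.2, 2.5])

def mnl_risks(x):
    eta = np.concatenate([[0.0], intercepts_mnl + betas_mnl * x])
    return np.exp(eta) / np.exp(eta).sum()

for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  PO risks={np.round(po_risks(x), 3)}  MNL risks={np.round(mnl_risks(x), 3)}")
```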