The accuracy of prediction is a commonly studied topic in modern statistics. The performance of a predictor is becoming increasingly important as real-life decisions are made on the basis of prediction. In this thesis we investigate the prediction accuracy of logistic models from two different approaches.

Logistic regression is often used to discriminate between two groups or populations based on a number of covariates. The receiver operating characteristic (ROC) curve is a commonly used tool (especially in medical statistics) to assess the performance of such a score or test. By using the same data to fit the logistic regression and calculate the ROC curve, we overestimate the performance that the score would give if validated on a sample of future cases. This overestimation is studied and we propose a correction for the ROC curve and the area under the curve. The methods are illustrated by way of two medical examples and a simulation study, and we show that the overestimation can be quite substantial for small sample sizes.

The idea of shrinkage pertains to the notion that by including some prior information about the data under study we can improve prediction. Until now, the study of shrinkage has concentrated almost exclusively on continuous measurements. We propose a methodology to study shrinkage for logistic regression modelling of categorical data with a binary response. Categorical data with a large number of levels is often grouped for modelling purposes, which discards useful information about the data. By using this information we can apply Bayesian methods to update model parameters and show through examples and simulations that in some circumstances the updated estimates are better predictors than the model
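
The overestimation described above, where the ROC curve is calculated on the same data used to fit the logistic regression, can be illustrated with a small simulation. The sketch below is not the correction proposed in the thesis; it only compares the apparent area under the curve (AUC) with the AUC on an independent sample. The simulation design (20 training cases, five standard normal covariates of which only the first is predictive, fitting by plain gradient ascent) and all function names are assumptions chosen for illustration.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(eta):
    # Clamp the linear predictor so math.exp cannot overflow.
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))


def score(beta, x):
    """Linear predictor: intercept plus weighted covariates."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(score(beta, xi))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


def draw_sample(rng, n, n_cov):
    """Assumed design: only the first covariate affects the response."""
    while True:
        X = [[rng.gauss(0.0, 1.0) for _ in range(n_cov)] for _ in range(n)]
        y = [1 if rng.random() < sigmoid(x[0]) else 0 for x in X]
        if 0 < sum(y) < n:  # redraw if only one group is present
            return X, y


def simulate(reps=100, n_train=20, n_test=200, n_cov=5, seed=1):
    """Return the mean apparent AUC and the mean AUC on independent cases."""
    rng = random.Random(seed)
    apparent = validated = 0.0
    for _ in range(reps):
        X_tr, y_tr = draw_sample(rng, n_train, n_cov)
        X_te, y_te = draw_sample(rng, n_test, n_cov)
        beta = fit_logistic(X_tr, y_tr)
        apparent += auc([score(beta, x) for x in X_tr], y_tr)
        validated += auc([score(beta, x) for x in X_te], y_te)
    return apparent / reps, validated / reps


apparent_auc, validated_auc = simulate()
print(f"mean apparent AUC:  {apparent_auc:.3f}")
print(f"mean validated AUC: {validated_auc:.3f}")
```

Under these assumptions the apparent AUC is expected to exceed the validated AUC on average, and the gap should narrow as `n_train` grows, in line with the remark that the overestimation matters most for small sample sizes.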