
The accuracy of prediction is a commonly studied topic in modern statistics. The performance of a predictor is becoming increasingly important as real-life decisions are made on the basis of prediction. In this thesis we investigate the prediction accuracy of logistic models using two different approaches.

Logistic regression is often used to discriminate between two groups or populations based on a number of covariates. The receiver operating characteristic (ROC) curve is a commonly used tool (especially in medical statistics) to assess the performance of such a score or test. By using the same data both to fit the logistic regression and to calculate the ROC curve, we overestimate the performance that the score would give if validated on a sample of future cases. We study this overestimation and propose a correction for the ROC curve and the area under the curve. The methods are illustrated by way of two medical examples and a simulation study, and we show that the overestimation can be quite substantial for small sample sizes.

The idea of shrinkage pertains to the notion that by including some prior information about the data under study we can improve prediction. Until now, the study of shrinkage has been concentrated almost exclusively on continuous measurements. We propose a methodology to study shrinkage for logistic regression modelling of categorical data with a binary response. Categorical data with a large number of levels is often grouped for modelling purposes, which discards useful information about the data. By using this information we can apply Bayesian methods to update model parameters and show through examples and simulations that in some circumstances the updated estimates are better predictors than the model
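The overestimation described in the abstract can be illustrated with a small sketch. It computes the empirical ROC curve and its area (in the Mann-Whitney form) for a fitted logistic score, and compares the apparent AUC with Efron's bootstrap estimate of optimism: refit on each bootstrap resample, and record how much better the refitted score looks on its own resample than on the original data. The bootstrap correction is a standard technique used here as a stand-in; it is not necessarily the correction proposed in the thesis. The function names, the small ridge penalty `lam` (assumed, only to keep the fit finite under separation) and the simulated data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Logistic function written via tanh, which cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def auc(scores, labels):
    """Empirical AUC (Mann-Whitney form): P(case score > control score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def roc_points(scores, labels):
    """Empirical ROC curve: (1 - specificity, sensitivity) at each distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / n_neg, tp / n_pos))
    return pts

def solve(A, b):
    # Gaussian elimination with partial pivoting (small dense systems only).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def linpred(beta, X):
    return [sum(b * v for b, v in zip(beta, xi)) for xi in X]

def pen_loglik(beta, X, y, lam):
    # Ridge-penalised log-likelihood; log(1 + e^eta) is computed stably.
    eta = linpred(beta, X)
    ll = sum(yi * e - max(e, 0.0) - math.log1p(math.exp(-abs(e))) for yi, e in zip(y, eta))
    return ll - 0.5 * lam * sum(b * b for b in beta)

def fit_logistic(X, y, lam=0.1, iters=25):
    """Logistic regression by damped Newton-Raphson with a small (assumed) ridge penalty lam."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        prob = [sigmoid(e) for e in linpred(beta, X)]
        grad = [sum((yi - pi) * xi[j] for xi, yi, pi in zip(X, y, prob)) - lam * beta[j]
                for j in range(p)]
        info = [[sum(pi * (1.0 - pi) * xi[j] * xi[k] for xi, pi in zip(X, prob))
                 + (lam if j == k else 0.0) for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        old, t = pen_loglik(beta, X, y, lam), 1.0
        while t > 1e-8:  # halve the Newton step until the objective does not decrease
            cand = [b + t * s for b, s in zip(beta, step)]
            if pen_loglik(cand, X, y, lam) >= old:
                beta = cand
                break
            t /= 2
    return beta

def optimism_corrected_auc(X, y, B=50, seed=0):
    """Return (apparent AUC, Efron-style bootstrap optimism-corrected AUC)."""
    rng = random.Random(seed)
    apparent = auc(linpred(fit_logistic(X, y), X), y)
    optimism, done = 0.0, 0
    while done < B:
        idx = [rng.randrange(len(y)) for _ in y]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # AUC needs both cases and controls
            continue
        Xb = [X[i] for i in idx]
        beta_b = fit_logistic(Xb, yb)
        optimism += auc(linpred(beta_b, Xb), yb) - auc(linpred(beta_b, X), y)
        done += 1
    return apparent, apparent - optimism / B

def simulate(n=30, p=6, seed=1):
    # Hypothetical data: rows start with 1.0 (intercept); only the first covariate has an effect.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        X.append([1.0] + x)
        y.append(1 if rng.random() < sigmoid(x[0]) else 0)
    return X, y

if __name__ == "__main__":
    print(optimism_corrected_auc(*simulate()))
```

The difference between the two returned values is the estimated optimism. With only 30 simulated cases and five pure-noise covariates it should be clearly positive, which is in the spirit of the abstract's remark about small samples; the exact numbers depend on the assumed seed and penalty.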
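The shrinkage idea in the last paragraph can be illustrated, under strong simplifying assumptions, with a conjugate beta-binomial update. The grouped model supplies a prior event rate for each coarse group, and the counts observed at each original (ungrouped) level update that prior. This is a generic textbook sketch, not the methodology developed in the thesis, which updates logistic regression parameters; the function name, the dictionary inputs and the prior strength `k` are hypothetical.

```python
def shrunken_level_rates(level_counts, group_of, k=10.0):
    """Posterior-mean event rate for each fine level of a categorical covariate.

    level_counts: {level: (events, trials)}; group_of: {level: coarse group}.
    k is a hypothetical prior strength, measured in pseudo-trials.
    Every coarse group needs at least one trial in total.
    """
    # Grouped-model estimate: pooled event rate within each coarse group.
    pooled = {}
    for level, (events, trials) in level_counts.items():
        e, t = pooled.get(group_of[level], (0, 0))
        pooled[group_of[level]] = (e + events, t + trials)
    rates = {}
    for level, (events, trials) in level_counts.items():
        e, t = pooled[group_of[level]]
        m = e / t  # prior mean taken from the grouped model
        # A Beta(k*m, k*(1-m)) prior updated with binomial counts has this posterior mean.
        rates[level] = (events + k * m) / (trials + k)
    return rates
```

For example, with `{"a": (2, 10), "b": (8, 10)}` both in one group, the pooled rate is 0.5 and `k=10` pulls the observed rates 0.2 and 0.8 to 0.35 and 0.65. Levels with few trials are pulled strongly towards their group's rate (a level with no trials simply inherits it), while well-populated levels stay close to their own proportion; a larger `k` means stronger shrinkage.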

Topics:
QA

OAI identifier:
oai:wrap.warwick.ac.uk:4114

Provided by:
Warwick Research Archives Portal Repository

Downloaded from
http://wrap.warwick.ac.uk/4114/1/WRAP_THESIS_Corbett_2001.pdf
