
    Standardizing Markers to Evaluate and Compare their Performances

    Introduction: Markers that purport to distinguish subjects with a condition from those without it must be evaluated rigorously for their classification accuracy. A single approach to statistically evaluating and comparing markers is not yet established. Methods: We suggest a standardization that uses the marker distribution in unaffected subjects as a reference. For an affected subject with marker value Y, the standardized placement value is the proportion of unaffected subjects with marker values that exceed Y. Results: We apply the standardization to two illustrative datasets. In patients with pancreatic cancer, placement values calculated for the CA 19-9 marker are smaller than those for the CA-125 marker, indicating that CA 19-9 is the better marker. For detecting hearing impairment, the placement values for the test output (the marker) are smaller when the input sound stimulus is of lower intensity. This indicates that the test better distinguishes hearing-impaired from unimpaired ears when a lower-intensity sound stimulus is used. Explicit connections are drawn between the distribution of standardized marker values and the receiver operating characteristic (ROC) curve, one established statistical technique for evaluating classifiers. Discussion: The standardization is an intuitive procedure for evaluating markers. It facilitates direct and meaningful comparisons between markers. It also provides a new view of ROC analysis that may render it more accessible to those as yet unfamiliar with it. The general approach provides a mechanism to statistically address important questions that are typically not addressed in current marker research, such as quantifying and controlling for covariate effects.
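
    A minimal sketch of the placement-value computation in Stata (the language of several commands discussed in this collection), assuming a dataset with a hypothetical marker variable y and case indicator d (1 = affected, 0 = unaffected):

        * placement value for each case: proportion of controls whose marker exceeds it
        quietly count if d == 0 & y < .
        local n0 = r(N)
        generate double pv = .
        quietly forvalues i = 1/`=_N' {
            if d[`i'] == 1 {
                count if d == 0 & y > y[`i'] & y < .
                replace pv = r(N) / `n0' in `i'
            }
        }
        summarize pv if d == 1    // smaller placement values indicate a better marker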

    Estimation and Comparison of Receiver Operating Characteristic Curves

    The receiver operating characteristic (ROC) curve displays the capacity of a marker or diagnostic test to discriminate between two groups of subjects, cases versus controls. We present a comprehensive suite of Stata commands for performing ROC analysis. Nonparametric, semiparametric, and parametric estimators are calculated. Comparisons between curves are based on the area or partial area under the ROC curve. Alternatively, pointwise comparisons between ROC curves or inverse ROC curves can be made. Options to adjust these analyses for covariates and to perform ROC regression are described in a companion article. We use a unified framework by representing the ROC curve as the distribution of the marker in cases after standardizing it to the control reference distribution.
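
    Official Stata's built-in ROC commands illustrate the basic operations described above (the suite presented in this article extends them); a brief sketch with a case indicator d and hypothetical markers y, y1, and y2:

        roctab d y, graph     // nonparametric (empirical) ROC curve and its area
        roccomp d y1 y2       // compare the areas under two markers' ROC curves
        * the unified framework: the ROC curve is the CDF, among cases, of the
        * placement values pv from the sketch above (x = pv, y = cumulative fraction)
        cumul pv if d == 1, gen(tpf)
        twoway line tpf pv if d == 1, sort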

    Accommodating Covariates in ROC Analysis

    Classification accuracy is the ability of a marker or diagnostic test to discriminate between two groups of individuals, cases and controls, and is commonly summarized using the receiver operating characteristic (ROC) curve. In studies of classification accuracy, there are often covariates that should be incorporated into the ROC analysis. We describe three different ways of using covariate information. For factors that affect marker observations among controls, we present a method for covariate adjustment. For factors that affect discrimination (i.e., the ROC curve), we describe methods for modelling the ROC curve as a function of covariates. Finally, for factors that contribute to discrimination, we propose combining the marker and covariate information, and ask how much discriminatory accuracy improves with the addition of the marker to the covariates (incremental value). These methods follow naturally when representing the ROC curve as a summary of the distribution of case marker observations, standardized with respect to the control distribution.
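
    A rough sketch of the first approach, covariate adjustment, under the simplifying (and hypothetical) assumption that a single covariate x shifts the control mean of the marker y linearly:

        * model the control (reference) distribution of y given x, then standardize
        quietly regress y x if d == 0
        predict double z, residuals
        * placement values of the case z-values within the control z-distribution
        * can now be computed exactly as in the placement-value sketch above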

    Combining Predictors for Classification using the Area Under the ROC Curve

    No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically, the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function: the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression, it yields consistent estimation with case-control or cohort data. Simulation studies suggest that AUC-based classification scores have performance comparable with logistic-likelihood-based scores when the logistic regression model holds. Analysis of data from a proteomics biomarker study shows that performance can be far superior to logistic-regression-derived scores when the logistic regression model does not hold. Model fitting by maximizing the AUC rather than the likelihood should be considered when the goal is to derive a marker combination score for classification or prediction.
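
    A crude illustration of the alternative objective function, assuming two hypothetical markers y1 and y2 and a case indicator d: fix the first coefficient at 1 and grid-search the second for the empirical-AUC maximizer (a real implementation would use a proper optimizer rather than a grid):

        local best = 0
        forvalues b = -2(0.1)2 {
            tempvar s
            generate double `s' = y1 + `b' * y2
            quietly roctab d `s'
            if r(area) > `best' {
                local best = r(area)
                local bbest = `b'
            }
            drop `s'
        }
        display "AUC-maximizing score: y1 + `bbest'*y2 (empirical AUC = `best')"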

    Selecting Differentially Expressed Genes from Microarray Experiments

    High-throughput technologies, such as gene expression arrays and protein mass spectrometry, allow one to simultaneously evaluate thousands of potential biomarkers that distinguish different tissue types. Of particular interest here is cancer versus normal organ tissue. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. Various statistical measures are considered, and we argue that two measures related to the receiver operating characteristic (ROC) curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified and suggest using the “selection probability function”, the probability distribution of rankings for each gene. This is estimated via the bootstrap. A real dataset derived from gene expression arrays of 23 normal and 30 ovarian cancer tissues is analyzed. Simulation studies are also used to assess the relative performance of different statistical gene-ranking measures and our quantification of sampling variability. Our approach leads naturally to a procedure for sample size calculations appropriate for exploratory studies that seek to identify differentially expressed genes.
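
    A sketch of the ROC-based ranking step, assuming hypothetical long-format data with one row per gene-sample pair (geneid, expression expr, tissue indicator cancer); estimating the selection probability function would wrap this in a bootstrap loop over samples:

        * empirical AUC of each gene's expression as a cancer-versus-normal classifier
        statsby auc=r(area), by(geneid) clear: roctab cancer expr
        gsort -auc                // genes with the largest AUC are ranked first
        generate rank = _n
        list geneid auc rank in 1/10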

    Testing for improvement in prediction model performance

    New methodology has been proposed in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that the coefficient for Y is zero in the risk model, P(D = 1 | X, Y). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance, and it suggests that the problem of insensitivity has to do with the use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and the use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing.
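
    The recommended test is therefore just the standard test of Y's coefficient in the fitted risk model; a minimal sketch with hypothetical variables d, x, and y:

        logit d x                 // baseline risk model
        estimates store base
        logit d x y               // the reported z-test on y tests 'no improvement'
        lrtest base .             // likelihood-ratio version of the same test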

    Limitations of the Odds Ratio in Gauging the Performance of a Diagnostic or Prognostic Marker

    A marker that is strongly associated with outcome (or disease) is often assumed to be effective for classifying individuals according to their current or future outcome. However, for this to be true, the associated odds ratio must be of a magnitude rarely seen in epidemiological studies. An illustration of the relationship between odds ratios and receiver operating characteristic (ROC) curves shows, for example, that a marker with an odds ratio as high as 3 is in fact a very poor classification tool. If a marker identifies 10 percent of controls as positive (false positives) and has an odds ratio of 3, then it will correctly identify only 25 percent of cases as positive (true positives). Moreover, the authors illustrate that a single measure of association such as an odds ratio does not meaningfully describe a marker’s ability to classify subjects. Appropriate statistical methods for assessing and reporting the classification power of a marker are described. The serious pitfalls of using more traditional methods based on parameters in logistic regression models are illustrated.
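
    The 25-percent figure follows directly from the definition of the odds ratio; a worked check (FPF = false-positive fraction, TPF = true-positive fraction):

        * odds(TPF) = OR * odds(FPF) = 3 * (.10/.90) = 1/3, so TPF = (1/3)/(1 + 1/3) = .25
        display 3 * (.10/.90) / (1 + 3 * (.10/.90))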

    A Randomized Study Comparing Digital Imaging to Traditional Glass Slide Microscopy for Breast Biopsy and Cancer Diagnosis.

    BACKGROUND: Digital whole slide imaging may be useful for obtaining second opinions and is used in many countries. However, the U.S. Food and Drug Administration requires verification studies. METHODS: Pathologists were randomized to interpret one of four sets of breast biopsy cases during two phases, separated by ≥9 months, using glass slides or digital format (sixty cases per set, one slide per case). RESULTS: Sixty-five percent of responding pathologists were eligible, and 252 consented to randomization; 208 completed Phase I (115 glass, 93 digital); and 172 completed Phase II (86 glass, 86 digital). Accuracy was slightly higher using glass compared to digital format and varied by category: for invasive carcinoma, 96% versus 93%. CONCLUSIONS: In this large randomized study, digital format interpretations were similar to glass slide interpretations of benign and invasive cancer cases. However, cases in the middle of the diagnostic spectrum, where more inherent variability exists, may be more problematic in digital format. Future studies evaluating the effect of these findings on clinical practice and patient outcomes are required.

    Speaking Stata: Distinct observations

    Distinct observations are those different with respect to one or more variables, considered either individually or jointly. Distinctness is thus a key aspect of the similarity or difference of observations. It is sometimes confounded with uniqueness. Counting the number of distinct observations may be required at any point, from initial data cleaning or checking to subsequent statistical analysis. We review how far existing commands in official Stata offer solutions to this issue, and we show how to answer questions about distinct observations from first principles by using the by prefix and the egen command. The new distinct command is offered as a convenience tool.
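
    A short illustration of the first-principles approach with official commands, using Stata's shipped auto data; egen's tag() function marks exactly one observation in each distinct group:

        sysuse auto, clear
        egen tagged = tag(mpg foreign)
        count if tagged               // number of distinct (mpg, foreign) combinations
        * the new convenience command condenses this to: distinct mpg foreign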