2,146 research outputs found

    A boosting method for maximizing the partial area under the ROC curve

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The receiver operating characteristic (ROC) curve is a fundamental tool to assess the discriminant performance for not only a single marker but also a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability for the score function to discriminate between the controls and cases. Recently, the partial AUC (pAUC) has been paid more attention than the AUC, because a suitable range of the false positive rate can be focused according to various clinical situations. However, existing pAUC-based methods only handle a few markers and do not take nonlinear combination of markers into consideration.</p> <p>Results</p> <p>We have developed a new statistical method that focuses on the pAUC based on a boosting technique. The markers are combined componentially for maximizing the pAUC in the boosting algorithm using natural cubic splines or decision stumps (single-level decision trees), according to the values of markers (continuous or discrete). We show that the resulting score plots are useful for understanding how each marker is associated with the outcome variable. We compare the performance of the proposed boosting method with those of other existing methods, and demonstrate the utility using real data sets. As a result, we have much better discrimination performances in the sense of the pAUC in both simulation studies and real data analysis.</p> <p>Conclusions</p> <p>The proposed method addresses how to combine the markers after a pAUC-based filtering procedure in high dimensional setting. Hence, it provides a consistent way of analyzing data based on the pAUC from maker selection to marker combination for discrimination problems. The method can capture not only linear but also nonlinear association between the outcome variable and the markers, about which the nonlinearity is known to be necessary in general for the maximization of the pAUC. The method also puts importance on the accuracy of classification performance as well as interpretability of the association, by offering simple and smooth resultant score plots for each marker.</p

    A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection.

    Get PDF
    The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure to evaluate biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This issue is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of PAUC, there exist only few methods that use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules incorporating only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification

    Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

    Get PDF
    The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are only suboptimal regarding the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discrimatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures.Comment: revised manuscript - added simulation study, additional result

    ROC-Based Model Estimation for Forecasting Large Changes in Demand

    Get PDF
    Forecasting for large changes in demand should benefit from different estimation than that used for estimating mean behavior. We develop a multivariate forecasting model designed for detecting the largest changes across many time series. The model is fit based upon a penalty function that maximizes true positive rates along a relevant false positive rate range and can be used by managers wishing to take action on a small percentage of products likely to change the most in the next time period. We apply the model to a crime dataset and compare results to OLS as the basis for comparisons as well as models that are promising for exceptional demand forecasting such as quantile regression, synthetic data from a Bayesian model, and a power loss model. Using the Partial Area Under the Curve (PAUC) metric, our results show statistical significance, a 35 percent improvement over OLS, and at least a 20 percent improvement over competing methods. We suggest management with an increasing number of products to use our method for forecasting large changes in conjunction with typical magnitude-based methods for forecasting expected demand
    • …
    corecore