16 research outputs found
Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates
In binary classification problems, the area under the ROC curve (AUC) is an effective means of measuring the performance of a model. Cross-validation is often used as well, in order to assess how the results will generalize to an independent data set. To evaluate the quality of an estimate of cross-validated AUC, we must obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, calculating the cross-validated AUC on even a relatively small data set can still require a large amount of computation time. Thus, when the cost of obtaining a single estimate of cross-validated AUC is significant, the bootstrap, as a means of variance estimation, can be computationally intractable. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.
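The influence-curve approach described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: per-observation influence values for the empirical AUC are computed on each validation fold, and their empirical variance yields a Wald-style confidence interval. All function names here are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def auc_influence(y, scores):
    """Per-observation influence curve values for the empirical AUC."""
    y = np.asarray(y)
    s = np.asarray(scores, dtype=float)
    cases, controls = s[y == 1], s[y == 0]
    p1 = y.mean()
    auc = roc_auc_score(y, s)
    ic = np.empty(len(s))
    for i in range(len(s)):
        if y[i] == 1:
            # fraction of controls this case outranks (ties count 1/2)
            frac = np.mean((s[i] > controls) + 0.5 * (s[i] == controls))
            ic[i] = (frac - auc) / p1
        else:
            frac = np.mean((cases > s[i]) + 0.5 * (cases == s[i]))
            ic[i] = (frac - auc) / (1 - p1)
    return auc, ic

def cv_auc_ci(X, y, model, n_folds=10, alpha=0.05, seed=1):
    """Cross-validated AUC with an influence-curve-based confidence interval."""
    kf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    aucs, var_terms = [], []
    for tr, va in kf.split(X, y):
        fit = model.fit(X[tr], y[tr])
        scores = fit.predict_proba(X[va])[:, 1]
        fold_auc, ic = auc_influence(y[va], scores)
        aucs.append(fold_auc)
        var_terms.append(np.mean(ic ** 2) / len(va))
    cv_auc = np.mean(aucs)
    # variance of the average of the (approximately independent) fold estimates
    se = np.sqrt(np.sum(var_terms)) / n_folds
    z = norm.ppf(1 - alpha / 2)
    return cv_auc, (cv_auc - z * se, cv_auc + z * se)

X, y = make_classification(n_samples=2000, random_state=0)
auc, (lo, hi) = cv_auc_ci(X, y, LogisticRegression(max_iter=1000))
```

Unlike the bootstrap, this requires no refitting beyond the single cross-validation pass: the influence values are computed directly from the held-out scores.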
Classification of Nodal Pockets in Many-Electron Wave Functions via Machine Learning
Scalable Ensemble Learning and Computationally Efficient Variance Estimation
Ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm is an ensemble method that has been theoretically proven to represent an asymptotically optimal system for learning. The Super Learner, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as they require training multiple base learning algorithms. We present several practical solutions to reducing the computational burden of ensemble learning while retaining superior model performance, along with software, code examples and benchmarks. Further, we present a generalized metalearning method for approximating the combination of the base learners which maximizes a model performance metric of interest. As an example, we create an AUC-maximizing Super Learner and show that this technique works especially well in the case of imbalanced binary outcomes. We conclude by presenting a computationally efficient approach to approximating variance for cross-validated AUC estimates using influence functions. This technique can be used generally to obtain confidence intervals for any estimator; however, due to the extensive use of AUC in the field of biostatistics, cross-validated AUC is used as a practical, motivating example. The goal of this body of work is to provide new scalable approaches to obtaining the highest performing predictive models while optimizing any model performance metric of interest, and further, to provide computationally efficient inference for that estimate.
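A metric-maximizing metalearner of the kind described above can be sketched in a few lines. This is an illustrative example, not the authors' software: cross-validated predictions from each base learner form the level-one data, and base-learner weights are chosen by derivative-free optimization of AUC (AUC is a non-smooth rank statistic, so gradient methods do not apply directly).

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Imbalanced binary outcome, where AUC maximization is most useful.
X, y = make_classification(n_samples=1500, weights=[0.9, 0.1], random_state=0)

base_learners = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
]

# Level-one data: cross-validated predictions from each base learner.
Z = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_learners
])

def neg_auc(w):
    # softmax keeps the weights positive and summing to one
    w = np.exp(w) / np.exp(w).sum()
    return -roc_auc_score(y, Z @ w)

# Derivative-free search over the combination weights.
res = minimize(neg_auc, x0=np.zeros(Z.shape[1]), method="Nelder-Mead")
weights = np.exp(res.x) / np.exp(res.x).sum()
ensemble_auc = -res.fun
```

The same pattern applies to any performance metric: replace `roc_auc_score` in the objective with the metric of interest.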
Subsemble is a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a unique form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. An oracle result provides a theoretical performance guarantee for Subsemble.
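The procedure described above can be sketched as follows. This is a minimal illustration of the Subsemble idea, not the reference implementation: the data are partitioned into J disjoint subsets, one learner is fit per subset, and a metalearner is trained on V-fold cross-validated predictions from the subset-specific fits.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def subsemble(X, y, base, J=3, V=5, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    subsets = np.array_split(idx, J)  # disjoint partition of the data
    # Level-one matrix: column j holds out-of-sample predictions from subset j's fit.
    Z = np.zeros((len(y), J))
    kf = KFold(n_splits=V, shuffle=True, random_state=seed)
    for j, sub in enumerate(subsets):
        # Within subset j, use V-fold CV so its own observations are held out.
        for tr, va in kf.split(sub):
            fit = clone(base).fit(X[sub[tr]], y[sub[tr]])
            Z[sub[va], j] = fit.predict_proba(X[sub[va]])[:, 1]
        # Observations outside subset j are already out-of-sample for its fit.
        out = np.setdiff1d(idx, sub)
        full_fit = clone(base).fit(X[sub], y[sub])
        Z[out, j] = full_fit.predict_proba(X[out])[:, 1]
    meta = LogisticRegression(max_iter=1000).fit(Z, y)
    final_fits = [clone(base).fit(X[s], y[s]) for s in subsets]
    def predict(Xnew):
        Znew = np.column_stack([f.predict_proba(Xnew)[:, 1] for f in final_fits])
        return meta.predict_proba(Znew)[:, 1]
    return predict

X, y = make_classification(n_samples=1200, random_state=0)
predict = subsemble(X, y, DecisionTreeClassifier(max_depth=3, random_state=0))
auc = roc_auc_score(y, predict(X))
```

Because each base fit only ever sees one subset, the J fits can be trained in parallel, which is what makes the method attractive for large datasets.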
Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates
In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.