Asymptotic properties of the sequential empirical ROC, PPV and NPV curves under case-control sampling
The receiver operating characteristic (ROC) curve, the positive predictive
value (PPV) curve and the negative predictive value (NPV) curve are three
measures of performance for a continuous diagnostic biomarker. The ROC, PPV and
NPV curves are often estimated empirically to avoid assumptions about the
distributional form of the biomarkers. Recently, there has been a push to
incorporate group sequential methods into the design of diagnostic biomarker
studies. A thorough understanding of the asymptotic properties of the
sequential empirical ROC, PPV and NPV curves will provide more flexibility when
designing group sequential diagnostic biomarker studies. In this paper, we
derive asymptotic theory for the sequential empirical ROC, PPV and NPV curves
under case-control sampling using sequential empirical process theory. We show
that the sequential empirical ROC, PPV and NPV curves converge to the sum of
independent Kiefer processes and show how these results can be used to derive
asymptotic results for summaries of the sequential empirical ROC, PPV and NPV
curves.
Comment: Published at http://dx.doi.org/10.1214/11-AOS937 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
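As a point of reference for the quantities discussed above, the sketch below computes empirical ROC, PPV and NPV points from fixed case-control samples. It is a minimal non-sequential sketch, not the paper's construction: the function name empirical_curves, the synthetic data and the prevalence value rho are illustrative assumptions, and under case-control sampling the prevalence is not identifiable from the data, so it must be supplied externally.

```python
import numpy as np

def empirical_curves(cases, controls, rho, thresholds=None):
    """Empirical ROC/PPV/NPV points from case-control biomarker samples.

    cases, controls : 1-D arrays of biomarker values for diseased and
                      non-diseased subjects.
    rho             : assumed disease prevalence; under case-control sampling
                      it cannot be estimated from the design and must be given.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    if thresholds is None:
        thresholds = np.unique(np.concatenate([cases, controls]))

    # Empirical true- and false-positive rates at each threshold c: P(Y > c).
    tpr = np.array([(cases > c).mean() for c in thresholds])
    fpr = np.array([(controls > c).mean() for c in thresholds])

    # Predictive values combine TPR/FPR with the assumed prevalence via Bayes'
    # rule; degenerate thresholds (nobody tests positive/negative) stay NaN.
    denom_ppv = rho * tpr + (1 - rho) * fpr
    denom_npv = (1 - rho) * (1 - fpr) + rho * (1 - tpr)
    ppv = np.divide(rho * tpr, denom_ppv,
                    out=np.full_like(tpr, np.nan), where=denom_ppv > 0)
    npv = np.divide((1 - rho) * (1 - fpr), denom_npv,
                    out=np.full_like(fpr, np.nan), where=denom_npv > 0)
    return thresholds, fpr, tpr, ppv, npv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cases = rng.normal(1.0, 1.0, 200)      # synthetic diseased biomarker values
    controls = rng.normal(0.0, 1.0, 300)   # synthetic non-diseased values
    thr, fpr, tpr, ppv, npv = empirical_curves(cases, controls, rho=0.1)
```

In the group sequential setting the same curves would be recomputed at each interim analysis; the asymptotic theory in the paper describes the joint behaviour of those sequentially recomputed estimates.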
AUC Optimisation and Collaborative Filtering
In recommendation systems, one is interested in the ranking of the predicted
items as opposed to other losses such as the mean squared error. Although a
variety of ways to evaluate rankings exist in the literature, here we focus on
the Area Under the ROC Curve (AUC), as it is widely used and has a strong
theoretical underpinning. In practical recommendation, only items at the top of
the ranked list are presented to the users. With this in mind, we propose a
class of objective functions over matrix factorisations which primarily
represent a smooth surrogate for the real AUC, and in a special case we show
how to prioritise the top of the list. The objectives are differentiable and
optimised through a carefully designed stochastic gradient-descent-based
algorithm which scales linearly with the size of the data. In the special case
of square loss we show how to improve computational complexity by leveraging
previously computed measures. To understand theoretically the underlying matrix
factorisation approaches we study both the consistency of the loss functions
with respect to AUC, and generalisation using Rademacher theory. The resulting
generalisation analysis gives strong motivation for the optimisation under
study. Finally, we provide computational results demonstrating the efficacy of
the proposed method on synthetic and real data.
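For intuition about the kind of objective involved, here is a minimal sketch of stochastic gradient descent on a smooth pairwise surrogate of AUC over a matrix factorisation. The function name sgd_smooth_auc, the logistic surrogate, the uniform sampling of (user, relevant item, irrelevant item) triples and all hyperparameters are illustrative assumptions, and the top-of-the-list weighting discussed above is not implemented.

```python
import numpy as np

def sgd_smooth_auc(positives, n_users, n_items, k=16, lr=0.05, reg=0.01,
                   n_steps=100_000, rng=None):
    """SGD on a smooth surrogate of per-user AUC over a rank-k factorisation.

    positives : dict mapping each user index to the set of its relevant items.
    The 0/1 pairwise misranking indicator is replaced by a logistic loss on the
    score difference, so each step touches a single (user, pos, neg) triple.
    """
    rng = rng or np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    users = [u for u, items in positives.items() if items]

    for _ in range(n_steps):
        u = users[rng.integers(len(users))]
        i = rng.choice(list(positives[u]))       # sampled relevant item
        j = int(rng.integers(n_items))           # sampled candidate irrelevant item
        if j in positives[u]:
            continue
        diff = U[u] @ (V[i] - V[j])              # score(u, i) - score(u, j)
        g = -1.0 / (1.0 + np.exp(diff))          # d/d(diff) of log(1 + exp(-diff))
        u_vec = U[u].copy()
        U[u] -= lr * (g * (V[i] - V[j]) + reg * U[u])
        V[i] -= lr * (g * u_vec + reg * V[i])
        V[j] -= lr * (-g * u_vec + reg * V[j])
    return U, V
```

Each step costs O(k), so a fixed number of passes over the observed entries scales linearly with the size of the data, which is the scaling property highlighted in the abstract.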
Online and Stochastic Gradient Methods for Non-decomposable Loss Functions
Modern applications in sensitive domains such as biometrics and medicine
frequently require the use of non-decomposable loss functions such as
precision@k and the F-measure. Compared to point loss functions such as the
hinge loss, these offer much more fine-grained control over prediction, but at
the same time present novel challenges in terms of algorithm design and
analysis. In this work we initiate a study of online learning techniques for
such non-decomposable loss functions with an aim to enable incremental learning
as well as design scalable solvers for batch problems. To this end, we propose
an online learning framework for such loss functions. Our model enjoys several
nice properties, chief amongst them being the existence of efficient online
learning algorithms with sublinear regret and online to batch conversion
bounds. Our model is a provable extension of existing online learning models
for point loss functions. We instantiate two popular losses, prec@k and pAUC,
in our model and prove sublinear regret bounds for both of them. Our proofs
require a novel structural lemma over ranked lists which may be of independent
interest. We then develop scalable stochastic gradient descent solvers for
non-decomposable loss functions. We show that for a large family of loss
functions satisfying a certain uniform convergence property (that includes
prec@k, pAUC, and F-measure), our methods provably converge to the empirical
risk minimizer. Such uniform convergence results were not known for these
losses and we establish these using novel proof techniques. We then use
extensive experimentation on real-life and benchmark datasets to establish that
our method can be orders of magnitude faster than a recently proposed cutting
plane method.
Comment: 25 pages, 3 figures. To appear in the proceedings of the 28th Annual
Conference on Neural Information Processing Systems, NIPS 2014.
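To see why such losses are non-decomposable, the sketch below evaluates two of the examples mentioned, prec@k and partial AUC, directly from scores and labels: an example's contribution depends on where it ranks among all the others, so neither loss is a sum of per-example terms. The function names and tie-handling conventions are illustrative assumptions, not the definitions used in the paper.

```python
import numpy as np

def prec_at_k(scores, labels, k):
    """Fraction of positives among the k highest-scoring examples."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    top_k = np.argsort(scores)[::-1][:k]          # indices of the k largest scores
    return labels[top_k].mean()

def partial_auc(scores, labels, beta):
    """AUC restricted to false-positive rates in [0, beta], normalised to [0, 1]."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    pos = scores[labels == 1]
    neg = np.sort(scores[labels == 0])[::-1]      # negatives, highest score first
    m = max(1, int(np.ceil(beta * len(neg))))     # keep only the top beta fraction
    top_neg = neg[:m]
    # Count correctly ordered (positive, hard-negative) pairs, ties counting 1/2.
    wins = sum((p > top_neg).sum() + 0.5 * (p == top_neg).sum() for p in pos)
    return wins / (len(pos) * m)
```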
Margin-based Ranking and an Equivalence between AdaBoost and RankBoost
We study boosting algorithms for learning to rank. We give a general margin-based bound for
ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms
that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth
margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm
is a modification of RankBoost, analogous to “approximate coordinate ascent boosting.” Finally,
we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and
classification in terms of their asymptotic behavior on the training set. Under natural conditions,
AdaBoost achieves an area under the ROC curve that is as good as RankBoost's; furthermore,
RankBoost, when given a specific intercept, achieves a misclassification error that is as good
as AdaBoost’s. This may help to explain the empirical observations made by Cortes andMohri, and
Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a bipartite ranking
algorithm, as measured by the area under the ROC curve.
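The link between pairwise misranking and the area under the ROC curve can be stated in a few lines: the empirical AUC is the fraction of correctly ordered positive-negative pairs, and RankBoost-style updates drive down an exponential loss over those same pairs, which upper-bounds the misranking rate. The helper below is an illustrative sketch of those two quantities, not the smooth margin ranking algorithm described above.

```python
import numpy as np

def auc_and_exp_rank_loss(scores_pos, scores_neg):
    """Empirical AUC and pairwise exponential ranking loss of a scoring function.

    scores_pos, scores_neg : scores assigned to positive and negative examples.
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    diffs = scores_pos[:, None] - scores_neg[None, :]   # all pairwise score differences
    auc = (diffs > 0).mean() + 0.5 * (diffs == 0).mean()
    exp_loss = np.exp(-diffs).mean()                    # convex surrogate of misranking
    return auc, exp_loss
```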
One- and two-sample nonparametric tests for the signal-to-noise ratio based on record statistics
A new family of nonparametric statistics, the r-statistics, is introduced. It
consists of counting the number of records of the cumulative sum of the sample.
The single-sample r-statistic is almost as powerful as Student's t-statistic
for Gaussian and uniformly distributed variables, and more powerful than the
sign and Wilcoxon signed-rank statistics as long as the data are not too
heavy-tailed.
Three two-sample parametric r-statistics are proposed, one with a higher
specificity but a smaller sensitivity than the Mann-Whitney U-test, and another
with a higher sensitivity but a smaller specificity. A nonparametric two-sample
r-statistic is introduced, whose power is very close to that of the Welch
statistic for Gaussian or uniformly distributed variables.
Comment: 12 pages, 13 figures.
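A minimal sketch of the single-sample r-statistic as described above: count the records of the cumulative sum of the sample. The record convention used here (strict records, with the starting value S_0 = 0 included) is an assumption for illustration rather than the paper's exact definition.

```python
import numpy as np

def r_statistic(sample):
    """Number of upper records of the cumulative sum of the sample.

    A record occurs at step k when the partial sum S_k exceeds every earlier
    partial sum, the starting value S_0 = 0 included (assumed convention).
    """
    s = np.cumsum(np.asarray(sample, dtype=float))
    # running_max[k] = max(S_0, ..., S_k), aligned so it is compared with S_{k+1}
    running_max = np.maximum.accumulate(np.concatenate(([0.0], s)))[:-1]
    return int((s > running_max).sum())
```

The intuition is that a zero-mean sample gives a cumulative sum with no drift and hence few records, while a positive signal-to-noise ratio drifts the sum upward and inflates the record count, which is what the test exploits.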