tau-FPL: Tolerance-Constrained Learning in Linear Time
Learning a classifier with control on the false-positive rate plays a
critical role in many machine learning applications. Existing approaches either
introduce label costs that depend on prior knowledge or tune the parameters of
traditional classifiers; both lack methodological consistency because they do
not strictly adhere to the false-positive rate constraint. In this paper, we
propose a novel scoring-thresholding approach, tau-False Positive Learning
(tau-FPL), to address this problem. We show that the scoring problem, which takes
the false-positive rate tolerance into account, can be solved efficiently in linear
time, and that an out-of-bootstrap thresholding method can transform the learned
ranking function into a low false-positive classifier. Both theoretical
analysis and experimental results show superior performance of the proposed
tau-FPL over existing approaches.
Comment: 32 pages, 3 figures. This is an extended version of our paper
published in AAAI-1
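The thresholding half of a scoring-thresholding pipeline can be illustrated with a minimal sketch. It is not the out-of-bootstrap method of the paper; it assumes the plainest alternative, an empirical quantile of held-out negative scores, and the function names (threshold_for_fpr, false_positive_rate) are hypothetical.

```python
import math


def threshold_for_fpr(neg_scores, tau):
    """Return a threshold t such that at most a fraction tau of the given
    negative scores are strictly greater than t.

    Assumption: a plain empirical-quantile rule on held-out negatives,
    used here only to illustrate thresholding under a tolerance tau.
    """
    if not neg_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    ranked = sorted(neg_scores, reverse=True)
    # Number of negatives allowed to score above the threshold.
    k = math.floor(tau * len(ranked))
    if k >= len(ranked):
        return -math.inf  # every negative may be a false positive
    # At most k scores are strictly greater than the (k+1)-th largest one.
    return ranked[k]


def false_positive_rate(neg_scores, threshold):
    """Fraction of negatives predicted positive (score > threshold)."""
    return sum(s > threshold for s in neg_scores) / len(neg_scores)


if __name__ == "__main__":
    negatives = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6, 0.05, 0.3, 0.5, 0.7]
    t = threshold_for_fpr(negatives, tau=0.2)
    print(t, false_positive_rate(negatives, t))
```

The rule only guarantees the tolerance on the scores it was given; holding the constraint on unseen data is the harder problem that a resampling-based threshold is meant to address.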
Batch and online learning algorithms for Nonconvex Neyman-Pearson classification
We describe and evaluate two algorithms for the Neyman-Pearson (NP) classification problem, which has recently been shown to be of particular importance for bipartite ranking problems. NP classification is a nonconvex problem involving a constraint on the false-negative rate. We investigate a batch algorithm based on DC programming and a stochastic gradient method well suited to large-scale datasets. Empirical evidence illustrates the potential of the proposed methods.
Bipartite Ranking: a Risk-Theoretic Perspective
We present a systematic study of the bipartite ranking problem, with the aim of explicating its connections to the class-probability estimation problem. Our study focuses on the properties of the statistical risk for bipartite ranking with general losses, which is closely related to a generalised notion of the area under the ROC curve: we establish alternate representations of this risk, relate the Bayes-optimal risk to a class of probability divergences, and characterise the set of Bayes-optimal scorers for the risk. We further study properties of a generalised class of bipartite risks, based on the p-norm push of Rudin (2009). Our analysis is based on the rich framework of proper losses, which are the central tool in the study of class-probability estimation. We show how this analytic tool makes transparent the generalisations of several existing results, such as the equivalence of the minimisers for four seemingly disparate risks from bipartite ranking and class-probability estimation. A novel practical implication of our analysis is the design of new families of losses for scenarios where accuracy at the head of the ranked list is paramount, with comparable empirical performance to the p-norm push.
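The two quantities this abstract revolves around, the area under the ROC curve and a p-norm push style risk, can be made concrete with a small sketch. It rests on assumptions not stated in the abstract: the empirical AUC is computed as the fraction of correctly ordered positive-negative pairs (ties count one half), and the push objective uses an exponential pairwise loss, summed per negative and raised to the power p. The function names are hypothetical.

```python
import math


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def p_norm_push(pos_scores, neg_scores, p=4.0):
    """Push-style objective: for each negative, sum an exponential pairwise
    loss over all positives, raise that sum to the power p, then add up
    over negatives. Larger p penalises high-scoring negatives more heavily.

    Assumption: exponential loss exp(-(s_pos - s_neg)); other convex
    surrogate losses could be substituted.
    """
    return sum(
        sum(math.exp(-(sp - sn)) for sp in pos_scores) ** p
        for sn in neg_scores
    )


if __name__ == "__main__":
    pos = [0.9, 0.8, 0.3]
    neg = [0.7, 0.2, 0.1]
    print(empirical_auc(pos, neg))
    print(p_norm_push(pos, neg, p=1.0), p_norm_push(pos, neg, p=4.0))
```

With p = 1 the objective reduces to an ordinary sum of pairwise losses; raising p concentrates the penalty on negatives that sit near the head of the ranked list.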