A PAUC-based Estimation Technique for Disease Classification and Biomarker Selection.
The partial area under the receiver operating characteristic curve (PAUC) is a well-established performance measure to evaluate biomarker combinations for disease classification. Because the PAUC is defined as the area under the ROC curve within a restricted interval of false positive rates, it enables practitioners to quantify sensitivity rates within pre-specified specificity ranges. This issue is of considerable importance for the development of medical screening tests. Although many authors have highlighted the importance of the PAUC, only a few methods use the PAUC as an objective function for finding optimal combinations of biomarkers. In this paper, we introduce a boosting method for deriving marker combinations that is explicitly based on the PAUC criterion. The proposed method can be applied in high-dimensional settings where the number of biomarkers exceeds the number of observations. Additionally, the proposed method incorporates a recently proposed variable selection technique (stability selection) that results in sparse prediction rules incorporating only those biomarkers that make relevant contributions to predicting the outcome of interest. Using both simulated data and real data, we demonstrate that our method performs well with respect to both variable selection and prediction accuracy. Specifically, if the focus is on a limited range of specificity values, the new method results in better predictions than other established techniques for disease classification.
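The PAUC definition in this abstract can be illustrated with a minimal sketch. It is not the boosting method of the paper: it only evaluates the empirical area under the ROC curve for false positive rates in [0, fpr_max]. The function name `partial_auc`, the parameter `fpr_max`, and the rounding of `fpr_max * n_neg` down to a whole number of negatives are assumptions made for this example.

```python
import math
import numpy as np

def partial_auc(scores, labels, fpr_max=0.1):
    """Empirical (unnormalised) area under the ROC curve for FPR in [0, fpr_max].

    Assumption: labels are 1 (diseased) / 0 (healthy) and a higher score
    means "more likely diseased". fpr_max * n_neg is rounded down, so the
    result is exact when that product is an integer.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Only the highest-scoring negatives fall inside the restricted FPR range.
    n_top = int(math.floor(fpr_max * len(neg) + 1e-12))
    top_neg = np.sort(neg)[::-1][:n_top]
    # Count correctly ordered (positive, negative) pairs; ties count as 1/2.
    wins = (pos[:, None] > top_neg[None, :]).sum()
    ties = (pos[:, None] == top_neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy data: three positives and four negatives.
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
print(partial_auc(scores, labels, fpr_max=0.5))  # area up to FPR = 0.5
print(partial_auc(scores, labels, fpr_max=1.0))  # ordinary AUC
```

Dividing the result by `fpr_max` rescales it to [0, 1], which is one common way to compare values across different FPR ranges.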
Strengthening the Effectiveness of Pedestrian Detection with Spatially Pooled Features
We propose a simple yet effective approach to the problem of pedestrian
detection which outperforms the current state-of-the-art. Our new features are
built on the basis of low-level visual features and spatial pooling.
Incorporating spatial pooling improves the translational invariance and thus
the robustness of the detection process. We then directly optimise the partial
area under the ROC curve (pAUC) measure, which concentrates detection
performance in the range of most practical importance. The combination of these
factors leads to a pedestrian detector which outperforms all competitors on all
of the standard benchmark datasets. We advance state-of-the-art results by
lowering the average miss rate on the INRIA, ETH, TUD-Brussels and
Caltech-USA benchmarks.
Comment: 16 pages. Appearing in Proc. European Conf. Computer Vision (ECCV)
201
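The claim that spatial pooling improves translational invariance can be illustrated with a small sketch. This is generic max pooling over non-overlapping cells, not the specific pooled features of the paper; the function name `max_pool2d` and the cell size are assumptions made for this example.

```python
import numpy as np

def max_pool2d(feature_map, cell=4):
    """Max-pool a 2-D feature map over non-overlapping cell x cell blocks.

    Rows and columns that do not fill a complete block are cropped.
    """
    h, w = feature_map.shape
    h_crop, w_crop = h - h % cell, w - w % cell
    blocks = feature_map[:h_crop, :w_crop].reshape(
        h_crop // cell, cell, w_crop // cell, cell
    )
    # Reduce over the two within-block axes.
    return blocks.max(axis=(1, 3))

# A single strong response at (1, 1), then the same response shifted by one pixel.
fmap = np.zeros((8, 8))
fmap[1, 1] = 1.0
shifted = np.roll(fmap, shift=1, axis=1)  # response moves to (1, 2)

# Both positions fall inside the same pooling cell, so the pooled maps match.
print(np.array_equal(max_pool2d(fmap), max_pool2d(shifted)))
```

The invariance only holds for shifts that stay inside one pooling cell; a shift across a cell boundary changes the pooled map.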
How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?
When sufficient labeled data are available, classical criteria based on
Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be
used to compare the performance of unsupervised anomaly detection algorithms.
However, in many situations, few or no data are labeled. This calls for
alternative criteria one can compute on non-labeled data. In this paper, two
criteria that do not require labels are empirically shown to discriminate
accurately (w.r.t. ROC or PR based criteria) between algorithms. These criteria
are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which
generally cannot be well estimated in large dimension. A methodology based on
feature sub-sampling and aggregating is also described and tested, extending
the use of these criteria to high-dimensional datasets and solving major
drawbacks inherent to standard EM and MV curves.
Online and Stochastic Gradient Methods for Non-decomposable Loss Functions
Modern applications in sensitive domains such as biometrics and medicine
frequently require the use of non-decomposable loss functions such as
precision@k and F-measure. Compared to point loss functions such as
hinge-loss, these offer much more fine-grained control over prediction, but at
the same time present novel challenges in terms of algorithm design and
analysis. In this work we initiate a study of online learning techniques for
such non-decomposable loss functions with an aim to enable incremental learning
as well as design scalable solvers for batch problems. To this end, we propose
an online learning framework for such loss functions. Our model enjoys several
nice properties, chief amongst them being the existence of efficient online
learning algorithms with sublinear regret and online to batch conversion
bounds. Our model is a provable extension of existing online learning models
for point loss functions. We instantiate two popular losses, prec@k and pAUC,
in our model and prove sublinear regret bounds for both of them. Our proofs
require a novel structural lemma over ranked lists which may be of independent
interest. We then develop scalable stochastic gradient descent solvers for
non-decomposable loss functions. We show that for a large family of loss
functions satisfying a certain uniform convergence property (that includes
prec@k, pAUC, and F-measure), our methods provably converge to the empirical
risk minimizer. Such uniform convergence results were not known for these
losses and we establish these using novel proof techniques. We then use
extensive experimentation on real life and benchmark datasets to establish that
our method can be orders of magnitude faster than a recently proposed cutting
plane method.
Comment: 25 pages, 3 figures. To appear in the proceedings of the 28th Annual
Conference on Neural Information Processing Systems, NIPS 201
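To make the term "non-decomposable" concrete, here is a minimal sketch of prec@k, taken here to mean the fraction of positives among the k highest-scored points. It is only an evaluation routine, not the online or stochastic solvers of the paper, and the paper's exact definition may differ in detail. The function name `precision_at_k` and the toy data are assumptions made for this example.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of positive labels among the k highest-scored points.

    Assumption: labels are 1 (positive) / 0 (negative); ties in the scores
    are broken by original order via a stable sort.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    top_k = np.argsort(-scores, kind="stable")[:k]
    return float(np.mean(labels[top_k] == 1))

scores = [0.9, 0.8, 0.7, 0.1]
labels = [1, 0, 1, 1]
print(precision_at_k(scores, labels, k=2))

# Raising the score of the last point changes which points enter the top 2,
# so the contribution of every other point depends on the whole ranking.
print(precision_at_k([0.9, 0.8, 0.7, 0.95], labels, k=2))
```

Because the value depends on the joint ranking of all points, it cannot be written as a sum of independent per-point losses, unlike hinge loss.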