Formulation and comparison of multi-class ROC surfaces
2nd ROCML workshop, held within the 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, 7-11 August 2005. The Receiver Operating Characteristic (ROC) has become a standard tool for the analysis and comparison of classifiers when the costs of misclassification are unknown. There has been relatively little work, however, examining ROC for more than two classes.
Here we define the ROC surface for the Q-class problem in terms of a multi-objective optimisation problem in which the goal is to simultaneously minimise the Q(Q − 1) misclassification rates, when the misclassification costs and parameters governing the classifier’s behaviour are unknown. We present an evolutionary algorithm to locate the optimal trade-off surface between misclassifications of different types. The performance of the evolutionary algorithm is illustrated on a synthetic three-class problem. In addition, the use of the Pareto optimal surface to compare classifiers is discussed, and we present a straightforward multi-class analogue of the Gini coefficient. This is illustrated on synthetic and standard machine learning data.
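To make the objective space concrete: for a Q-class classifier, each off-diagonal entry of the row-normalised confusion matrix is one of the Q(Q − 1) rates being minimised. A minimal sketch, with an invented 3-class confusion matrix (not data from the paper):

```python
import numpy as np

def misclassification_rates(conf):
    """Return the Q(Q - 1) off-diagonal misclassification rates.

    conf[i, j] counts items of true class i predicted as class j;
    rate r_ij = conf[i, j] / (number of items of true class i).
    """
    conf = np.asarray(conf, dtype=float)
    rates = conf / conf.sum(axis=1, keepdims=True)  # row-normalise
    q = conf.shape[0]
    # keep only the Q(Q - 1) off-diagonal entries, in row-major order
    return rates[~np.eye(q, dtype=bool)]

# Hypothetical 3-class confusion matrix: Q(Q - 1) = 6 objectives.
C = [[80, 15, 5],
     [10, 85, 5],
     [ 0, 20, 80]]
print(misclassification_rates(C))  # the six off-diagonal rates
```

A multi-objective optimiser would treat this length-6 vector as the quantity to be simultaneously minimised.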
Multi-class ROC analysis from a multi-objective optimisation perspective
Copyright © 2006 Elsevier. NOTICE: this is the author’s version of a work that was accepted for publication in Pattern Recognition Letters. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition Letters, Vol. 27, Issue 8 (2006), DOI: 10.1016/j.patrec.2005.10.016. Notes: Receiver operating characteristics (ROC) are traditionally used for assessing and tuning classifiers discriminating between two classes. This paper is the first to set ROC analysis in a multi-objective optimisation framework and thus generalise ROC curves to any number of classes, showing how multi-objective optimisation may be used to optimise classifier performance. An important new result is that the appropriate measure for assessing overall classifier quality is the Gini coefficient, rather than the volume under the ROC surface as previously thought. The method is currently being exploited in a KTP project with AI Corporation on detecting credit card fraud. The receiver operating characteristic (ROC) has become a standard tool for the analysis and comparison of classifiers when the costs of misclassification are unknown. There has been relatively little work, however, examining ROC for more than two classes. Here we discuss and present an extension of the standard two-class ROC to multi-class problems.
We define the ROC surface for the Q-class problem in terms of a multi-objective optimisation problem in which the goal is to simultaneously minimise the Q(Q − 1) misclassification rates, when the misclassification costs and parameters governing the classifier’s behaviour are unknown. We present an evolutionary algorithm to locate the Pareto front, the optimal trade-off surface between misclassifications of different types. The use of the Pareto optimal surface to compare classifiers is discussed, and we present a straightforward multi-class analogue of the Gini coefficient. The performance of the evolutionary algorithm is illustrated on a synthetic three-class problem, for both k-nearest neighbour and multi-layer perceptron classifiers.
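The Pareto front referred to above is the set of rate vectors not dominated by any other. A naive non-dominated filter illustrates the idea; the paper's evolutionary algorithm searches for this set far more efficiently, and the rate vectors here are invented:

```python
def dominates(a, b):
    """a dominates b (minimisation): no worse in every objective,
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical trade-offs between two misclassification rates.
rates = [(0.1, 0.4), (0.2, 0.2), (0.3, 0.3), (0.4, 0.1)]
print(pareto_front(rates))  # (0.3, 0.3) is dominated by (0.2, 0.2)
```

For Q classes each point would be a Q(Q − 1)-dimensional rate vector, but the dominance test is identical.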
Technical Note: Towards ROC Curves in Cost Space
ROC curves and cost curves are two popular ways of visualising classifier
performance, finding appropriate thresholds according to the operating
condition, and deriving useful aggregated measures such as the area under the
ROC curve (AUC) or the area under the optimal cost curve. In this note we
present some new findings and connections between ROC space and cost space, by
using the expected loss over a range of operating conditions. In particular, we
show that ROC curves can be transferred to cost space by means of a very
natural way of understanding how thresholds should be chosen, by selecting the
threshold such that the proportion of positive predictions equals the operating
condition (either in the form of cost proportion or skew). We call these new
curves ROC Cost Curves, and we demonstrate that the expected loss as measured
by the area under these curves is linearly related to AUC. This opens up a
series of new possibilities and clarifies the notion of cost curve and its
relation to ROC analysis. In addition, we show that for a classifier that
assigns the scores in an evenly-spaced way, these curves are equal to the Brier
Curves. As a result, this establishes the first clear connection between AUC
and the Brier score.
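The threshold choice the note describes, selecting the threshold so that the proportion of positive predictions equals the operating condition, amounts to thresholding at a score quantile. A minimal sketch under that reading (the scores are invented; this is not the authors' code):

```python
import numpy as np

def rate_driven_threshold(scores, c):
    """Choose the threshold at which the fraction of examples predicted
    positive equals the operating condition c (a cost proportion or
    skew): i.e. take the (1 - c) quantile of the scores."""
    return float(np.quantile(scores, 1.0 - c))

scores = np.array([0.1, 0.2, 0.4, 0.5, 0.7, 0.9])
t = rate_driven_threshold(scores, 0.5)  # predict half the examples positive
preds = scores >= t
print(t, preds.mean())
```

With this choice the predicted-positive rate tracks the operating condition directly, which is what lets the area under the resulting cost curve relate linearly to AUC.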
Landslide Susceptibility Using Climatic–Environmental Factors Using the Weight-of-Evidence Method—A Study Area in Central Italy
The Italian territory is subject to a high level of hydrogeological instability that periodically
results in the loss of lives, buildings and productive activities. Therefore, the recognition of areas
susceptible to hydrogeological instability is the basis for preparing countermeasures. In this context,
landslide susceptibility in the mid-Adriatic slope was analyzed using a statistical method, the
weight of evidence (WoE), which uses information from several independent sources to provide
sufficient evidence to predict possible system developments. Only flows, slides, debris flows and
mud flows were considered, with a total of 14,927 landslides obtained from the IFFI (Inventario dei
Fenomeni Franosi in Italia, the Italian Landslide Inventory) database. Seven climatic–environmental factors were used for mapping
landslide susceptibility in the study area: slope, aspect, extreme precipitation, normalized difference
vegetation index (NDVI), CORINE land cover (CLC), and topographic wetness index (TWI). The
introduction of these factors into the model resulted in rasters that allowed calculation by GIS-type
software of a susceptibility map. The result was validated by the ROC curve method, using a group of
landslides, equal to 20% of the total, not used in the modeling. The performance of the model, i.e., the
ability to predict the presence or absence of a landslide movement correctly, was 0.75, indicating a
moderately accurate model, which nevertheless appears innovative for two reasons: the first is that it
analyzes an inhomogeneous area of more than 9000 km², which is very large compared to similar
analyses, and the second reason is the causal factors used, which have high weights for some classes
despite the heterogeneity of the area. This research has enabled the simultaneous introduction of
unconventional factors for landslide susceptibility analysis, which, however, could be successfully
used at larger scales in the future.
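For orientation, the W+ and W− weights at the heart of the weight-of-evidence method can be computed from simple presence/absence counts per factor class. A minimal sketch with invented counts (not the study's data or code):

```python
import math

def weights_of_evidence(n_class_landslide, n_class_total, n_landslide, n_total):
    """Weight-of-evidence weights for one factor class (e.g. a slope band).

    n_class_landslide: landslide cells inside the class
    n_class_total:     all cells inside the class
    n_landslide:       landslide cells in the whole study area
    n_total:           all cells in the whole study area
    Returns (W+, W-); their difference (the contrast) measures how
    strongly the class favours landslide occurrence.
    """
    # probability of falling inside the class, given landslide / no landslide
    p_in_given_ls = n_class_landslide / n_landslide
    p_in_given_no = (n_class_total - n_class_landslide) / (n_total - n_landslide)
    w_plus = math.log(p_in_given_ls / p_in_given_no)
    w_minus = math.log((1.0 - p_in_given_ls) / (1.0 - p_in_given_no))
    return w_plus, w_minus

# Hypothetical counts: a class covering 10% of the area holds 30% of landslides.
wp, wm = weights_of_evidence(300, 10_000, 1_000, 100_000)
print(wp - wm)  # positive contrast: the class favours landslides
```

Summing such weights over the independent factor layers, cell by cell, is what produces the susceptibility raster that the ROC curve then validates.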
Receiver operating characteristic (ROC) movies, universal ROC (UROC) curves, and coefficient of predictive ability (CPA)
Throughout science and technology, receiver operating characteristic (ROC) curves and associated area under the curve (AUC) measures constitute powerful tools for assessing the predictive abilities of features, markers and tests in binary classification problems. Despite its immense popularity, ROC analysis has been subject to a fundamental restriction, in that it applies to dichotomous (yes or no) outcomes only. Here we introduce ROC movies and universal ROC (UROC) curves that apply to any linearly ordered outcome, along with an associated coefficient of predictive ability (CPA) measure. CPA equals the area under the UROC curve, and admits appealing interpretations in terms of probabilities and rank-based covariances. For binary outcomes CPA equals AUC, and for pairwise distinct outcomes CPA relates linearly to Spearman’s coefficient, in the same way that the C index relates linearly to Kendall’s coefficient. ROC movies, UROC curves, and CPA nest and generalize the tools of classical ROC analysis, and are bound to supersede them in a wealth of applications. Their usage is illustrated in data examples from biomedicine and meteorology, where rank-based measures yield new insights in the WeatherBench comparison of the predictive performance of convolutional neural networks and physical-numerical models for weather prediction.
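For binary outcomes CPA reduces to AUC, which itself has a purely rank-based reading: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. A minimal sketch (scores and labels invented for illustration):

```python
def auc_rank(scores, labels):
    """AUC as the Mann-Whitney statistic: fraction of positive/negative
    pairs in which the positive receives the higher score (ties = 1/2).
    For binary outcomes CPA coincides with this quantity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
print(auc_rank(scores, labels))  # 5 of 6 pairs ranked correctly
```

UROC/CPA generalises this pairwise comparison from a single binary split to every threshold of a linearly ordered outcome.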
Threshold Choice Methods: the Missing Link
Many performance metrics have been introduced for the evaluation of
classification performance, with different origins and niches of application:
accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the
absolute error, and the Brier score (with its decomposition into refinement and
calibration). One way of understanding the relation among some of these metrics
is the use of variable operating conditions (either in the form of
misclassification costs or class proportions). Thus, a metric may correspond to
some expected loss over a range of operating conditions. One dimension for the
analysis has been precisely the distribution we take for this range of
operating conditions, leading to some important connections in the area of
proper scoring rules. However, we show that there is another dimension which
has not received attention in the analysis of performance metrics. This new
dimension is given by the decision rule, which is typically implemented as a
threshold choice method when using scoring models. In this paper, we explore
many old and new threshold choice methods: fixed, score-uniform, score-driven,
rate-driven and optimal, among others. By calculating the loss of these methods
for a uniform range of operating conditions we get the 0-1 loss, the absolute
error, the Brier score (mean squared error), the AUC and the refinement loss
respectively. This provides a comprehensive view of performance metrics as well
as a systematic approach to loss minimisation, namely: take a model, apply
several threshold choice methods consistent with the information which is (and
will be) available about the operating condition, and compare their expected
losses. In order to assist in this procedure we also derive several connections
between the aforementioned performance metrics, and we highlight the role of
calibration in choosing the threshold choice method.
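One of the paper's identities can be checked numerically: under the score-driven method (threshold equal to the cost proportion c), the expected loss over a uniform range of c recovers the Brier score. A minimal sketch; the cost convention (false positives cost c, false negatives 1 − c) and the data are assumptions for illustration, not the authors' code:

```python
def cost_loss(scores, labels, threshold, c):
    """Loss at cost proportion c: false positives cost c, false
    negatives cost (1 - c); the factor 2 normalises the scale."""
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return 2.0 * (c * fp + (1.0 - c) * fn) / len(labels)

def score_driven_expected_loss(scores, labels, grid=1001):
    """Average the score-driven loss (threshold = c) over a uniform
    grid of cost proportions; this approximates the Brier score."""
    cs = [i / (grid - 1) for i in range(grid)]
    return sum(cost_loss(scores, labels, c, c) for c in cs) / len(cs)

scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)
print(score_driven_expected_loss(scores, labels), brier)  # nearly equal
```

Swapping in a different threshold choice method (fixed, rate-driven, optimal) in place of `threshold = c` yields the other metrics the paper catalogues.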