Multi-class ROC analysis from a multi-objective optimisation perspective
Copyright © 2006 Elsevier. NOTICE: this is the author's version of a work that was accepted for publication in Pattern Recognition Letters. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition Letters, Vol. 27 Issue 8 (2006), DOI: 10.1016/j.patrec.2005.10.016
Notes: Receiver operating characteristics (ROC) are traditionally used for assessing and tuning classifiers discriminating between two classes. This paper is the first to set ROC analysis in a multi-objective optimisation framework and thus generalise ROC curves to any number of classes, showing how multi-objective optimisation may be used to optimise classifier performance. An important new result is that the appropriate measure for assessing overall classifier quality is the Gini coefficient, rather than the volume under the ROC surface as previously thought. The method is currently being exploited in a KTP project with AI Corporation on detecting credit card fraud.
The receiver operating characteristic (ROC) has become a standard tool for the analysis and comparison of classifiers when the costs of misclassification are unknown. There has been relatively little work, however, examining ROC for more than two classes. Here we discuss and present an extension to the standard two-class ROC for multi-class problems.
We define the ROC surface for the Q-class problem in terms of a multi-objective optimisation problem in which the goal is to simultaneously minimise the Q(Q − 1) misclassification rates, when the misclassification costs and parameters governing the classifier's behaviour are unknown. We present an evolutionary algorithm to locate the Pareto front, the optimal trade-off surface between misclassifications of different types. The use of the Pareto optimal surface to compare classifiers is discussed and we present a straightforward multi-class analogue of the Gini coefficient. The performance of the evolutionary algorithm is illustrated on a synthetic three-class problem, for both k-nearest neighbour and multi-layer perceptron classifiers.
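As a rough illustration of the objectives described in this abstract, the sketch below extracts the Q(Q − 1) off-diagonal misclassification rates from a confusion matrix and filters a set of candidate classifiers down to the non-dominated ones. It is not the paper's evolutionary algorithm: the function names, the row-wise normalisation by true class, and the example numbers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def misclassification_rates(confusion: Sequence[Sequence[int]]) -> List[float]:
    """Return the Q(Q - 1) off-diagonal rates of a Q x Q confusion matrix.

    Assumption: confusion[k][j] counts examples of true class k predicted
    as class j, and each rate is normalised by the size of true class k.
    """
    q = len(confusion)
    rates = []
    for k in range(q):
        total = sum(confusion[k])
        for j in range(q):
            if j != k:
                rates.append(confusion[k][j] / total if total else 0.0)
    return rates


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every rate and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (minimisation)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical three-class confusion matrix: 3 * (3 - 1) = 6 rates.
rates = misclassification_rates([[8, 1, 1], [2, 6, 2], [0, 3, 7]])

# Hypothetical candidates described by two rates each; the third is dominated.
front = pareto_front([(0.1, 0.2), (0.2, 0.1), (0.3, 0.3)])
```

An evolutionary search would repeatedly perturb classifier parameters and keep an archive of such non-dominated points; only the dominance test is shown here.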
On the efficient use of uncertainty when performing expensive ROC optimisation.
Copyright © 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
IEEE Congress on Evolutionary Computation 2008 (CEC 2008). (IEEE World Congress on Computational Intelligence), Hong Kong, 1-6 June 2008
When optimising receiver operating characteristic (ROC) curves there is an inherent degree of uncertainty associated with the operating point evaluation of a model parameterisation x. This is due to the finite amount of training data used to evaluate the true and false positive rates of x. The uncertainty associated with any particular x can be reduced, but only at the computational cost of evaluating more data. Here we explicitly represent this uncertainty through the use of probabilistically non-dominated archives, and show how expensive ROC optimisation problems may be tackled by only evaluating a small subset of the available data at each generation of an optimisation algorithm. Illustrative results are given on data sets from the well-known UCI machine learning repository.
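To make the link between sample size and operating-point uncertainty concrete, the following minimal sketch computes the binomial standard error of an estimated rate (such as a true or false positive rate). This is a textbook approximation, not the probabilistically non-dominated archive described in the abstract; the function name and the counts are hypothetical.

```python
import math


def rate_standard_error(successes: int, trials: int) -> float:
    """Binomial standard error of an estimated rate successes / trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    return math.sqrt(p * (1.0 - p) / trials)


# Same estimated rate (0.5), four times the data: the standard error halves.
se_small = rate_standard_error(50, 100)
se_large = rate_standard_error(200, 400)
```

Quadrupling the evaluated data halves the standard error, which is why reducing the uncertainty of an operating point is expensive.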
Reducing Levelised Cost of Energy and Environmental Impact of a Hybrid Microturbine-Based Concentrated Solar Power Plant
A multi-objective optimisation of a hybrid solar dish power plant, aiming to minimise the levelised cost of energy (LCOE) while keeping emissions as low as possible, is presented in this paper. The analysis was carried out for both a regenerative Brayton-Joule cycle and an inter-cooled and re-heated regenerative cycle, using an analysis tool developed during this research and validated against available experimental data. The plant optimisation was performed using a fast and computationally efficient optimisation technique called "response surface optimisation", which generates an approximated function (or response surface) that can be used to find a set of thermodynamic parameters that maximise the plant efficiency while minimising emissions. A Design of Experiment (DOE) Latin hypercube technique was used to generate the training database, and a one-dimensional model was used to evaluate the output variables for each point of the database. The DOE was then coupled to a second-order polynomial regression technique to approximate the behaviour of the system in the design space. A genetic algorithm was then applied in order to find a high-performance arrangement. Results show a good trade-off between emissions and levelised cost of energy for both plant layouts. The first arrangement shows a minimum LCOE in the range between 38.5 and 38.8 €cts/kWh with an electrical power production of about 8 kW. The second shows an LCOE in the range between 50.5 and 51 €cts/kWh and a net electrical power output of 16 kW.
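The Latin hypercube step mentioned in this abstract can be sketched as follows. This is a generic sampler under stated assumptions, not the authors' tool: the function name, the seed, and the two design variables with their bounds are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def latin_hypercube(n: int, bounds: Sequence[Tuple[float, float]],
                    seed: int = 0) -> List[Tuple[float, ...]]:
    """Draw n points so that each variable has exactly one sample per stratum.

    Each variable's range is split into n equal strata; one value is drawn
    uniformly inside each stratum and the strata are shuffled independently
    per variable before being combined into points.
    """
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        column = [lo + (hi - lo) * (i + rng.random()) / n for i in range(n)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n)]


# Hypothetical design space: two normalised variables, each in [0, 1].
samples = latin_hypercube(8, [(0.0, 1.0), (0.0, 1.0)])
```

In a response-surface workflow each sampled point would be passed to the plant model, and a second-order polynomial would then be fitted to the resulting outputs.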
Multi-Objective ROC learning for classification
Receiver operating characteristic (ROC) curves are widely used for evaluating classifier performance, having been applied to e.g. signal detection, medical diagnostics and safety-critical systems. They allow examination of the trade-offs between true and false positive rates as misclassification costs are varied. Examination of the resulting graphs and calculation of the area under the ROC curve (AUC) allows assessment of how well a classifier is able to separate two classes and allows selection of an operating point with full knowledge of the available trade-offs.
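For readers unfamiliar with the quantities named above, here is a minimal sketch of how ROC points and the AUC can be computed from classifier scores. It assumes binary labels (1 for positive, 0 for negative), at least one example of each class, and that a higher score means "more positive"; the function names and data are illustrative only.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from high to low; tied scores are
    processed together so that they contribute a single point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# One negative is ranked above one positive, so 3 of 4 pairs are ordered correctly.
area = auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
print(area)  # 0.75
```

Each point on the curve is one candidate operating point; choosing among them is the cost trade-off the paragraph describes.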
In this thesis a multi-objective evolutionary algorithm (MOEA) is used to find classifiers whose ROC graph locations are Pareto optimal. The Relevance Vector Machine (RVM) is a state-of-the-art classifier that produces sparse Bayesian models, but is unfortunately prone to overfitting. Using the MOEA, hyper-parameters for RVM classifiers are set, optimising them not only in terms of true and false positive rates but also a novel measure of RVM complexity, thus encouraging sparseness, and producing approximations to the Pareto front. Several methods for regularising the RVM during the MOEA training process are examined and their performance evaluated on a number of benchmark datasets, demonstrating they possess the capability to avoid overfitting whilst producing performance equivalent to that of the maximum likelihood trained RVM.
A common task in bioinformatics is to identify genes associated with various genetic conditions by finding those genes useful for classifying a condition against a baseline. Typically, datasets contain large numbers of gene expressions measured in relatively few subjects. As a result of the high dimensionality and sparsity of examples, it can be very easy to find classifiers with near perfect training accuracies but which have poor generalisation capability. Additionally, depending on the condition and treatment involved, evaluation over a range of costs will often be desirable. An MOEA is used to identify genes for classification by simultaneously maximising the area under the ROC curve whilst minimising model complexity. This method is illustrated on a number of well-studied datasets and applied to a recent bioinformatics database resulting from the current InChianti population study.
Many classifiers produce "hard", non-probabilistic classifications and are trained to find a single set of parameters, whose values are inevitably uncertain due to limited available training data. In a Bayesian framework it is possible to ameliorate the effects of this parameter uncertainty by averaging over classifiers weighted by their posterior probability. Unfortunately, the required posterior probability is not readily computed for hard classifiers. In this thesis an Approximate Bayesian Computation Markov Chain Monte Carlo algorithm is used to sample model parameters for a hard classifier using the AUC as a measure of performance. The ability to produce ROC curves close to the Bayes optimal ROC curve is demonstrated on a synthetic dataset. Due to the large numbers of sampled parametrisations, averaging over them when rapid classification is needed may be impractical and thus methods for producing sparse weightings are investigated.
iTETRIS: An Integrated Wireless and Traffic Platform for Real-Time Road Traffic Management Solutions
Wireless vehicular cooperative systems have been identified as an attractive solution to improve road traffic management, thereby contributing to the European goal of safer, cleaner, and more efficient and sustainable traffic solutions. V2V-V2I communication technologies can improve traffic management through real-time exchange of data among vehicles and with road infrastructure. It is also of great importance to investigate the adequate combination of V2V and V2I technologies to ensure the continuous and cost-efficient operation of traffic management solutions based on wireless vehicular cooperative solutions. However, to adequately design and optimize these communication protocols and analyze the potential of wireless vehicular cooperative systems to improve road traffic management, adequate testbeds and field operational tests need to be conducted.
Despite the potential of Field Operational Tests to provide first insights into the benefits and problems faced in the development of wireless vehicular cooperative systems, there is still a need to evaluate, over the long term and at large scale, the true potential of these systems to improve traffic efficiency. To this end, iTETRIS is devoted to the development of advanced tools coupling traffic and wireless communication simulators.