Threshold Choice Methods: the Missing Link
Many performance metrics have been introduced for the evaluation of
classification performance, with different origins and niches of application:
accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the
absolute error, and the Brier score (with its decomposition into refinement and
calibration). One way of understanding the relation among some of these metrics
is the use of variable operating conditions (either in the form of
misclassification costs or class proportions). Thus, a metric may correspond to
some expected loss over a range of operating conditions. One dimension for the
analysis has been precisely the distribution we take for this range of
operating conditions, leading to some important connections in the area of
proper scoring rules. However, we show that there is another dimension which
has not received attention in the analysis of performance metrics. This new
dimension is given by the decision rule, which is typically implemented as a
threshold choice method when using scoring models. In this paper, we explore
many old and new threshold choice methods: fixed, score-uniform, score-driven,
rate-driven and optimal, among others. By calculating the loss of these methods
for a uniform range of operating conditions, we get the 0-1 loss, the absolute
error, the Brier score (mean squared error), the AUC, and the refinement loss,
respectively. This provides a comprehensive view of performance metrics as well
as a systematic approach to loss minimisation, namely: take a model, apply
several threshold choice methods consistent with the information which is (and
will be) available about the operating condition, and compare their expected
losses. In order to assist in this procedure we also derive several connections
between the aforementioned performance metrics, and we highlight the role of
calibration in choosing the threshold choice method.
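As a rough illustration of the procedure described above, the following Python sketch (illustrative assumptions only, not the authors' code) computes the expected loss of two threshold choice methods over a uniform range of operating conditions; the fixed threshold recovers the error rate at that threshold, while the score-driven threshold recovers the Brier score.

```python
import numpy as np

# Minimal sketch (not the authors' code): compare threshold choice methods by
# their expected loss over a uniform range of operating conditions. Here c is
# the cost proportion assigned to a false positive, so (1 - c) is the cost of
# a false negative; the labels, scores and the 0.5 fixed threshold are
# illustrative assumptions.

def cost_loss(y, s, t, c):
    """Cost-weighted loss at threshold t under operating condition c."""
    pred = (s >= t).astype(int)
    fp = np.mean((y == 0) & (pred == 1))   # false-positive probability
    fn = np.mean((y == 1) & (pred == 0))   # false-negative probability
    return 2.0 * (c * fp + (1.0 - c) * fn)

def expected_loss(y, s, choose_threshold, grid=2001):
    cs = np.linspace(0.0, 1.0, grid)       # uniform range of operating conditions
    return np.mean([cost_loss(y, s, choose_threshold(c), c) for c in cs])

y = np.array([0, 0, 1, 0, 1, 1])                # toy labels
s = np.array([0.1, 0.4, 0.35, 0.6, 0.8, 0.9])   # toy scores in [0, 1]

fixed        = lambda c: 0.5   # fixed threshold: recovers the error rate at 0.5
score_driven = lambda c: c     # score-driven threshold: recovers the Brier score

print(expected_loss(y, s, fixed))          # ~ misclassification rate at 0.5
print(expected_loss(y, s, score_driven))   # ~ mean squared error
print(np.mean((s - y) ** 2))               # Brier score, for comparison
```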
Evaluating Probabilistic Classifiers: The Triptych
Probability forecasts for binary outcomes, often referred to as probabilistic
classifiers or confidence scores, are ubiquitous in science and society, and
methods for evaluating and comparing them are in great demand. We propose and
study a triptych of diagnostic graphics that focus on distinct and
complementary aspects of forecast performance: The reliability diagram
addresses calibration, the receiver operating characteristic (ROC) curve
diagnoses discrimination ability, and the Murphy diagram visualizes overall
predictive performance and value. A Murphy curve shows a forecast's mean
elementary scores, including the widely used misclassification rate, and the
area under a Murphy curve equals the mean Brier score. For a calibrated
forecast, the reliability curve lies on the diagonal, and for competing
calibrated forecasts, the ROC and Murphy curves share the same number of
crossing points. We invoke the recently developed CORP (Consistent, Optimally
binned, Reproducible, and Pool-Adjacent-Violators (PAV) algorithm based)
approach to craft reliability diagrams and decompose a mean score into
miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components.
Plots of the DSC measure of discrimination ability versus the calibration
metric MCB visualize classifier performance across multiple competitors. The
proposed tools are illustrated in empirical examples from astrophysics,
economics, and social science.
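The following Python sketch (an assumption-laden illustration, not the authors' implementation) shows how a CORP-style decomposition of the mean Brier score into MCB, DSC, and UNC components can be obtained with isotonic (PAV) regression via scikit-learn.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Minimal sketch of the CORP score decomposition for the mean Brier score:
#   mean score = MCB - DSC + UNC,
# where the recalibrated forecasts come from isotonic (PAV) regression of the
# outcomes on the original forecasts. Data and values below are toy assumptions.

def corp_decomposition(p, y):
    brier = lambda f: np.mean((f - y) ** 2)
    p_pav = IsotonicRegression(y_min=0.0, y_max=1.0).fit_transform(p, y)  # PAV recalibration
    r = np.full_like(p, y.mean(), dtype=float)   # constant climatological forecast
    mcb = brier(p) - brier(p_pav)                # miscalibration
    dsc = brier(r) - brier(p_pav)                # discrimination
    unc = brier(r)                               # uncertainty
    return mcb, dsc, unc

# Toy probability forecasts and binary outcomes (illustrative only).
y = np.array([0, 0, 1, 0, 1, 1, 1, 0], dtype=float)
p = np.array([0.2, 0.4, 0.3, 0.1, 0.7, 0.8, 0.6, 0.5])
mcb, dsc, unc = corp_decomposition(p, y)
print(mcb, dsc, unc)                 # decomposition components
print(mcb - dsc + unc)               # equals the mean Brier score
print(np.mean((p - y) ** 2))
```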
A transferable machine-learning framework linking interstice distribution and plastic heterogeneity in metallic glasses
When metallic glasses (MGs) are subjected to mechanical loads, the plastic
response of atoms is non-uniform. However, the extent and manner in which
atomic environment signatures present in the undeformed structure determine
this plastic heterogeneity remain elusive. Here, we demonstrate that novel
site-environment features that characterize interstice distributions around
atoms, combined with machine learning (ML), can reliably identify plastic sites in
several Cu-Zr compositions. Using only quenched structural information as
input, the ML-based plastic probability estimates ("quench-in softness" metric)
can identify plastic sites that could activate at high strains, losing
predictive power only upon the formation of shear bands. Moreover, we reveal
that a quench-in softness model trained on a single composition and quenching
rate substantially improves upon previous models in generalizing to different
compositions and completely different MG systems (Ni62Nb38, Al90Sm10 and
Fe80P20). Our work presents a general, data-centric framework that could
potentially be used to address the structural origin of any site-specific
property in MGs.
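A minimal sketch of the transfer setup described above, with placeholder data standing in for the interstice-distribution features and plasticity labels (all names and values are hypothetical, not the authors' pipeline):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative sketch only: the feature construction and labels are random
# placeholders. The idea is to train a "quench-in softness" model on per-atom
# interstice-distribution features from one quenched composition and score the
# atoms of a different composition with the predicted plastic probability.

rng = np.random.default_rng(0)
X_train = rng.random((5000, 20))       # stand-in interstice features (training composition)
y_train = rng.integers(0, 2, 5000)     # stand-in labels: atom became plastic or not
X_other = rng.random((3000, 20))       # stand-in features for a different composition

model = GradientBoostingClassifier().fit(X_train, y_train)
softness = model.predict_proba(X_other)[:, 1]   # per-atom "quench-in softness" estimate
print(softness[:5])
```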
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image-level class labels
alone is not sufficient; the labels must also be localised at the original
image pixel resolution. Boosted by the extraordinary ability of convolutional
neural networks (CNNs) to create semantic, high-level and hierarchical image
features, a large number of deep learning-based 2D semantic segmentation
approaches have been proposed within the last decade. In this survey, we mainly
focus on the recent scientific developments in semantic segmentation,
specifically on deep learning-based methods using 2D images. We start with an
analysis of the public image sets and leaderboards for 2D semantic
segmentation, together with an overview of the techniques employed in
performance evaluation. In examining the evolution of the field, we
chronologically categorise the approaches into three main periods, namely the
pre- and early deep learning era, the fully convolutional era, and the post-FCN
era. We technically analyse the solutions put forward in terms of solving the
fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.
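To make the notion of pixel-wise labelling concrete, here is a minimal PyTorch sketch (a toy assumption, not any surveyed architecture) of a fully convolutional network that assigns a class label to every pixel at the original input resolution:

```python
import torch
import torch.nn as nn

# Toy fully convolutional network: convolutional features are classified with a
# 1x1 convolution and upsampled back to the input size, so every pixel receives
# a class label rather than one label for the whole image.

class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.encoder = nn.Sequential(                      # downsample by 4 while extracting features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, num_classes, 1)    # per-location class scores

    def forward(self, x):
        logits = self.classifier(self.encoder(x))
        # upsample the coarse score map back to the original pixel resolution
        return nn.functional.interpolate(
            logits, size=x.shape[-2:], mode="bilinear", align_corners=False)

img = torch.randn(1, 3, 128, 128)            # toy RGB image
pred = TinyFCN()(img).argmax(dim=1)          # per-pixel labels, shape (1, 128, 128)
print(pred.shape)
```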
Understanding metric-related pitfalls in image analysis validation
Validation metrics are key for the reliable tracking of scientific progress
and for bridging the current chasm between artificial intelligence (AI)
research and its translation into practice. However, increasing evidence shows
that particularly in image analysis, metrics are often chosen inadequately in
relation to the underlying research problem. This could be attributed to a lack
of accessibility of metric-related knowledge: While taking into account the
individual strengths, weaknesses, and limitations of validation metrics is a
critical prerequisite to making educated choices, the relevant knowledge is
currently scattered and poorly accessible to individual researchers. Based on a
multi-stage Delphi process conducted by a multidisciplinary expert consortium
as well as extensive community feedback, the present work provides the first
reliable and comprehensive common point of access to information on pitfalls
related to validation metrics in image analysis. Focusing on biomedical image
analysis but with the potential of transfer to other fields, the addressed
pitfalls generalize across application domains and are categorized according to
a newly created, domain-agnostic taxonomy. To facilitate comprehension,
illustrations and specific examples accompany each pitfall. As a structured
body of information accessible to researchers of all levels of expertise, this
work enhances global comprehension of a key topic in image analysis validation.
Consumer Credit-Risk Models Via Machine-Learning Algorithms
We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk. By combining customer transactions and credit bureau data from January 2005 to April 2009 for a sample of a major commercial bank’s customers, we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies and defaults, with linear regression R2’s of forecasted/realized delinquencies of 85%. Using conservative assumptions for the costs and benefits of cutting credit lines based on machine-learning forecasts, we estimate the cost savings to range from 6% to 25% of total losses. Moreover, the time-series patterns of estimated delinquency rates from this model over the course of the recent financial crisis suggest that aggregated consumer credit-risk analytics may have important applications in forecasting systemic risk.
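A back-of-the-envelope sketch of the kind of cost/benefit calculation described above; all counts and dollar figures are hypothetical assumptions, not the paper's estimates:

```python
# Illustrative cost/benefit sketch: net savings from cutting flagged credit
# lines, expressed as a fraction of total losses. Every parameter value below
# is an assumption chosen for the example.

def savings_fraction(tp, fp, fn, avg_loss_per_default, runup_share, margin_per_account):
    """Net savings from cutting flagged credit lines, as a fraction of total losses."""
    total_losses = (tp + fn) * avg_loss_per_default    # losses if no lines are cut
    avoided = tp * avg_loss_per_default * runup_share   # run-up avoided on true positives
    foregone = fp * margin_per_account                  # interest margin lost on false positives
    return (avoided - foregone) / total_losses

# Hypothetical confusion counts and dollar assumptions.
print(savings_fraction(tp=800, fp=1200, fn=200,
                       avg_loss_per_default=5000,
                       runup_share=0.3,        # share of the loss that builds up after the cut
                       margin_per_account=150))  # ~0.20, i.e. about 20% of total losses
```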