719 research outputs found

Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

Authors: Cooper Gregory F.; Naeini Mahdi Pakdaman
Publication date: 16/11/2015

Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered as an extension of BBQ, a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression. ENIR is designed to address the key limitation of isotonic regression which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be combined with many existing classification models. We demonstrate the performance of ENIR on synthetic and real datasets for the commonly used binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it runs in $O(N \log N)$ time, where $N$ is the number of samples.
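The abstract above builds on isotonic regression as a post-processing calibrator. The following is a minimal sketch of plain isotonic calibration via pool-adjacent-violators (PAV), not of ENIR itself. The function names `pav_calibrate` and `predict` are hypothetical, and the sketch assumes distinct classifier scores and 0/1 labels.

```python
import bisect
from typing import List, Tuple


def pav_calibrate(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Fit a non-decreasing step function with pool-adjacent-violators.

    Returns one (upper_score, calibrated_probability) pair per block,
    ordered by score. Assumes distinct scores (an illustrative simplification).
    """
    pairs = sorted(zip(scores, labels))  # O(N log N) sort dominates the cost
    # Each block holds [sum_of_labels, sample_count, largest_score_in_block].
    blocks: List[List[float]] = []
    for score, label in pairs:
        blocks.append([float(label), 1.0, score])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(b[2], b[0] / b[1]) for b in blocks]


def predict(model: List[Tuple[float, float]], score: float) -> float:
    """Look up the calibrated probability of the block that covers `score`."""
    uppers = [upper for upper, _ in model]
    index = min(bisect.bisect_left(uppers, score), len(model) - 1)
    return model[index][1]
```

Each sample is appended once and merged at most once, so the merge loop is linear after the initial sort. The fitted step function is monotone by construction, which is the assumption the abstract says ENIR relaxes.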

arXiv.org e-Print Archive

Crossref

Reliably Calibrated Isotonic Regression

Authors: Klami Arto; Nyberg Otto
Publication venue: Springer International Publishing AG
Publication date: 01/01/2021

Using classifiers for decision making requires well-calibrated probabilities for estimation of expected utility. Furthermore, knowledge of the reliability is needed to quantify uncertainty. Outputs of most classifiers can be calibrated, typically by using isotonic regression that bins classifier outputs together to form empirical probability estimates. However, especially for highly imbalanced problems it produces bins with few samples resulting in probability estimates with very large uncertainty. We provide a formal method for quantifying the reliability of calibration and extend isotonic regression to provide reliable calibration with guarantees for width of credible intervals of the probability estimates. We demonstrate the method in calibrating purchase probabilities in e-commerce and achieve significant reduction in uncertainty without compromising accuracy.
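To illustrate why bins with few samples give uncertain estimates, the sketch below approximates an equal-tailed credible interval for the positive rate in a single bin. It is not the paper's method. It assumes a uniform Beta(1, 1) prior and Monte Carlo sampling from the Beta posterior, and the function name `beta_credible_interval` is hypothetical.

```python
import random
from typing import Tuple


def beta_credible_interval(
    positives: int,
    n: int,
    level: float = 0.95,
    draws: int = 20000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Approximate an equal-tailed credible interval for a bin's positive rate.

    Assumes a Beta(1, 1) prior, so the posterior is
    Beta(1 + positives, 1 + n - positives).
    """
    rng = random.Random(seed)
    a = 1 + positives
    b = 1 + n - positives
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower_index = int((1 - level) / 2 * draws)
    upper_index = int((1 + level) / 2 * draws) - 1
    return samples[lower_index], samples[upper_index]


# A bin with 5 samples versus a bin with 1000 samples at the same 20% rate.
small_bin = beta_credible_interval(positives=1, n=5)
large_bin = beta_credible_interval(positives=200, n=1000)
```

Comparing `small_bin[1] - small_bin[0]` with `large_bin[1] - large_bin[0]` shows how much wider the interval is for the sparsely populated bin.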

Helsingin yliopiston digitaalinen arkisto

An Evaluation of Calibration Methods for Data Mining Models in Simulation Problems

Author: Bella Sanjuán Antonio
Publication venue: Universitat Politecnica de Valencia
Publication date: 28/11/2011

Data mining is useful in making single decisions. Difficulties arise when there are several related problems and the best local decisions do not produce the best global result. We propose to calibrate each local data mining model in order to obtain accurate models, and to use simulation to merge the local models and obtain a good overall result.

Bella Sanjuán, A. (2008). An Evaluation of Calibration Methods for Data Mining Models in Simulation Problems. http://hdl.handle.net/10251/13631

RiuNet

Non-Parametric Calibration of Probabilistic Regression

Authors: Flach Peter; Kull Meelis; Song Hao
Publication date: 01/01/2018

The task of calibration is to retrospectively adjust the outputs from a machine learning model to provide better probability estimates on the target variable. While calibration has been investigated thoroughly in classification, it has not yet been well-established for regression tasks. This paper considers the problem of calibrating a probabilistic regression model to improve the estimated probability densities over the real-valued targets. We propose to calibrate a regression model through the cumulative probability density, which can be derived from calibrating a multi-class classifier. We provide three non-parametric approaches to solve the problem, two of which provide empirical estimates and the third providing smooth density estimates. The proposed approaches are experimentally evaluated to show their ability to improve the performance of regression models on the predictive likelihood

arXiv.org e-Print Archive

Explore Bristol Research

An experimental investigation of calibration techniques for imbalanced data

Authors: Chen H.; Huang L.; vanden Broucke Seppe; Zhao J.; Zhu B.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Calibration is a technique used to obtain accurate probability estimation for classification problems in real applications. Class imbalance can create considerable challenges in obtaining accurate probabilities for calibration methods. However, previous research has paid little attention to this issue. In this paper, we present an experimental investigation of some prevailing calibration methods in different imbalance scenarios. Several performance metrics are considered to evaluate different aspects of calibration performance. The experimental results show that the performance of different calibration techniques depends on the metrics and the degree of the imbalance ratio. Isotonic Regression has better overall performance on imbalanced datasets than parametric and other complex non-parametric methods. However, it performs unstably in highly imbalanced scenarios. This study provides some insights into calibration methods on imbalanced datasets, and it can be a reference for the future development of calibration methods in class imbalance scenarios
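The abstract above does not name the metrics it uses. As one commonly used example, the sketch below computes a binned expected calibration error (ECE) for the predicted probability of the positive class. The function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration.

```python
from typing import List, Tuple


def expected_calibration_error(
    probs: List[float], labels: List[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between mean predicted probability and observed
    positive frequency, over equal-width probability bins."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(mean_prob - positive_rate)
    return ece
```

On imbalanced data most predictions fall into the lowest bins, so a binned score like this is dominated by the majority class. That is one reason results can depend on the chosen metric.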

Ghent University Academic Bibliography

An operational definition of quark and gluon jets

Authors: Komiske Patrick T.; Metodiev Eric M.; Thaler Jesse
Publication venue: Springer Science and Business Media LLC
Publication date: 01/11/2018

While "quark" and "gluon" jets are often treated as separate, well-defined objects in both theoretical and experimental contexts, no precise, practical, and hadron-level definition of jet flavor presently exists. To remedy this issue, we develop and advocate for a data-driven, operational definition of quark and gluon jets that is readily applicable at colliders. Rather than specifying a per-jet flavor label, we aggregately define quark and gluon jets at the distribution level in terms of measured hadronic cross sections. Intuitively, quark and gluon jets emerge as the two maximally separable categories within two jet samples in data. Benefiting from recent work on data-driven classifiers and topic modeling for jets, we show that the practical tools needed to implement our definition already exist for experimental applications. As an informative example, we demonstrate the power of our operational definition using Z+jet and dijet samples, illustrating that pure quark and gluon distributions and fractions can be successfully extracted in a fully well-defined manner.

arXiv.org e-Print Archive

DSpace@MIT

Directory of Open Access Journals

Calibrating predictive model estimates to support personalized medicine

Authors: Jiang Xiaoqian; Kim Jihoon; Ohno-Machado Lucila; Osl Melanie
Publication venue: BMJ Group

Crossref

PubMed Central