    Setting decision thresholds when operating conditions are uncertain

    The quality of the decisions made by a machine learning model depends on the data and on the operating conditions during deployment. Often, operating conditions such as the class distribution and the misclassification costs have changed since the model was trained and evaluated. When deploying a binary classifier that outputs scores, once we know the new class distribution and the new cost ratio between false positives and false negatives, there are several methods in the literature to help us choose an appropriate threshold for the classifier's scores. However, on many occasions, the information that we have about this operating condition is uncertain. Previous work has considered ranges or distributions of operating conditions during deployment, with expected costs being calculated for ranges or intervals, but the decision for each point is still made as if the operating condition were certain. The implications of this assumption have received limited attention: a threshold choice that is best suited without uncertainty may be suboptimal under uncertainty. In this paper we analyse the effect of operating condition uncertainty on the expected loss for different threshold choice methods, both theoretically and experimentally. We model uncertainty as a second conditional distribution over the actual operating condition and study it theoretically in such a way that minimum and maximum uncertainty are both special cases of this general formulation. This is complemented by a thorough experimental analysis investigating how different learning algorithms behave for a range of datasets according to the threshold choice method and the uncertainty level.

We thank the anonymous reviewers for their comments, which have helped to improve this paper significantly. This work has been partially supported by the EU (FEDER) and the Spanish MINECO under Grant TIN 2015-69175-C4-1-R and by Generalitat Valenciana under Grant PROMETEOII/2015/013.
Jose Hernandez-Orallo was supported by a Salvador de Madariaga Grant (PRX17/00467) from the Spanish MECD for a research stay at the Leverhulme Centre for the Future of Intelligence (CFI), Cambridge, a BEST Grant (BEST/2017/045) from Generalitat Valenciana for another research stay also at the CFI and an FLI Grant RFP2-152.

Ferri Ramírez, C.; Hernández-Orallo, J.; Flach, P. (2019). Setting decision thresholds when operating conditions are uncertain. Data Mining and Knowledge Discovery. 33(4):805-847. https://doi.org/10.1007/s10618-019-00613-7
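To make the threshold-choice problem described in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's formulation or code. It assumes a normalised cost proportion c (a false positive costs c, a false negative costs 1 - c) and compares two illustrative threshold choice rules: using c itself as the threshold (Bayes-optimal only when the scores are calibrated probabilities), and picking the cut point that minimises the empirical loss at c. Uncertainty is modelled, as an assumption, by choosing the threshold from a noisy estimate c_hat drawn from a Beta distribution whose mean is the true c and whose hypothetical concentration parameter kappa controls how precise the estimate is. All function names, the noise model and the synthetic data are hypothetical.

```python
import numpy as np


def loss_at_threshold(scores, labels, threshold, c):
    """Average cost of predicting positive when score > threshold.

    Assumed cost convention (illustrative): a false positive costs c and a
    false negative costs 1 - c, with c in [0, 1].
    """
    pred_pos = scores > threshold
    fp = np.sum(pred_pos & (labels == 0))
    fn = np.sum(~pred_pos & (labels == 1))
    return (c * fp + (1 - c) * fn) / len(labels)


def score_driven_threshold(c):
    # For calibrated probabilities p, predicting positive iff
    # p * (1 - c) > (1 - p) * c is equivalent to p > c.
    return c


def optimal_threshold(scores, labels, c):
    # Every distinct cut point (plus -inf, i.e. "all positive"), scored
    # at once: rows are candidate thresholds, columns are examples.
    candidates = np.concatenate(([-np.inf], np.unique(scores)))
    pred_pos = scores[None, :] > candidates[:, None]
    fp = np.sum(pred_pos & (labels == 0), axis=1)
    fn = np.sum(~pred_pos & (labels == 1), axis=1)
    return candidates[int(np.argmin(c * fp + (1 - c) * fn))]


def expected_loss_under_uncertainty(scores, labels, c_true, choose, kappa,
                                    n_draws=500, seed=0):
    """Mean loss at the true c when the threshold is chosen from a noisy
    estimate c_hat ~ Beta(kappa * c_true, kappa * (1 - c_true)).

    Hypothetical noise model: c_hat has mean c_true; larger kappa means
    less uncertainty. Requires 0 < c_true < 1.
    """
    rng = np.random.default_rng(seed)
    c_hats = rng.beta(kappa * c_true, kappa * (1 - c_true), size=n_draws)
    losses = [loss_at_threshold(scores, labels, choose(c_hat), c_true)
              for c_hat in c_hats]
    return float(np.mean(losses))


# Synthetic, purely illustrative data: positives tend to score higher.
rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=300)
scores = 1 / (1 + np.exp(-(rng.normal(size=300) + 1.5 * (2 * labels - 1))))

c_true = 0.3
for kappa in (1000.0, 10.0, 2.0):
    opt = expected_loss_under_uncertainty(
        scores, labels, c_true,
        lambda c_hat: optimal_threshold(scores, labels, c_hat), kappa)
    sd = expected_loss_under_uncertainty(
        scores, labels, c_true, score_driven_threshold, kappa)
    print(f"kappa={kappa:7.1f}  optimal={opt:.4f}  score-driven={sd:.4f}")
```

Because `optimal_threshold` minimises the loss at the exact cost proportion, any noise in the estimate can only raise its expected loss relative to the no-uncertainty case; how it compares with the simpler rule as kappa shrinks depends on the data and on the assumed noise model.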