    Least Ambiguous Set-Valued Classifiers with Bounded Error Levels

    In most classification tasks there are observations that are ambiguous and therefore difficult to correctly label. Set-valued classifiers output sets of plausible labels rather than a single label, thereby giving a more appropriate and informative treatment to the labeling of ambiguous instances. We introduce a framework for multiclass set-valued classification, where the classifiers guarantee user-defined levels of coverage or confidence (the probability that the true label is contained in the set) while minimizing the ambiguity (the expected size of the output). We first derive oracle classifiers assuming the true distribution to be known. We show that the oracle classifiers are obtained from level sets of the functions that define the conditional probability of each class. Then we develop estimators with good asymptotic and finite-sample properties. The proposed estimators build on existing single-label classifiers. The optimal classifier can sometimes output the empty set, but we provide two solutions to fix this issue that are suitable for various practical needs.
    Comment: Final version to be published in the Journal of the American Statistical Association at https://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1395341?journalCode=uasa2
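
    The level-set idea in this abstract can be illustrated with a small sketch. It assumes class-probability estimates are already available from some single-label classifier, and it picks one threshold from a hold-out calibration set. The function names (fit_threshold, predict_sets) and the calibration recipe are illustrative assumptions, not the estimators proposed in the paper.

```python
import numpy as np

def fit_threshold(cal_probs, cal_labels, alpha):
    """Choose a threshold t so that about a 1 - alpha share of calibration
    points have an estimated probability of their true label >= t."""
    # Estimated probability assigned to the true label of each calibration point.
    true_scores = cal_probs[np.arange(len(cal_labels)), cal_labels]
    sorted_scores = np.sort(true_scores)
    k = int(np.floor(alpha * (len(sorted_scores) + 1)))  # rank of the cut-off
    if k == 0:
        return 0.0  # too few calibration points: keep every label
    return float(sorted_scores[k - 1])

def predict_sets(probs, t):
    """Level set of the class probabilities: every label with p(y | x) >= t."""
    return [np.flatnonzero(row >= t).tolist() for row in probs]

# Toy calibration data (hypothetical numbers): 4 points, 3 classes.
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.3, 0.3, 0.4],
                      [0.5, 0.4, 0.1]])
cal_labels = np.array([0, 1, 2, 1])
t = fit_threshold(cal_probs, cal_labels, alpha=0.5)

new_probs = np.array([[0.45, 0.45, 0.10],   # ambiguous: two plausible labels
                      [0.90, 0.05, 0.05],   # clear-cut: one label
                      [0.35, 0.35, 0.30]])  # no label reaches the threshold
print(predict_sets(new_probs, t))  # [[0, 1], [0], []]
```

    The last row shows the empty-set case the abstract mentions: no class clears the threshold, so a practical classifier needs a fallback rule for such points.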

    Consistency of plug-in confidence sets for classification in semi-supervised learning

    Confident prediction is highly relevant in machine learning; for example, in applications such as medical diagnosis, a wrong prediction can be fatal. For classification, procedures already exist that allow a classifier to abstain when confidence in its prediction is weak. This approach is known as classification with reject option. In the present paper, we provide new methodology for this approach. Predicting a new instance via a confidence set, we ensure an exact control of the probability of classification. Moreover, we show that this methodology is easily implementable and has attractive theoretical and numerical properties.

    Uncertainty-aware predictive modeling for fair data-driven decisions

    Both industry and academia have made considerable progress in developing trustworthy and responsible machine learning (ML) systems. While critical concepts like fairness and explainability are often addressed, the safety of systems is typically not sufficiently taken into account. By viewing data-driven decision systems as socio-technical systems, we draw on the literature on uncertainty in ML to show how fairML systems can also be safeML systems. We posit that a fair model needs to be an uncertainty-aware model, e.g., by drawing on distributional regression. For fair decisions, we argue that a safe fail option should be used for individuals with uncertain categorization. We introduce semi-structured deep distributional regression as a modeling framework which addresses multiple concerns brought against standard ML models and show its use in a real-world example of algorithmic profiling of job seekers.

    Model Agnostic Explainable Selective Regression via Uncertainty Estimation

    With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to refrain from predicting. Such a framework is known as selective prediction. While selective prediction for classification tasks has been widely analyzed, the problem of selective regression is understudied. This paper presents a novel approach to selective regression that utilizes model-agnostic non-parametric uncertainty estimation. Our proposed framework showcases superior performance compared to state-of-the-art selective regressors, as demonstrated through comprehensive benchmarking on 69 datasets. Finally, we use explainable AI techniques to gain an understanding of the drivers behind selective regression. We implement our selective regression method in the open-source Python package doubt and release the code used to reproduce our experiments.
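
    As a rough illustration of selective regression, the sketch below estimates uncertainty with a plain bootstrap of a linear fit and abstains on the most uncertain inputs. This is a generic stand-in chosen for brevity: it does not use the doubt package's API and is not the method described in the paper. The function names and the coverage parameter are assumptions made for the example.

```python
import numpy as np

def bootstrap_predictions(x_train, y_train, x_new, n_boot=200, seed=0):
    """Refit a straight line on bootstrap resamples and predict at x_new."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((n_boot, len(x_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        slope, intercept = np.polyfit(x_train[idx], y_train[idx], deg=1)
        preds[b] = slope * x_new + intercept
    return preds

def selective_predict(preds, coverage):
    """Keep the `coverage` share of inputs with the smallest bootstrap spread;
    abstain (NaN) on the rest."""
    mean = preds.mean(axis=0)
    spread = preds.std(axis=0)  # non-parametric uncertainty proxy
    n_keep = int(np.ceil(coverage * len(mean)))
    keep = np.zeros(len(mean), dtype=bool)
    keep[np.argsort(spread)[:n_keep]] = True
    return np.where(keep, mean, np.nan), keep

# Synthetic data (hypothetical): y = 2x + 1 plus noise, observed on [0, 1].
rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, size=50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 0.1, size=50)

# Two inputs inside the training range and two far outside it.
x_new = np.array([0.5, 0.2, 3.0, 10.0])
preds = bootstrap_predictions(x_train, y_train, x_new)
values, keep = selective_predict(preds, coverage=0.5)
print(keep)    # which inputs received a prediction
print(values)  # NaN marks an abstention
```

    With a target coverage of 0.5, the model answers for the two inputs that lie within the training range and abstains on the two extrapolation points, where the bootstrap fits disagree most.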

    Probabilistic reframing for cost-sensitive regression

    © ACM, 2014. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 8, Iss. 4 (October 2014), http://doi.acm.org/10.1145/2641758
    Common-day applications of predictive models usually involve the full use of the available contextual information. When the operating context changes, one may fine-tune the by-default (incontextual) prediction or may even abstain from predicting a value (a reject). Global reframing solutions, where the same function is applied to adapt the estimated outputs to a new cost context, are possible solutions here. An alternative approach, which has not been studied in a comprehensive way for regression in the knowledge discovery and data mining literature, is the use of a local (e.g., probabilistic) reframing approach, where decisions are made according to the estimated output and a reliability, confidence, or probability estimation. In this article, we advocate for a simple two-parameter (mean and variance) approach, working with a normal conditional probability density. Given the conditional mean produced by any regression technique, we develop lightweight “enrichment” methods that produce good estimates of the conditional variance, which are used by the probabilistic (local) reframing methods. We apply these methods to some very common families of cost-sensitive problems, such as optimal predictions in (auction) bids, asymmetric loss scenarios, and rejection rules.
    This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022, TIN 2010-21062-C02-02, and TIN 2013-45732-C4-1-P, and by GVA projects PROMETEO/2008/051 and PROMETEO2011/052.
Finally, part of this work was motivated by the REFRAME project (http://www.reframe-d2k.org) granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA) and funded by Ministerio de Economia y Competitividad in Spain (PCIN-2013-037).
    Hernández Orallo, J. (2014). Probabilistic reframing for cost-sensitive regression. ACM Transactions on Knowledge Discovery from Data. 8(4):1-55. https://doi.org/10.1145/2641758
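
    The two-parameter (mean and variance) idea in this abstract can be sketched for one simple cost context. The example assumes a linear asymmetric loss (a fixed cost per unit of under-prediction and per unit of over-prediction), for which the cost-minimizing prediction under a normal density is a known quantile of that density. The loss form, the function name reframe, and the max_sigma rejection threshold are illustrative assumptions; the paper's enrichment methods for estimating the variance are not reproduced here.

```python
from statistics import NormalDist

def reframe(mu, sigma, cost_under, cost_over, max_sigma=None):
    """Shift a point prediction to suit an asymmetric linear cost context.

    mu, sigma  : estimated conditional mean and standard deviation (sigma > 0)
    cost_under : cost per unit when the prediction is below the true value
    cost_over  : cost per unit when the prediction is above the true value
    max_sigma  : optional rejection rule; abstain when sigma exceeds it
    """
    if max_sigma is not None and sigma > max_sigma:
        return None  # reject: the estimate is considered too unreliable
    # Under linear asymmetric loss the expected cost is minimized at the
    # quantile cost_under / (cost_under + cost_over) of the density.
    q = cost_under / (cost_under + cost_over)
    return NormalDist(mu, sigma).inv_cdf(q)

print(reframe(100.0, 10.0, cost_under=1.0, cost_over=1.0))  # symmetric costs: stays at the mean
print(reframe(100.0, 10.0, cost_under=3.0, cost_over=1.0))  # under-prediction costlier: shifts upward
print(reframe(100.0, 50.0, 1.0, 1.0, max_sigma=20.0))       # too uncertain: None
```

    The same mean yields different decisions as the cost context changes, which is the sense in which a local, probabilistic reframing differs from applying one global adjustment to every prediction.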