8 research outputs found

    Non-Parametric Calibration of Probabilistic Regression

    The task of calibration is to retrospectively adjust the outputs of a machine learning model so that they provide better probability estimates for the target variable. While calibration has been investigated thoroughly in classification, it is not yet well established for regression tasks. This paper considers the problem of calibrating a probabilistic regression model to improve the estimated probability densities over the real-valued targets. We propose to calibrate a regression model through the cumulative probability density, which can be derived from calibrating a multi-class classifier. We provide three non-parametric approaches to solve the problem: two provide empirical estimates and the third provides smooth density estimates. The proposed approaches are experimentally evaluated to show their ability to improve the performance of regression models on the predictive likelihood.
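
    To make the idea of calibrating through a cumulative distribution concrete, here is a minimal sketch of a generic non-parametric recalibration step. It is not the paper's method: it uses a plain empirical-CDF recalibration of probability integral transform (PIT) values, and the Gaussian base model, the toy calibration triples, and the names `fit_recalibrator` and `calibrated_cdf` are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist


def fit_recalibrator(pit_values):
    """Return R(p), the empirical CDF of PIT values F_i(y_i) from a calibration set."""
    sorted_pits = sorted(pit_values)
    n = len(sorted_pits)

    def recalibrate(p):
        # Fraction of calibration targets whose predicted CDF value was <= p.
        return bisect_right(sorted_pits, p) / n

    return recalibrate


# Hypothetical calibration set: (predicted mean, predicted std, observed target).
calib = [(0.0, 1.0, -2.1), (1.0, 1.0, 3.4), (2.0, 1.0, 2.2), (0.5, 1.0, -1.0)]

# PIT value of each observation under the uncalibrated Gaussian prediction.
pits = [NormalDist(mu, sigma).cdf(y) for mu, sigma, y in calib]
recal = fit_recalibrator(pits)


def calibrated_cdf(mu, sigma, y):
    """Recalibrated CDF: compose the learned map R with the model's own CDF."""
    return recal(NormalDist(mu, sigma).cdf(y))
```

    The composed function is a step function, so it gives an empirical estimate rather than a smooth density; smoothing it would require a further assumption.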

    Modeling Auction Price Uncertainty Using Boosting-based Conditional Density Estimation

    In complicated, interacting auctions, a fundamental problem is the prediction of the prices of goods in the auctions and, more broadly, the modeling of uncertainty regarding these prices. In this paper, we present a machine-learning approach to this problem. The technique is based on a new and general boosting-based algorithm for conditional density estimation problems of this kind.

    Techniques in Ordinal Classification and Image-to-Image Translation

    In this thesis we explore two research topics within the realm of deep learning and medical imaging. The first is ordinal classification, in which the classes to be predicted are discrete but have an ordering relation. Probability distributions over ordinal classes can possess undesired properties, such as non-unimodality. We propose a straightforward technique to constrain discrete ordinal probability distributions to be unimodal via the use of the Poisson and binomial probability distributions. We evaluate this approach on an age estimation dataset and the Kaggle diabetic retinopathy dataset and obtain competitive results. We conjecture that the unimodality constraint, in addition to making the probability distributions more interpretable, acts as a regulariser which can mitigate overfitting, especially in a low-data regime. In the second topic, we explore adversarial image-to-image translation and motivate its utility within the framework of semi-supervised learning. We evaluate an existing method and propose a new one, which we evaluate on several datasets, including those employed in our work on ordinal classification. In the latter case, we want to map from the domain of symptomatic patient scans to that of non-symptomatic patient scans. This effectively trains a model which can disentangle the underlying factors of variation and learn to detect and remove symptomatic regions in the image, which could be leveraged in several ways, such as aiding a network which relies on rich labels, or generating synthetic examples. We present some interesting qualitative results and motivate several promising avenues for future work.
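
    The unimodality constraint mentioned in the abstract above can be illustrated with a small sketch. It assumes that a single value `p` in (0, 1) (standing in for a network output, which is an assumption here) parameterises a binomial distribution over the ordered classes; the function names are hypothetical and the thesis's actual architecture is not reproduced.

```python
from math import comb


def binomial_ordinal_probs(p, num_classes):
    """Probabilities over classes 0..K-1 taken from a Binomial(K-1, p) distribution."""
    n = num_classes - 1
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(num_classes)]


def is_unimodal(probs):
    """True if the sequence rises (weakly) to its peak and then falls (weakly)."""
    peak = probs.index(max(probs))
    rising = all(a <= b for a, b in zip(probs[:peak], probs[1 : peak + 1]))
    falling = all(a >= b for a, b in zip(probs[peak:], probs[peak + 1 :]))
    return rising and falling


# Hypothetical example: 5 ordered classes, p = 0.3 puts the mode at class 1.
probs = binomial_ordinal_probs(p=0.3, num_classes=5)
```

    Because every binomial distribution is unimodal, any choice of `p` yields a distribution with a single peak, whereas an unconstrained softmax over the same classes carries no such guarantee.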

    Probabilistic reframing for cost-sensitive regression

    © ACM, 2014. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 8, Iss. 4 (October 2014), http://doi.acm.org/10.1145/2641758
    Everyday applications of predictive models usually involve the full use of the available contextual information. When the operating context changes, one may fine-tune the default (incontextual) prediction or may even abstain from predicting a value (a reject). Global reframing solutions, where the same function is applied to adapt the estimated outputs to a new cost context, are possible solutions here. An alternative approach, which has not been studied in a comprehensive way for regression in the knowledge discovery and data mining literature, is the use of a local (e.g., probabilistic) reframing approach, where decisions are made according to the estimated output and a reliability, confidence, or probability estimation. In this article, we advocate for a simple two-parameter (mean and variance) approach, working with a normal conditional probability density. Given the conditional mean produced by any regression technique, we develop lightweight "enrichment" methods that produce good estimates of the conditional variance, which are used by the probabilistic (local) reframing methods. We apply these methods to some very common families of cost-sensitive problems, such as optimal predictions in (auction) bids, asymmetric loss scenarios, and rejection rules.
    This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022, TIN 2010-21062-C02-02, and TIN 2013-45732-C4-1-P, and by the GVA projects PROMETEO/2008/051 and PROMETEO2011/052. Finally, part of this work was motivated by the REFRAME project (http://www.reframe-d2k.org), granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA) and funded by the Ministerio de Economia y Competitividad in Spain (PCIN-2013-037).
    Hernández Orallo, J. (2014). Probabilistic reframing for cost-sensitive regression. ACM Transactions on Knowledge Discovery from Data. 8(4):1-55. https://doi.org/10.1145/2641758
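
    As a rough illustration of how a normal conditional density (mean and variance) can drive a cost-sensitive decision of the kind the "Probabilistic reframing" abstract describes, the sketch below shifts a prediction under an asymmetric linear loss. It relies on the standard result that the expected-loss minimiser is the quantile `cost_under / (cost_under + cost_over)` of the predictive distribution; the function name, the cost parameters, and the numbers are assumptions and are not taken from the article.

```python
from statistics import NormalDist


def reframed_prediction(mean, std, cost_under, cost_over):
    """Prediction minimising expected asymmetric linear loss under N(mean, std**2).

    cost_under: cost per unit when the prediction falls below the true value.
    cost_over:  cost per unit when the prediction exceeds the true value.
    """
    # The optimum is the quantile at which the two marginal costs balance.
    quantile = cost_under / (cost_under + cost_over)
    return NormalDist(mean, std).inv_cdf(quantile)


# Hypothetical context: underpredicting is three times as costly as overpredicting.
symmetric = reframed_prediction(mean=10.0, std=2.0, cost_under=1.0, cost_over=1.0)
shifted = reframed_prediction(mean=10.0, std=2.0, cost_under=3.0, cost_over=1.0)
```

    With symmetric costs the prediction stays at the conditional mean; with the asymmetric costs it moves up to the 0.75 quantile, and the size of the shift scales with the estimated standard deviation.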