43 research outputs found

    Improving Risk Predictions by Preprocessing Imbalanced Credit Data

    Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This is a very common situation in real-life credit scoring applications, but it has so far received little attention. This paper investigates whether data resampling can be used to improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning from the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.
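    The technique the abstract describes is resampling the minority (defaulter) class before training a classifier. A minimal sketch of that idea, assuming the scikit-learn and imbalanced-learn packages and a synthetic data set (the paper's actual credit data, resampling methods, and classifiers are not specified here):

```python
# Minimal sketch: oversampling an imbalanced credit-style data set before training.
# Synthetic data only; not the paper's data set or experimental protocol.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE

# Synthetic data: ~5% "defaulters" (class 1), mimicking heavy class imbalance.
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: train on the original imbalanced data.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Resampled: oversample the minority class on the training split only.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
resampled = LogisticRegression(max_iter=1000).fit(X_res, y_res)

for name, model in [("original", baseline), ("resampled", resampled)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```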

    Towards Machine Wald

    The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made the numerical evaluation of sophisticated statistical models possible, these models are still designed by humans, because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to think as humans do when faced with uncertainty is challenging in several major ways: (1) finding optimal statistical models has yet to be formulated as a well-posed problem when information on the system of interest is incomplete and comes as a complex combination of sample data, partial knowledge of constitutive relations, and a limited description of the distribution of input random variables; (2) the space of admissible scenarios, along with the space of relevant information, assumptions, and beliefs, tends to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite. To this end, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews its connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification, and Information-Based Complexity.
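    Among the ingredients the abstract lists, Optimal Uncertainty Quantification gives perhaps the most concrete flavour of optimizing over admissible scenarios: one searches over every probability distribution consistent with the available information. A minimal, hypothetical sketch of that idea using SciPy (not the paper's own formulation or notation): with only a mean constraint, the worst-case tail probability over a discretized support reduces to a small linear program.

```python
# Hypothetical illustration of an OUQ-style computation, not the paper's algorithm:
# maximise P[X >= t] over all distributions on a discretised support [0, 1]
# subject to the only available information, a known mean E[X] = m.
import numpy as np
from scipy.optimize import linprog

m, t = 0.2, 0.5                         # assumed mean, threshold of interest
x = np.linspace(0.0, 1.0, 201)          # finite grid standing in for [0, 1]

# Decision variables: probability weights p_i on the grid points.
# Objective: maximise the mass above t (linprog minimises, hence the sign flip).
c = -(x >= t).astype(float)
A_eq = np.vstack([np.ones_like(x), x])  # total mass = 1, mean = m
b_eq = np.array([1.0, m])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(f"worst-case P[X >= {t}] given E[X] = {m}: {-res.fun:.3f}")  # Markov bound m/t = 0.4
```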

    Testing normality in econometric models

    No full text available. SIGLE record LD:9261.96(216) / BLDSC - British Library Document Supply Centre, United Kingdom

    Migration, Fixed Costs, and Location-Specific Amenities: A Hazard Analysis for a Panel of Males

    This article presents econometric estimates of the hazard function of interstate migration for adult working-age males, fitted to migration decisions observed over a twenty-year period. The results show a strong negative effect of the real wage difference between origin and destination, and of fixed costs associated with a move, on the hazard rate of interstate migration. Farmers and other self-employed males, and males who have school-age children, have unusually low hazard rates of interstate migration. Although a high crime rate is shown to increase the real wage, it also has a separate positive effect on the hazard of migration. Copyright 2007, Oxford University Press.
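    The abstract's method is a hazard (duration) model of the time until an interstate move. A minimal sketch of how such a model could be fitted, assuming the lifelines package and purely synthetic covariates named after the factors the abstract discusses (wage gap, self-employment, school-age children, crime rate); this is not the article's actual specification or data.

```python
# Minimal sketch of a hazard-rate analysis of interstate migration.
# All data and covariate names below are synthetic and illustrative only.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "years_until_move": rng.integers(1, 21, n),    # duration spent in the origin state
    "moved": rng.integers(0, 2, n),                # 1 if an interstate move occurred
    "wage_gap_origin_dest": rng.normal(0, 1, n),   # real wage difference, origin minus destination
    "self_employed": rng.integers(0, 2, n),
    "school_age_children": rng.integers(0, 2, n),
    "origin_crime_rate": rng.normal(0, 1, n),
})

# Cox proportional hazards fit: the estimated coefficients give each covariate's
# effect (sign and magnitude) on the hazard of interstate migration.
cph = CoxPHFitter()
cph.fit(df, duration_col="years_until_move", event_col="moved")
cph.print_summary()
```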