Improving Risk Predictions by Preprocessing Imbalanced Credit Data
Imbalanced credit data sets are databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This situation is very common in real-life credit scoring applications, yet it has received little attention. This paper investigates whether data resampling can improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.
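As a rough illustration of what resampling means here, the sketch below balances a toy data set by random oversampling, that is, by duplicating minority-class rows until every class has the same size. The abstract does not say which resampling methods the paper evaluates, so the choice of random oversampling, the function name `random_oversample`, and the toy data are all assumptions made for this example.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up): 8 non-defaulters (label 0) and 2 defaulters (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation data keep the original class proportions.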
Towards Machine Wald
The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed by humans because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to think as humans do when faced with uncertainty is challenging in several major ways: (1) Finding optimal statistical models remains to be formulated as a well-posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations, and a limited description of the distribution of input random variables. (2) The space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite. To this end, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with Decision Theory, Machine Learning, Bayesian Inference, Stochastic Optimization, Robust Optimization, Optimal Uncertainty Quantification and Information Based Complexity. Comment: 37 pages
Testing normality in econometric models
SIGLE / LD:9261.96(216) / BLDSC - British Library Document Supply Centre / GB / United Kingdom
Migration, Fixed Costs, and Location-Specific Amenities: A Hazard Analysis for a Panel of Males
This article presents econometric estimates of the hazard function of interstate migration for working-age adult males, fitted to data on their migration decisions over a twenty-year period. The results show a strong negative effect of the real wage difference between origin and destination, and of fixed costs associated with a move, on the hazard rate of interstate migration. Farmers and other self-employed males, and males who have school-age children, have unusually low hazard rates of interstate migration. Although a high crime rate is shown to increase the real wage, it also has a separate positive effect on the hazard of migration. Copyright 2007, Oxford University Press.