
    Model selection and error estimation

    We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
    Keywords: complexity regularization, model selection, error estimation, concentration of measure
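The label-flipping equivalence mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a small finite class of one-dimensional threshold classifiers, labels in {-1, +1}, an even sample size, and synthetic data, and all function names are hypothetical. Writing the discrepancy of a classifier as its error on the first half minus its error on the second half, flipping the labels of the first half gives a flipped-sample error of 1/2 minus half the discrepancy, so the maximal discrepancy equals 1 minus twice the minimal flipped-sample error.

```python
import numpy as np

def stump_predictions(x, thresholds):
    """Predictions in {-1, +1} of every threshold classifier, in both orientations."""
    base = np.where(x[None, :] > thresholds[:, None], 1, -1)
    return np.vstack([base, -base])  # shape: (2 * len(thresholds), n)

def max_discrepancy_direct(preds, y):
    """Max over the class of (error on first half) - (error on second half)."""
    half = y.size // 2
    errs = preds != y[None, :]
    first = errs[:, :half].mean(axis=1)
    second = errs[:, half:].mean(axis=1)
    return float(np.max(first - second))

def max_discrepancy_via_flipped_erm(preds, y):
    """Same quantity, obtained by empirical risk minimization on flipped labels."""
    half = y.size // 2
    y_flipped = y.copy()
    y_flipped[:half] *= -1  # flip the labels of the first half only
    flipped_err = (preds != y_flipped[None, :]).mean(axis=1)
    return float(1.0 - 2.0 * flipped_err.min())

# Synthetic data (an assumption for illustration; n must be even).
rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=n)
y = np.where(x + 0.5 * rng.normal(size=n) > 0, 1, -1)

preds = stump_predictions(x, np.linspace(-2.0, 2.0, 41))
direct = max_discrepancy_direct(preds, y)
via_flip = max_discrepancy_via_flipped_erm(preds, y)
print(direct, via_flip)  # the two values agree up to rounding
```

For a richer class, the exhaustive maximum over rows would be replaced by whatever empirical risk minimizer is available for that class, run once on the flipped sample.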

    Testing the normality assumption in the sample selection model with an application to travel demand

    In this paper we introduce a test for the normality assumption in the sample selection model. The test is based on a generalization of a semi-nonparametric maximum likelihood method. In this estimation method, the distribution of the error terms is approximated by a Hermite series, with normality as a special case. Because all parameters of the model are estimated both under normality and in the more general specification, we can test for normality using the likelihood ratio approach. This test has reasonable power, as is shown by a simulation study. Finally, we apply the generalized semi-nonparametric maximum likelihood estimation method and the normality test to a model of car ownership and car use. The assumption of normally distributed error terms is rejected and we provide estimates of the sample selection model that are consistent.
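To illustrate how a Hermite-series density can contain normality as a special case, here is a minimal sketch. It assumes a univariate density of the form "squared polynomial times standard normal density, divided by a normalizing constant"; the abstract does not spell out the exact specification used in the paper, and the sample selection model involves more than one error term, so the functional form, the function names, and the coefficient values below are all illustrative assumptions.

```python
import math
import numpy as np

def normal_moment(k):
    """E[Z^k] for a standard normal Z: 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    return float(math.prod(range(1, k, 2)))  # empty product gives 1 for k = 0

def snp_density(e, coeffs):
    """Squared-polynomial times normal density, normalized to integrate to 1."""
    a = np.asarray(coeffs, dtype=float)
    poly = np.polynomial.polynomial.polyval(e, a)  # a[0] + a[1] * e + ...
    # Normalizing constant: E[P(Z)^2] = sum_ij a_i a_j E[Z^(i + j)].
    norm_const = sum(
        a[i] * a[j] * normal_moment(i + j)
        for i in range(a.size)
        for j in range(a.size)
    )
    phi = np.exp(-0.5 * e**2) / math.sqrt(2.0 * math.pi)
    return poly**2 * phi / norm_const

grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]

# With only a constant term the density reduces to the standard normal.
normal_case = snp_density(grid, [1.0])
std_normal = np.exp(-0.5 * grid**2) / math.sqrt(2.0 * math.pi)

# Hypothetical higher-order coefficients give a non-normal density.
general_case = snp_density(grid, [1.0, 0.3, -0.2])
total_mass = float(np.sum(general_case) * dx)
print(total_mass)  # close to 1
```

Under this kind of nesting, a likelihood ratio test compares twice the difference between the maximized log-likelihoods of the general and the normal specification with a chi-square reference distribution whose degrees of freedom equal the number of restricted coefficients.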