    Resistant Nonparametric Smoothing with S-PLUS

    In this paper we introduce and illustrate the use of a set of S-PLUS functions to fit M-type smoothing splines with the smoothing parameter chosen by a robust criterion (either a robust version of cross-validation or a robust version of Mallows's Cp). The main reference is: Cantoni, E. and Ronchetti, E. (2001). Resistant selection of the smoothing parameter for smoothing splines. Statistics and Computing, 11, 141-146.
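
    The S-PLUS functions are not reproduced here. As a rough illustration of the M-type idea only, the following Python sketch (the helpers `huber_weights` and `robust_penalized_smoother` are hypothetical) fits a discrete penalized smoother on an equally spaced grid. It uses a squared second-difference penalty as a stand-in for the smoothing-spline roughness penalty and fits by iteratively reweighted least squares with Huber weights of MAD-standardized residuals. The smoothing parameter `lam` is fixed by hand; the robust cross-validation and robust Cp criteria of the paper are not implemented.

```python
import numpy as np


def huber_weights(r, c=1.345):
    """Huber weights: 1 for |r| <= c, c/|r| otherwise (r already standardized)."""
    a = np.abs(r)
    w = np.ones_like(a)
    big = a > c
    w[big] = c / a[big]
    return w


def robust_penalized_smoother(y, lam=10.0, c=1.345, n_iter=50, tol=1e-8):
    """Hypothetical helper: Huber-weighted penalized smoother on an equally spaced grid.

    Each IRLS step solves (W + lam * D'D) f = W y, where D is the second-difference
    matrix and W holds Huber weights of the MAD-standardized residuals.
    With n_iter=0 it returns the ordinary (unweighted) penalized fit.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)     # (n-2) x n second-difference matrix
    P = lam * D.T @ D                       # roughness penalty matrix
    f = np.linalg.solve(np.eye(n) + P, y)   # least-squares starting fit
    for _ in range(n_iter):
        r = y - f
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale estimate
        s = max(s, 1e-12)                                  # guard against zero scale
        W = np.diag(huber_weights(r / s, c))
        f_new = np.linalg.solve(W + P, W @ y)
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f


# Simulated example: a sine curve with small noise and one gross outlier.
x = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
y[25] += 10.0
f_robust = robust_penalized_smoother(y, lam=10.0)
f_plain = robust_penalized_smoother(y, lam=10.0, n_iter=0)
```

    In this example the Huber-weighted fit should stay close to the sine curve at the outlying point, while the unweighted fit (`n_iter=0`) is pulled toward the outlier. Choosing `lam` by a robust criterion, as the paper does, would replace the fixed value used here.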

    A robust approach for skewed and heavy-tailed outcomes in the analysis of health care expenditures

    In this paper, robust statistical procedures are presented for the analysis of skewed and heavy-tailed outcomes as they typically occur in health care data. The new estimators and test statistics are extensions of classical maximum likelihood techniques for generalized linear models. In contrast to their classical counterparts, the new robust techniques show lower variability and excellent efficiency properties in the presence of small deviations from the assumed model, i.e. when the underlying distribution of the data lies in a neighborhood of the model. A simulation study, an analysis of real data, and a sensitivity analysis confirm the good theoretical statistical properties of the new techniques.
    Keywords: Deviations from the model; GLM modeling; health econometrics; heavy tails; robust estimation; robust inference

    Non-parametric adjustment for covariates when estimating a treatment effect

    We consider a non-parametric model for estimating the effect of a binary treatment on an outcome variable while adjusting for an observed covariate. A naive procedure consists of performing two separate non-parametric regressions of the response on the covariate: one with the treated individuals and the other with the untreated. The treatment effect is then obtained by taking the difference between the two fitted regression functions. This paper proposes a backfitting algorithm which uses all the data for the two above-mentioned non-parametric regressions. We give theoretical results showing that the resulting estimator of the treatment effect can have lower finite-sample variance. This improvement may be achieved at the cost of a larger bias. However, in a simulation study we observe that the mean squared error is lowest for the proposed backfitting estimator. When more than one covariate is observed, our backfitting estimator can still be applied by using the propensity score (the probability of being treated given the values of the covariates). We illustrate the use of the backfitting estimator in a setting with several covariates, using data on a training program for individuals who have faced social and economic problems.
    Keywords: Analysis of covariance; backfitting algorithm; linear smoothers; propensity score
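
    The naive procedure can be sketched in Python as follows. This is a minimal sketch, assuming a Gaussian-kernel Nadaraya-Watson smoother with a fixed bandwidth as the non-parametric regression; the smoother, the bandwidth, the helper names (`nw_smoother`, `naive_treatment_effect`) and the simulated data are illustrative assumptions. Only the naive two-regression estimator is shown, not the backfitting estimator proposed in the paper.

```python
import numpy as np


def nw_smoother(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson regression estimate at x_eval with a Gaussian kernel."""
    u = (np.asarray(x_eval, float)[:, None] - np.asarray(x_train, float)[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)
    return (k @ np.asarray(y_train, float)) / k.sum(axis=1)


def naive_treatment_effect(x, y, treated, bandwidth=0.1):
    """Naive estimator: smooth treated and untreated groups separately, then difference.

    Returns the estimated effect curve m1(x) - m0(x) at every observed x,
    and its average over the sample.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    t = np.asarray(treated, bool)
    m1 = nw_smoother(x[t], y[t], x, bandwidth)    # fit on treated units only
    m0 = nw_smoother(x[~t], y[~t], x, bandwidth)  # fit on untreated units only
    effect_curve = m1 - m0
    return effect_curve, effect_curve.mean()


# Simulated example: the outcome depends on the covariate plus a constant effect of 2.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 400)
treated = rng.uniform(size=400) < 0.5
y = np.sin(3.0 * x) + 2.0 * treated + 0.1 * rng.standard_normal(400)
curve, avg_effect = naive_treatment_effect(x, y, treated)
```

    Each fitted curve uses only one group's observations, which is the feature the proposed backfitting algorithm is designed to improve on by using all the data for both regressions.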

    Variable Selection in Additive Models by Nonnegative Garrote

    We adapt Breiman's (1995) nonnegative garrote method to perform variable selection in nonparametric additive models. The technique avoids testing methods for which no reliable distributional theory is available. In addition, it removes the need for a full search over all possible models, which is computationally intensive, especially when the number of variables is moderate to high. The method has the advantages of being conceptually simple and computationally fast. It provides accurate predictions and is effective at identifying the variables generating the model. For illustration, we consider both a study of Boston housing prices and two simulation settings. In all cases our methods perform as well as or better than available alternatives such as the Component Selection and Smoothing Operator (COSSO).
    Keywords: cross-validation, nonnegative garrote, nonparametric regression, shrinkage methods, variable selection
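
    The garrote step can be sketched as follows, assuming initial estimates of the additive components have already been computed (for example by backfitting) and evaluated at the data, one column per variable. The sketch uses a Lagrangian form, minimizing 0.5 * ||y - F c||^2 + lam * sum_j c_j over c_j >= 0, where column j of F is the fitted component f_j; Breiman states the garrote with a bound on the sum of the shrinkage factors instead. Variables whose factor is shrunk to zero are dropped. The function name `nonnegative_garrote`, the coordinate-descent solver, the fixed `lam` and the illustrative components are assumptions; in practice `lam` would be tuned, e.g. by cross-validation.

```python
import numpy as np


def nonnegative_garrote(F, y, lam, n_iter=1000, tol=1e-10):
    """Garrote shrinkage factors c >= 0 for pre-fitted additive components.

    F[:, j] holds an initial estimate of component j evaluated at the data.
    Minimizes 0.5 * ||y - F c||^2 + lam * sum(c) over c >= 0 by coordinate descent,
    after centering y and the columns of F (the intercept is left unpenalized).
    """
    F = np.asarray(F, float)
    F = F - F.mean(axis=0)
    y = np.asarray(y, float)
    y = y - y.mean()
    p = F.shape[1]
    c = np.zeros(p)
    r = y.copy()                          # current residual y - F @ c
    col_sq = (F ** 2).sum(axis=0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                  # constant column: nothing to fit
            rho = F[:, j] @ r + col_sq[j] * c[j]
            new = max(0.0, (rho - lam) / col_sq[j])  # soft-threshold, clipped at 0
            delta = new - c[j]
            if delta != 0.0:
                r -= delta * F[:, j]
                c[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return c


# Illustrative data: two real components and one spurious "fitted" component.
rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
f1 = np.sin(2 * np.pi * x1)
f2 = 4.0 * (x2 - 0.5) ** 2
y = f1 + f2 + 0.1 * rng.standard_normal(n)
f3 = 0.2 * rng.standard_normal(n)         # stands in for a component fitted to noise
c = nonnegative_garrote(np.column_stack([f1, f2, f3]), y, lam=2.0)
```

    Because each update is clipped at zero, a component whose inner product with the current residual stays below `lam` keeps a factor of exactly zero, which is what makes the garrote perform selection rather than only shrinkage.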

    Predicting House Prices with Spatial Dependence: A Comparison of Alternative Methods

    This paper compares alternative methods for taking spatial dependence into account in house price prediction. We select hedonic methods that have been reported in the literature to perform relatively well in terms of ex-sample prediction accuracy. Because differences in performance may be due to differences in data, we compare the methods using a single data set. The estimation methods include simple OLS, a two-stage process incorporating nearest neighbors’ residuals in the second stage, geostatistical, and trend surface models. These models take into account submarkets by adding dummy variables or by estimating separate equations for each submarket. Based on data for approximately 13,000 transactions from Louisville, Kentucky, we conclude that a geostatistical model with disaggregated submarket variables performs best.
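
    A minimal sketch of the two-stage nearest-neighbor idea, assuming the first stage is an OLS hedonic regression and the second stage adds the unweighted mean of the first-stage residuals of the k nearest sales (Euclidean distance on coordinates) to each prediction. The function name `two_stage_nn_predict`, the neighbor count, the equal weighting and the simulated data are assumptions for illustration; the paper's exact specification, including submarket variables, is not reproduced.

```python
import numpy as np


def two_stage_nn_predict(X_train, y_train, coords_train, X_new, coords_new, k=5):
    """Hypothetical two-stage predictor: OLS hedonic fit + mean of k nearest OLS residuals."""
    X_train, X_new = np.asarray(X_train, float), np.asarray(X_new, float)
    coords_train, coords_new = np.asarray(coords_train, float), np.asarray(coords_new, float)
    # Stage 1: OLS with an intercept.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    resid = y_train - A @ beta
    base = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
    # Stage 2: add the average residual of the k nearest training sales.
    dist = np.linalg.norm(coords_new[:, None, :] - coords_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return base + resid[nearest].mean(axis=1)


# Simulated example: a smooth location effect that the regressors do not capture.
rng = np.random.default_rng(3)
n_train, n_test = 500, 200
coords = rng.uniform(size=(n_train + n_test, 2))
size = rng.uniform(50, 200, n_train + n_test)          # hypothetical structural attribute
location = np.sin(3 * coords[:, 0]) + np.cos(3 * coords[:, 1])
price = 1.0 + 0.01 * size + location + 0.1 * rng.standard_normal(n_train + n_test)
tr, te = slice(0, n_train), slice(n_train, None)
pred = two_stage_nn_predict(size[tr], price[tr], coords[tr], size[te], coords[te])
```

    With a spatial effect left out of the regressors, the first-stage residuals of nearby sales carry that effect, so adding their average should reduce out-of-sample error relative to OLS alone.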

    Extracting Long-Term Patterns of Population Changes from Sporadic Counts of Migrant Birds

    Declines of many North American birds are of conservation concern. Monitoring their population changes has largely depended on formally structured Breeding Bird Surveys and Migration Monitoring Stations, although some use has been made of lists kept by birders. For almost 40 years, birders have kept daily counts of migrant landbirds during visits to Seal Island, off Nova Scotia's south tip. Here we present results for several common migrants using day-counts made between August 15 and November 15. Most existing analyses have used linear models to extract trends and other variables from such long-term data sets. Instead, we applied Generalized Additive Models (GAMs) to extract the continuous trend functions and the patterns of influence of observer number, wind speed and wind direction on count nights and prior nights, and moon phase. The results suggest that GAMs are a powerful way of dealing with such "noisy" data of the sort collected by birders in their recreational pursuits. In addition, it is possible to analyse groups of species (related taxonomically or ecologically) simultaneously, with the potential to determine more general overall trends.
    Keywords: Seal Island, Generalized additive models, Count data, Overdispersion