393 research outputs found

    Quantile Regression in the Presence of Sample Selection

    Most sample selection models assume that the errors are independent of the regressors. Under this assumption, all quantile and mean functions are parallel, which implies that quantile estimators cannot reveal any heterogeneity (which, by definition, does not exist). However, quantile estimators are useful for testing the independence assumption, because they are consistent under the null hypothesis. We propose tests for this crucial restriction that are based on the entire conditional quantile regression process after correcting for sample selection bias. Monte Carlo simulations demonstrate that they are powerful and two empirical illustrations indicate that violations of this assumption are likely to be ubiquitous in labor economics.
    Keywords: sample selection, quantile regression, independence, test
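
The parallelism claim can be seen in a simplified location-shift model that leaves out the selection step (an assumption made here for brevity; the symbols $\beta$, $\varepsilon$ and $F_\varepsilon$ are illustrative and are not taken from the paper):

```latex
Y = X'\beta + \varepsilon, \quad \varepsilon \perp X
\;\Longrightarrow\;
Q_\tau(Y \mid X = x) = x'\beta + F_\varepsilon^{-1}(\tau),
\qquad
E[Y \mid X = x] = x'\beta + E[\varepsilon].
```

Only the intercept depends on $\tau$, so every conditional quantile function shares the slope $\beta$ with the conditional mean. Slopes that differ across quantiles are therefore evidence against independence.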

    Semiparametric GEE analysis in partially linear single-index models for longitudinal data

    In this article, we study a partially linear single-index model for longitudinal data under a general framework which includes both the sparse and dense longitudinal data cases. A semiparametric estimation method based on a combination of the local linear smoothing and generalized estimation equations (GEE) is introduced to estimate the two parameter vectors as well as the unknown link function. Under some mild conditions, we derive the asymptotic properties of the proposed parametric and nonparametric estimators in different scenarios, from which we find that the convergence rates and asymptotic variances of the proposed estimators for sparse longitudinal data would be substantially different from those for dense longitudinal data. We also discuss the estimation of the covariance (or weight) matrices involved in the semiparametric GEE method. Furthermore, we provide some numerical studies including Monte Carlo simulation and an empirical application to illustrate our methodology and theory.
    Comment: Published at http://dx.doi.org/10.1214/15-AOS1320 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Statistical inference of partially linear regression models with heteroscedastic errors

    The authors study a heteroscedastic partially linear regression model and develop an inferential procedure for it. This includes a test of heteroscedasticity, a two-step estimator of the heteroscedastic variance function, semiparametric generalized least-squares estimators of the parametric and nonparametric components of the model, and a bootstrap goodness-of-fit test to see whether the nonparametric component can be parametrized.

    Inference on Treatment Effects After Selection Amongst High-Dimensional Controls

    We propose robust methods for inference on the effect of a treatment variable on a scalar outcome in the presence of very many controls. Our setting is a partially linear model with possibly non-Gaussian and heteroscedastic disturbances. Our analysis allows the number of controls to be much larger than the sample size. To make informative inference feasible, we require the model to be approximately sparse; that is, we require that the effect of confounding factors can be controlled for up to a small approximation error by conditioning on a relatively small number of controls whose identities are unknown. The latter condition makes it possible to estimate the treatment effect by selecting approximately the right set of controls. We develop a novel estimation and uniformly valid inference method for the treatment effect in this setting, called the "post-double-selection" method. Our results apply to Lasso-type methods used for covariate selection as well as to any other model selection method that is able to find a sparse model with good approximation properties. The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the problem of uniform inference after model selection for a large, interesting class of models. We illustrate the use of the developed methods with numerical simulations and an application to the effect of abortion on crime rates
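
The selection-then-regression idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Lasso solver is a plain coordinate-descent routine, the penalty `lam` is a fixed hypothetical value rather than the data-driven choice the paper develops, and the simulated data, function names and coefficients are all made up for the example.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()              # current residual y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]             # drop column j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]             # add the updated column back
    return b

def post_double_selection(y, d, X, lam):
    """Select controls that predict y or d, then run OLS of y on d and their union."""
    y, d, X = y - y.mean(), d - d.mean(), X - X.mean(axis=0)
    sel_y = np.flatnonzero(lasso_cd(X, y, lam))   # controls predicting the outcome
    sel_d = np.flatnonzero(lasso_cd(X, d, lam))   # controls predicting the treatment
    selected = sorted(set(sel_y.tolist()) | set(sel_d.tolist()))
    Z = np.column_stack([d, X[:, selected]])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    return coef[0], selected

# Hypothetical simulated data: control 0 confounds treatment d and outcome y.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)
y = 0.5 * d + X[:, 0] + rng.normal(size=n)      # true treatment effect is 0.5

alpha_hat, selected = post_double_selection(y, d, X, lam=0.3)
print(alpha_hat, selected)
```

Taking the union of the two selected sets is what protects against omitting a control that matters for the treatment but is only weakly related to the outcome, which is the failure mode of single-equation selection that the abstract mentions.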

    Semiparametric Quantile Regression and Applications to Healthcare Data Analysis

    University of Minnesota Ph.D. dissertation. 2018. Major: Statistics. Advisor: Lan Wang. 1 computer file (PDF); 122 pages.
    The ubiquity of healthcare data allows for complex analyses of a variety of topics ranging from healthcare cost to cognitive decline in dementia patients. Healthcare datasets are often highly skewed and heteroskedastic, posing great challenges for statistical analyses. Quantile regression is an effective tool for analyzing healthcare datasets because, compared with mean regression, quantile regression makes weaker assumptions, which are more appropriate for complex data. Additionally, quantile regression models conditional quantiles of the response variable, providing a more complete picture of the conditional distribution. In this dissertation, we propose three solutions to challenges in healthcare data analysis. All three solutions either directly rely on quantile regression or extend existing methodology and algorithms. Motivated by the Medical Expenditure Panel Survey containing data from individuals’ medical providers and employers across the United States, we propose a new semiparametric procedure for predicting whether a patient will incur high medical expenditure. The common practice is to artificially dichotomize the response. We propose a new semiparametric prediction rule to classify whether a future response occurs at the upper tail of the response distribution. The new method can be considered a semiparametric estimator of the Bayes rule for classification and enjoys some nice features. It incorporates nonlinear covariate effects and can be adapted to construct a prediction interval and hence provides more information about the future response. Next, we extend semiparametric quantile regression methodology to longitudinal studies with non-ignorable dropout. Dropout occurs when a patient leaves a study prior to its conclusion. Non-ignorable dropout occurs when the probability of dropout depends on the response. Failing to account for non-ignorable dropout can result in biased estimation. To handle dropout, we propose a weighted semiparametric quantile regression estimator where the weights are inversely proportional to the estimated probability of remaining in the study. We show that this weighted estimator gives unbiased estimates of linear effects. We illustrate the advantages of the proposed method on a subset of the National Alzheimer’s Coordinating Center Uniform Data Set tracking cognitive decline in dementia patients. Lastly, we turn our attention to the issue of analyzing very large datasets with both a large number of covariates and a large sample size. Penalized quantile regression is often used to simultaneously select variables and estimate effects by fitting models at many values of a tuning parameter. Existing algorithms have focused on improving computation time at one value of a tuning parameter; however, obtaining model estimates for all values of the tuning parameter can still be prohibitively time-consuming. Instead of attempting to solve the penalized quantile regression problem for each value of a tuning parameter, we propose a sparsity path algorithm to approximate the solution, allowing for fast exploration of candidate models at many different sparsity levels. Simulations show that the true model is always contained in the set of candidate models returned by the proposed sparsity path algorithm.
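
The inverse-probability weighting idea in this abstract can be illustrated on the simplest case, an unconditional quantile with no covariates. This is a sketch under stated assumptions, not the dissertation's estimator: the observed values and the probabilities of remaining in the study are hypothetical and treated as known, and the function names are invented for the example.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def weighted_quantile(values, weights, tau):
    """Minimise sum_i w_i * rho_tau(y_i - q) over q; a minimiser lies at a data point."""
    def objective(q):
        return sum(w * check_loss(y - q, tau) for y, w in zip(values, weights))
    return min(values, key=objective)

# Hypothetical responses observed at the end of a study, where patients with
# larger responses were less likely to remain.
observed = [1.0, 2.0, 3.0, 4.0, 10.0]
prob_remaining = [1.0, 1.0, 1.0, 0.5, 0.25]      # assumed known for this sketch
ipw = [1.0 / p for p in prob_remaining]          # weights inversely proportional

naive_median = weighted_quantile(observed, [1.0] * len(observed), 0.5)
ipw_median = weighted_quantile(observed, ipw, 0.5)
print(naive_median, ipw_median)
```

Each retained observation with a low probability of remaining stands in for the similar patients who dropped out, so the weighted median moves toward the under-represented large responses.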

    Bayesian Inference of the Nonparametric Stochastic Frontier Models

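    Project number: NSC94-2415-H032-006. Research period: 200508~200607. Research funding: 597,000.
    Stochastic frontier models are commonly used to measure a firm's degree of inefficiency. However, we often find that the results are very sensitive to the functional specification of the frontier. Hence, even if the error distribution is correctly specified, a misspecified "technology" (frontier) leads to incorrect inference about "inefficiency". This paper therefore relaxes the traditional "parametric" stochastic frontier model and considers a "nonparametric" stochastic frontier model in order to avoid specification error. We use Markov chain Monte Carlo methods from the Bayesian approach to estimate, analyze and draw inference on the model coefficients, and the resulting estimates have small-sample properties. We also derive the required full conditional distributions and expect to demonstrate the practical usefulness of the method with an application to real data.
    Sponsorship: National Science Council, Executive Yuan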

    Conditional Transformation Models

    The ultimate goal of regression analysis is to obtain information about the conditional distribution of a response given a set of explanatory variables. This goal is, however, seldom achieved because most established regression models only estimate the conditional mean as a function of the explanatory variables and assume that higher moments are not affected by the regressors. The underlying reason for such a restriction is the assumption of additivity of signal and noise. We propose to relax this common assumption in the framework of transformation models. The novel class of semiparametric regression models proposed herein allows transformation functions to depend on explanatory variables. These transformation functions are estimated by regularised optimisation of scoring rules for probabilistic forecasts, e.g. the continuous ranked probability score. The corresponding estimated conditional distribution functions are consistent. Conditional transformation models are potentially useful for describing possible heteroscedasticity, comparing spatially varying distributions, identifying extreme events, deriving prediction intervals and selecting variables beyond mean regression effects. An empirical investigation based on a heteroscedastic varying coefficient simulation model demonstrates that semiparametric estimation of conditional distribution functions can be more beneficial than kernel-based non-parametric approaches or parametric generalised additive models for location, scale and shape
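
The continuous ranked probability score named in this abstract can be computed for a sample-based forecast with the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from F. The sketch below only evaluates the score for a given forecast; it does not implement the regularised optimisation used in the paper, and the function name and example numbers are hypothetical.

```python
def crps_empirical(samples, y):
    """CRPS of the empirical distribution of `samples` against the observation y."""
    m = len(samples)
    mean_abs_error = sum(abs(x - y) for x in samples) / m
    mean_abs_spread = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return mean_abs_error - 0.5 * mean_abs_spread

# A single-point forecast reduces to the absolute error.
point_score = crps_empirical([2.0], 5.0)

# A forecast centred near the observation scores lower (better) than one far from it.
near_score = crps_empirical([4.0, 5.0, 6.0], 5.0)
far_score = crps_empirical([0.0, 1.0, 2.0], 5.0)
print(point_score, near_score, far_score)
```

Because the score rewards forecast distributions that are both well located and appropriately spread, minimising it over a model class targets the whole conditional distribution rather than the conditional mean alone.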