Quantile Regression in the Presence of Sample Selection
Most sample selection models assume that the errors are independent of the regressors. Under this assumption, all quantile and mean functions are parallel, which implies that quantile estimators cannot reveal any (by definition non-existent) heterogeneity. However, quantile estimators are useful for testing the independence assumption, because they are consistent under the null hypothesis. We propose tests for this crucial restriction that are based on the entire conditional quantile regression process after correcting for sample selection bias. Monte Carlo simulations demonstrate that they are powerful, and two empirical illustrations indicate that violations of this assumption are likely to be ubiquitous in labor economics. Keywords: sample selection, quantile regression, independence, test
Semiparametric GEE analysis in partially linear single-index models for longitudinal data
In this article, we study a partially linear single-index model for
longitudinal data under a general framework which includes both the sparse and
dense longitudinal data cases. A semiparametric estimation method based on a
combination of the local linear smoothing and generalized estimating equations
(GEE) is introduced to estimate the two parameter vectors as well as the
unknown link function. Under some mild conditions, we derive the asymptotic
properties of the proposed parametric and nonparametric estimators in different
scenarios, from which we find that the convergence rates and asymptotic
variances of the proposed estimators for sparse longitudinal data would be
substantially different from those for dense longitudinal data. We also discuss
the estimation of the covariance (or weight) matrices involved in the
semiparametric GEE method. Furthermore, we provide some numerical studies
including Monte Carlo simulation and an empirical application to illustrate our
methodology and theory.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1320 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Statistical inference of partially linear regression models with heteroscedastic errors
The authors study a heteroscedastic partially linear regression model and develop an inferential procedure for it. This includes a test of heteroscedasticity, a two-step estimator of the heteroscedastic variance function, semiparametric generalized least-squares estimators of the parametric and nonparametric components of the model, and a bootstrap goodness-of-fit test of whether the nonparametric component can be parametrized.
Inference on Treatment Effects After Selection Amongst High-Dimensional Controls
We propose robust methods for inference on the effect of a treatment variable
on a scalar outcome in the presence of very many controls. Our setting is a
partially linear model with possibly non-Gaussian and heteroscedastic
disturbances. Our analysis allows the number of controls to be much larger than
the sample size. To make informative inference feasible, we require the model
to be approximately sparse; that is, we require that the effect of confounding
factors can be controlled for up to a small approximation error by conditioning
on a relatively small number of controls whose identities are unknown. The
latter condition makes it possible to estimate the treatment effect by
selecting approximately the right set of controls. We develop a novel
estimation and uniformly valid inference method for the treatment effect in
this setting, called the "post-double-selection" method. Our results apply to
Lasso-type methods used for covariate selection as well as to any other model
selection method that is able to find a sparse model with good approximation
properties.
The main attractive feature of our method is that it allows for imperfect
selection of the controls and provides confidence intervals that are valid
uniformly across a large class of models. In contrast, standard post-model
selection estimators fail to provide uniform inference even in simple cases
with a small, fixed number of controls. Thus our method resolves the problem of
uniform inference after model selection for a large, interesting class of
models. We illustrate the use of the developed methods with numerical
simulations and an application to the effect of abortion on crime rates.
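The three steps implied by the "post-double-selection" description (select controls that predict the outcome, select controls that predict the treatment, then regress the outcome on the treatment and the union of both sets) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: a simple correlation screen stands in for the Lasso-type selector, the data are a tiny noise-free toy example, and all function names (corr, select, ols, post_double_selection) are hypothetical.

```python
from math import sqrt

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / sqrt(va * vb)

def select(target, controls, threshold=0.5):
    """Assumed stand-in for Lasso: keep controls with |corr| above threshold."""
    return {j for j, x in enumerate(controls) if abs(corr(x, target)) > threshold}

def ols(y, columns):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(columns[i], y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def post_double_selection(y, d, controls, threshold=0.5):
    """Return the treatment coefficient and the indices of the selected controls."""
    keep = select(y, controls, threshold) | select(d, controls, threshold)
    cols = [[1.0] * len(y), d] + [controls[j] for j in sorted(keep)]
    return ols(y, cols)[1], sorted(keep)

# Toy data: x0 confounds treatment and outcome, x1 is irrelevant.
x0 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x1 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
e = [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]
d = [a + b for a, b in zip(x0, e)]               # treatment depends on x0
y = [2.0 * t + 3.0 * c for t, c in zip(d, x0)]   # true treatment effect is 2

effect, kept = post_double_selection(y, d, [x0, x1])
print(effect, kept)
```

Taking the union of the two selected sets is the point of the design: a control that matters for the treatment but is missed in the outcome step still enters the final regression.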
Semiparametric Quantile Regression and Applications to Healthcare Data Analysis
University of Minnesota Ph.D. dissertation. 2018. Major: Statistics. Advisor: Lan Wang. 1 computer file (PDF); 122 pages. The ubiquity of healthcare data allows for complex analyses of a variety of topics ranging from healthcare cost to cognitive decline in dementia patients. Healthcare datasets are often highly skewed and heteroskedastic, posing great challenges for statistical analyses. Quantile regression is an effective tool for analyzing healthcare datasets because, compared with mean regression, quantile regression has weaker assumptions which are more appropriate for complex data. Additionally, quantile regression models conditional quantiles of the response variable, providing a more complete picture of the conditional distribution. In this dissertation, we propose three solutions to challenges in healthcare data analysis. All three solutions either directly rely on quantile regression or extend existing methodology and algorithms. Motivated by the Medical Expenditure Panel Survey containing data from individuals’ medical providers and employers across the United States, we propose a new semiparametric procedure for predicting whether a patient will incur high medical expenditure. The common practice is to artificially dichotomize the response. We propose a new semiparametric prediction rule to classify whether a future response occurs at the upper tail of the response distribution. The new method can be considered a semiparametric estimator of the Bayes rule for classification and enjoys some nice features. It incorporates nonlinear covariate effects and can be adapted to construct a prediction interval and hence provides more information about the future response. Next, we extend semiparametric quantile regression methodology to longitudinal studies with non-ignorable dropout. Dropout occurs when a patient leaves a study prior to its conclusion. Non-ignorable dropout occurs when the probability of dropout depends on the response.
Failing to account for non-ignorable dropout can result in biased estimation. To handle dropout, we propose a weighted semiparametric quantile regression estimator where the weights are inversely proportional to the estimated probability of remaining in the study. We show that this weighted estimator gives unbiased estimates of linear effects. We illustrate the advantages of the proposed method on a subset of the National Alzheimer’s Coordinating Center Uniform Data Set tracking cognitive decline in dementia patients. Lastly, we turn our attention to the issue of analyzing very large datasets with a large number of covariates and a large sample size. Penalized quantile regression is often used to simultaneously select variables and estimate effects by fitting models at many values of a tuning parameter. Existing algorithms have focused on improving computation time at one value of a tuning parameter; however, obtaining model estimates for all values of the tuning parameter can still be prohibitively time-consuming. Instead of attempting to solve the penalized quantile regression problem for each value of a tuning parameter, we propose a sparsity path algorithm to approximate the solution, allowing for fast exploration of candidate models at many different sparsity levels. Simulations show that the true model is always contained in the set of candidate models returned by the proposed sparsity path algorithm.
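The inverse-probability weighting idea in this abstract can be illustrated in the simplest possible setting: an intercept-only quantile fit that minimizes a weighted check loss. This is only a sketch under stated assumptions, not the dissertation's semiparametric estimator. The retention probabilities below are made-up illustrative numbers, and the names check_loss and weighted_quantile are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def weighted_quantile(values, weights, tau):
    """Minimize sum_i w_i * rho_tau(y_i - q) over q.

    The objective is piecewise linear and convex in q, so a minimizer
    can always be found among the observed values.
    """
    def objective(q):
        return sum(w * check_loss(v - q, tau) for v, w in zip(values, weights))
    return min(values, key=objective)

# Observed responses for subjects who remained in the study.
responses = [1.0, 2.0, 3.0, 4.0, 5.0]
# Assumed (illustrative) estimated probabilities of remaining in the study.
p_remain = [1.0, 1.0, 1.0, 0.5, 0.25]
# Weights inversely proportional to the probability of remaining.
ipw = [1.0 / p for p in p_remain]

unweighted_median = weighted_quantile(responses, [1.0] * len(responses), 0.5)
weighted_median = weighted_quantile(responses, ipw, 0.5)
print(unweighted_median, weighted_median)
```

In this toy example the subjects with large responses are the ones least likely to remain, so upweighting them shifts the estimated median upward relative to the unweighted fit.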
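Bayesian Inference of the Nonparametric Stochastic Frontier Models
Project number: NSC94-2415-H032-006. Research period: 200508~200607. Research funding: 597,000. Abstract: Stochastic frontier models are commonly used to measure a firm's degree of inefficiency. However, the results are often found to be very sensitive to the functional specification of the frontier. Hence, even when the error distribution is correctly specified, a misspecified "technology" (frontier) leads to incorrect inference about "inefficiency". This paper therefore relaxes the traditional parametric stochastic frontier model and considers a nonparametric stochastic frontier model in order to avoid specification error. We use Markov chain Monte Carlo methods from the Bayesian approach to estimate, analyze, and draw inference on the model's coefficients, and the resulting estimates have small-sample properties. We also derive the required full conditional distributions and expect to demonstrate the method's practical usefulness with an application to real data. Sponsorship: National Science Council, Executive Yuan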
Conditional Transformation Models
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models. The novel class of semiparametric regression models
proposed herein allows transformation functions to depend on explanatory
variables. These transformation functions are estimated by regularised
optimisation of scoring rules for probabilistic forecasts, e.g. the continuous
ranked probability score. The corresponding estimated conditional distribution
functions are consistent. Conditional transformation models are potentially
useful for describing possible heteroscedasticity, comparing spatially varying
distributions, identifying extreme events, deriving prediction intervals and
selecting variables beyond mean regression effects. An empirical investigation
based on a heteroscedastic varying coefficient simulation model demonstrates
that semiparametric estimation of conditional distribution functions can be
more beneficial than kernel-based non-parametric approaches or parametric
generalised additive models for location, scale and shape.
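As a small illustration of the scoring rule named in this abstract, the continuous ranked probability score of a sample-based (ensemble) forecast can be computed from the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from F. The sketch below only evaluates the score for a given ensemble; it does not implement the regularised optimisation used to estimate conditional transformation models, and the function name crps_ensemble is hypothetical.

```python
def crps_ensemble(samples, y):
    """Sample-based CRPS: mean |x - y| minus half the mean |x - x'| over all pairs."""
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2

# A point forecast reduces to the absolute error.
point_score = crps_ensemble([3.0], 1.0)

# A two-member ensemble centred on the observation.
spread_score = crps_ensemble([0.0, 2.0], 1.0)

print(point_score, spread_score)
```

Lower scores are better, and a perfect point forecast scores zero.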