Variable selection in measurement error models
Measurement error data or errors-in-variable data have been collected in many
studies. Natural criterion functions are often unavailable for general
functional measurement error models due to the lack of information on the
distribution of the unobservable covariates. Typically, the parameter
estimation is via solving estimating equations. In addition, the construction
of such estimating equations routinely requires solving integral equations,
hence the computation is often much more intensive compared with ordinary
regression models. Because of these difficulties, traditional best subset
variable selection procedures are not applicable, and in the measurement error
model context, variable selection remains an unsolved issue. In this paper, we
develop a framework for variable selection in measurement error models via
penalized estimating equations. We first propose a class of selection
procedures for general parametric measurement error models and for general
semi-parametric measurement error models, and study the asymptotic properties
of the proposed procedures. Then, under certain regularity conditions and with
a properly chosen regularization parameter, we demonstrate that the proposed
procedure performs as well as an oracle procedure. We assess the finite sample
performance via Monte Carlo simulation studies and illustrate the proposed
methodology through the empirical analysis of a familiar data set.
Comment: Published at http://dx.doi.org/10.3150/09-BEJ205 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm
Partially linear additive quantile regression in ultra-high dimension
We consider a flexible semiparametric quantile regression model for analyzing
high dimensional heterogeneous data. This model has several appealing features:
(1) By considering different conditional quantiles, we may obtain a more
complete picture of the conditional distribution of a response variable given
high dimensional covariates. (2) The sparsity level is allowed to be different
at different quantile levels. (3) The partially linear additive structure
accommodates nonlinearity and circumvents the curse of dimensionality. (4) It
is naturally robust to heavy-tailed distributions. In this paper, we
approximate the nonlinear components using B-spline basis functions. We first
study estimation under this model when the nonzero components are known in
advance and the number of covariates in the linear part diverges. We then
investigate a nonconvex penalized estimator for simultaneous variable selection
and estimation. We derive its oracle property for a general class of nonconvex
penalty functions in the presence of ultra-high dimensional covariates under
relaxed conditions. To tackle the challenges of a nonsmooth loss function, a
nonconvex penalty function and the presence of nonlinear components, we combine
a recently developed convex-differencing method with modern empirical process
techniques. Monte Carlo simulations and an application to a microarray study
demonstrate the effectiveness of the proposed method. We also discuss how the
method for a single quantile of interest can be extended to simultaneous
variable selection and estimation at multiple quantiles.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1367 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
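As a rough illustration of the two ingredients named in the abstract above, the following sketch combines the quantile check loss with a nonconvex penalty. It is not the authors' estimator: the SCAD penalty is assumed here as one common member of the class of nonconvex penalties, the B-spline nonlinear part is omitted, and all function names and the default `a=3.7` are illustrative choices.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (assumed example of a nonconvex penalty), requires a > 2."""
    b = abs(beta)
    if b <= lam:
        # Lasso-like linear region near zero.
        return lam * b
    if b <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients are not penalized further.
    return lam * lam * (a + 1) / 2


def penalized_objective(y, X, beta, tau, lam):
    """Average check loss of the linear fit plus the summed SCAD penalty."""
    n = len(y)
    loss = 0.0
    for yi, xi in zip(y, X):
        fitted = sum(xij * bj for xij, bj in zip(xi, beta))
        loss += check_loss(yi - fitted, tau)
    return loss / n + sum(scad_penalty(bj, lam) for bj in beta)
```

Because the check loss is nonsmooth and the penalty is nonconvex, this objective cannot be handed to a standard gradient-based solver, which is the difficulty the abstract refers to.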
Efficient estimation of a semiparametric partially linear varying coefficient model
In this paper we propose a general series method to estimate a semiparametric
partially linear varying coefficient model. We establish the consistency and
$\sqrt{n}$-normality property of the estimator of the finite-dimensional parameters
of the model. We further show that, when the error is conditionally
homoskedastic, this estimator is semiparametrically efficient in the sense that
the inverse of the asymptotic variance of the estimator of the
finite-dimensional parameter reaches the semiparametric efficiency bound of
this model. A small-scale simulation is reported to examine the finite sample
performance of the proposed estimator, and an empirical application is
presented to illustrate the usefulness of the proposed method in practice. We
also discuss how to obtain an efficient estimation result when the error is
conditionally heteroskedastic.
Comment: Published at http://dx.doi.org/10.1214/009053604000000931 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data
We propose a novel approach for modeling multivariate longitudinal data in
the presence of unobserved heterogeneity for the analysis of the Health and
Retirement Study (HRS) data. Our proposal can be cast within the framework of
linear mixed models with discrete individual random intercepts; however,
differently from the standard formulation, the proposed Covariance Pattern
Mixture Model (CPMM) does not require the usual local independence assumption.
The model is thus able to simultaneously model the heterogeneity, the
association among the responses and the temporal dependence structure. We focus
on the investigation of temporal patterns related to the cognitive functioning
in retired American respondents. In particular, we aim to understand whether it
can be affected by some individual socioeconomic characteristics and whether
it is possible to identify some homogeneous groups of respondents that share a
similar cognitive profile. An accurate description of the detected groups
allows government policy interventions to be appropriately targeted. Results
identify three homogeneous clusters of individuals with specific cognitive
functioning, consistent with the class conditional distribution of the
covariates. The flexibility of CPMM allows for a different contribution of each
regressor on the responses according to group membership. In so doing, the
identified groups receive a global and accurate phenomenological
characterization.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS816 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A variational Bayesian method for inverse problems with impulsive noise
We propose a novel numerical method for solving inverse problems subject to
impulsive noises which possibly contain a large number of outliers. The
approach is of Bayesian type, and it exploits a heavy-tailed t distribution for
data noise to achieve robustness with respect to outliers. A hierarchical model
with all hyper-parameters automatically determined from the given data is
described. An algorithm of variational type by minimizing the Kullback-Leibler
divergence between the true posterior distribution and a separable
approximation is developed. The numerical method is illustrated on several one-
and two-dimensional linear and nonlinear inverse problems arising from heat
conduction, including estimating boundary temperature, heat flux and heat
transfer coefficient. The results show its robustness to outliers and the fast
and steady convergence of the algorithm.
Comment: 20 pages, to appear in J. Comput. Phy
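To see why a heavy-tailed t noise model gives robustness to outliers, the sketch below estimates a single location parameter by iterative reweighting. It is a simplified stand-in, not the variational algorithm of the abstract above: it relies on the standard representation of the t distribution as a Gaussian scale mixture, under which each observation receives the weight (nu + 1) / (nu + r^2 / sigma^2). The fixed values of `nu` and `sigma`, the toy data, and the function names are all assumptions for illustration, whereas the paper determines its hyper-parameters from the data.

```python
def t_weight(residual, nu, sigma):
    """Expected precision weight of one observation under Student-t noise."""
    return (nu + 1.0) / (nu + (residual / sigma) ** 2)


def robust_location(data, nu=3.0, sigma=1.0, iters=50):
    """Iteratively reweighted mean; nu and sigma are held fixed (assumption)."""
    mu = sum(data) / len(data)  # start from the ordinary mean
    for _ in range(iters):
        # Observations with large residuals get small weights.
        weights = [t_weight(x - mu, nu, sigma) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


# Toy data: four readings near 1.0 and one impulsive outlier.
data = [1.0, 1.1, 0.9, 1.0, 50.0]
plain_mean = sum(data) / len(data)
robust = robust_location(data)
```

The ordinary mean is pulled far toward the outlier, while the reweighted estimate stays close to the bulk of the data because the outlier's weight shrinks roughly like the inverse square of its residual.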
A loss function approach to model specification testing and its relative efficiency
The generalized likelihood ratio (GLR) test proposed by Fan, Zhang and Zhang
[Ann. Statist. 29 (2001) 153-193] and Fan and Yao [Nonlinear Time Series:
Nonparametric and Parametric Methods (2003) Springer] is a generally applicable
nonparametric inference procedure. In this paper, we show that although it
inherits many advantages of the parametric maximum likelihood ratio (LR) test,
the GLR test does not have the optimal power property. We propose a generally
applicable test based on loss functions, which measure discrepancies between
the null and nonparametric alternative models and are more relevant to
decision-making under uncertainty. The new test is asymptotically more powerful
than the GLR test in terms of Pitman's efficiency criterion. This efficiency
gain holds no matter what smoothing parameter and kernel function are used and
even when the true likelihood function is available for the GLR test.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1099 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Empirical likelihood estimation of the spatial quantile regression
The spatial quantile regression model is a useful and flexible model for the analysis of empirical problems with a spatial dimension. This paper introduces an alternative estimator for this model. The properties of the proposed estimator are discussed in a comparative perspective with regard to the other available estimators. Simulation evidence on the small sample properties of the proposed estimator is provided. The proposed estimator is feasible and preferable when the model contains multiple spatial weighting matrices. Furthermore, a version of the proposed estimator based on the exponentially tilted empirical likelihood could be beneficial if model misspecification is suspected.