Variable selection in semiparametric regression modeling
In this paper, we are concerned with how to select significant variables in
semiparametric modeling. Variable selection for semiparametric regression
models consists of two components: model selection for nonparametric components
and selection of significant variables for the parametric portion. Thus,
semiparametric variable selection is much more challenging than parametric
variable selection (e.g., linear and generalized linear models) because
traditional variable selection procedures, including stepwise regression and
best subset selection, require a separate model selection step for the
nonparametric components of each submodel. This leads to a very heavy
computational burden. In this paper, we propose a class of variable selection
procedures for semiparametric regression models using nonconcave penalized
likelihood. We establish the rate of convergence of the resulting estimate.
With proper choices of penalty functions and regularization parameters, we show
the asymptotic normality of the resulting estimate and further demonstrate that
the proposed procedures perform as well as an oracle procedure. A
semiparametric generalized likelihood ratio test is proposed to select
significant variables in the nonparametric component. We investigate the
asymptotic behavior of the proposed test and demonstrate that its limiting null
distribution follows a chi-square distribution which is independent of the
nuisance parameters. Extensive Monte Carlo simulation studies are conducted to
examine the finite sample performance of the proposed variable selection
procedures.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/009053607000000604 by the Institute of Mathematical
Statistics (http://www.imstat.org).
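The nonconcave penalties discussed above are typified by the SCAD penalty of Fan and Li. As a concrete illustration (the function name is ours, and a = 3.7 is the conventional default; this is a sketch of the penalty itself, not the paper's semiparametric procedure), the penalty can be evaluated as:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|): linear (lasso-like) up to lam,
    quadratic transition on (lam, a*lam], then constant, so large
    coefficients are not shrunk.  Vectorized over t."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(
        t <= lam,
        lam * t,                                         # lasso zone
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),  # transition
            (a + 1) * lam**2 / 2,                        # flat beyond a*lam
        ),
    )
```

The flat tail beyond a*lam is what makes the penalty nonconcave yet nearly unbiased for large coefficients, which is the source of the oracle behavior cited in the abstract.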
Variable selection in measurement error models
Measurement error data, or errors-in-variables data, have been collected in many
studies. Natural criterion functions are often unavailable for general
functional measurement error models because of the lack of information on the
distribution of the unobservable covariates. Typically, parameters are
estimated by solving estimating equations. Moreover, constructing such
estimating equations routinely requires solving integral equations, so the
computation is often much more intensive than for ordinary
regression models. Because of these difficulties, traditional best subset
variable selection procedures are not applicable, and in the measurement error
model context, variable selection remains an unsolved issue. In this paper, we
develop a framework for variable selection in measurement error models via
penalized estimating equations. We first propose a class of selection
procedures for general parametric measurement error models and for general
semiparametric measurement error models, and study the asymptotic properties
of the proposed procedures. Then, under certain regularity conditions and with
a properly chosen regularization parameter, we demonstrate that the proposed
procedure performs as well as an oracle procedure. We assess the finite sample
performance via Monte Carlo simulation studies and illustrate the proposed
methodology through the empirical analysis of a familiar data set.

Comment: Published in Bernoulli (http://isi.cbs.nl/bernoulli/) at
http://dx.doi.org/10.3150/09-BEJ205 by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
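To make the penalized-estimating-equation idea concrete, here is a toy sketch for a linear model with additive measurement error, combining the classical corrected-score equation with the SCAD derivative via a local quadratic approximation. All names and defaults are ours; this is illustrative, not the paper's exact procedure:

```python
import numpy as np

def corrected_scad_ee(W, y, sigma_uu, lam, a=3.7, n_iter=50, tol=1e-8):
    """Toy penalized estimating equations for y = X'beta + e observed
    through W = X + U with Var(U) = sigma_uu * I.  The classical
    corrected-score equation  (W'W/n - sigma_uu*I) beta = W'y/n
    removes the attenuation bias; a SCAD-derivative penalty is added
    through ridge-type local quadratic approximation weights."""
    n, p = W.shape
    A = W.T @ W / n - sigma_uu * np.eye(p)   # bias-corrected Gram matrix
    b = W.T @ y / n
    beta = np.linalg.solve(A + 1e-6 * np.eye(p), b)  # unpenalized start
    for _ in range(n_iter):
        d = np.abs(beta)
        # SCAD derivative p'_lam(d): lam on [0, lam], decays to 0 at a*lam
        dp = lam * ((d <= lam) + np.clip(a * lam - d, 0, None)
                    / ((a - 1) * lam) * (d > lam))
        D = np.diag(dp / np.maximum(d, 1e-8))        # LQA ridge weights
        beta_new = np.linalg.solve(A + D, b)
        beta_new[np.abs(beta_new) < 1e-6] = 0.0      # hard-zero tiny coords
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Note that no criterion function is ever written down: the penalty enters the estimating equation directly, which is the point the abstract makes about measurement error models.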
Rejoinder: One-step sparse estimates in nonconcave penalized likelihood models
We would like to take this opportunity to thank the discussants for their
thoughtful comments and encouragement regarding our work [arXiv:0808.1012]. The
discussants raised a number of issues from theoretical as well as computational
perspectives. Our rejoinder will try to provide some insights into these issues
and address specific questions asked by the discussants.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/07-AOS0316REJ by the Institute of Mathematical
Statistics (http://www.imstat.org).
Variable selection using MM algorithms
Variable selection is fundamental to high-dimensional statistical modeling.
Many variable selection techniques may be implemented by maximum penalized
likelihood using various penalty functions. Optimizing the penalized likelihood
function is often challenging because it may be nondifferentiable and/or
nonconcave. This article proposes a new class of algorithms for finding a
maximizer of the penalized likelihood for a broad class of penalty functions.
These algorithms operate by perturbing the penalty function slightly to render
it differentiable, then optimizing this differentiable function using a
minorize-maximize (MM) algorithm. MM algorithms are useful extensions of the
well-known class of EM algorithms, a fact that allows us to analyze the local
and global convergence of the proposed algorithm using some of the techniques
employed for EM algorithms. In particular, we prove that when our MM algorithms
converge, they must converge to a desirable point; we also discuss conditions
under which this convergence may be guaranteed. We exploit the
Newton-Raphson-like aspect of these algorithms to propose a sandwich estimator
for the standard errors of the estimators. Our method performs well in
numerical tests.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/009053605000000200 by the Institute of Mathematical
Statistics (http://www.imstat.org).
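A minimal sketch of the perturb-then-MM idea, shown for L1-penalized least squares (the perturbation and majorizer below are in the spirit of the construction described; the details and names are ours). Each step majorizes the perturbed penalty by a quadratic tangent, giving a ridge-type update, and the MM property guarantees the objective never increases:

```python
import numpy as np

def mm_l1_ls(X, y, lam, eps=1e-4, n_iter=100):
    """MM algorithm for L1-penalized least squares with the penalty
    perturbed by eps so it becomes differentiable at zero.

    Minimizes f(b) = ||y - X b||^2 / (2n)
                     + lam * sum_j (|b_j| - eps*log(1 + |b_j|/eps)).
    The perturbed penalty is concave in b_j^2, so its tangent quadratic
    at the current iterate is a valid majorizer; minimizing the
    surrogate is a weighted ridge solve.  Returns the estimate and the
    objective after each step (monotonically nonincreasing)."""
    n, p = X.shape
    G, c = X.T @ X / n, X.T @ y / n
    beta = np.zeros(p)
    objs = []
    for _ in range(n_iter):
        w = lam / (eps + np.abs(beta))        # majorizer curvature weights
        beta = np.linalg.solve(G + np.diag(w), c)   # exact surrogate minimum
        pen = lam * np.sum(np.abs(beta) - eps * np.log1p(np.abs(beta) / eps))
        objs.append(0.5 * np.mean((y - X @ beta) ** 2) + pen)
    return beta, np.array(objs)
```

The monotone decrease of the objective is exactly the convergence handle the abstract refers to: it is inherited from the EM/MM theory rather than proved from scratch.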
Multivariate varying coefficient model for functional responses
Motivated by recent work studying massive imaging data in the neuroimaging
literature, we propose multivariate varying coefficient models (MVCM) for
modeling the relation between multiple functional responses and a set of
covariates. We develop several statistical inference procedures for MVCM and
systematically study their theoretical properties. We first establish the weak
convergence of the local linear estimate of coefficient functions, as well as
its asymptotic bias and variance, and then we derive asymptotic bias and mean
integrated squared error of smoothed individual functions and their uniform
convergence rate. We establish the uniform convergence rate of the estimated
covariance function of the individual functions and its associated eigenvalue
and eigenfunctions. We propose a global test for linear hypotheses of varying
coefficient functions, and derive its asymptotic distribution under the null
hypothesis. We also propose a simultaneous confidence band for each individual
effect curve. We conduct Monte Carlo simulation to examine the finite-sample
performance of the proposed procedures. We apply MVCM to investigate the
development of white matter diffusivities along the genu tract of the corpus
callosum in a clinical study of neurodevelopment.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/12-AOS1045 by the Institute of Mathematical
Statistics (http://www.imstat.org).
Calibrating nonconvex penalized regression in ultra-high dimension
We investigate high-dimensional nonconvex penalized regression, where the
number of covariates may grow at an exponential rate. Although recent
asymptotic theory established that there exists a local minimum possessing the
oracle property under general conditions, it is still largely an open problem
how to identify the oracle estimator among potentially multiple local minima.
There are two main obstacles: (1) due to the presence of multiple minima, the
solution path is nonunique and is not guaranteed to contain the oracle
estimator; (2) even if a solution path is known to contain the oracle
estimator, the optimal tuning parameter depends on many unknown factors and is
hard to estimate. To address these two challenging issues, we first prove that
an easy-to-calculate calibrated CCCP algorithm produces a consistent solution
path which contains the oracle estimator with probability approaching one.
Furthermore, we propose a high-dimensional BIC criterion and show that it can
be applied to the solution path to select the optimal tuning parameter which
asymptotically identifies the oracle estimator. The theory for a general class
of nonconvex penalties in the ultra-high dimensional setup is established when
the random errors follow a sub-Gaussian distribution. Monte Carlo studies
confirm that the calibrated CCCP algorithm combined with the proposed
high-dimensional BIC has desirable performance in identifying the underlying
sparsity pattern for high-dimensional data analysis.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at
http://dx.doi.org/10.1214/13-AOS1159 by the Institute of Mathematical
Statistics (http://www.imstat.org).