Variable selection in semiparametric regression modeling
In this paper, we are concerned with how to select significant variables in
semiparametric modeling. Variable selection for semiparametric regression
models consists of two components: model selection for nonparametric components
and selection of significant variables for the parametric portion. Thus,
semiparametric variable selection is much more challenging than parametric
variable selection (e.g., linear and generalized linear models) because
traditional variable selection procedures including stepwise regression and the
best subset selection now require separate model selection for the
nonparametric components for each submodel. This leads to a very heavy
computational burden. In this paper, we propose a class of variable selection
procedures for semiparametric regression models using nonconcave penalized
likelihood. We establish the rate of convergence of the resulting estimate.
With proper choices of penalty functions and regularization parameters, we show
the asymptotic normality of the resulting estimate and further demonstrate that
the proposed procedures perform as well as an oracle procedure. A
semiparametric generalized likelihood ratio test is proposed to select
significant variables in the nonparametric component. We investigate the
asymptotic behavior of the proposed test and demonstrate that its limiting null
distribution follows a chi-square distribution which is independent of the
nuisance parameters. Extensive Monte Carlo simulation studies are conducted to
examine the finite sample performance of the proposed variable selection
procedures.
Comment: Published at http://dx.doi.org/10.1214/009053607000000604 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
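The canonical nonconcave penalty behind procedures of this kind is the SCAD penalty of Fan and Li, whose flat tail is what yields the oracle behavior mentioned above (large coefficients are left unpenalized while small ones are shrunk to zero). A minimal sketch of the penalty function itself, with the conventional choice a = 3.7:

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|): quadratic spline that is
    linear near zero (lasso-like), then tapers, then constant for
    |theta| > a*lam so large coefficients incur no extra shrinkage."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    elif t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    else:
        return lam * lam * (a + 1.0) / 2.0
```

The penalized-likelihood estimator then maximizes the (quasi-)likelihood minus the sum of `scad_penalty` terms over the parametric coefficients; the smoothing of the nonparametric components and the choice of the tuning parameter `lam` are the parts the paper's procedures handle jointly.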
A Generic Path Algorithm for Regularized Statistical Estimation
Regularization is widely used in statistics and machine learning to prevent
overfitting and to steer solutions toward prior information. In general, a
regularized estimation problem minimizes the sum of a loss function and a
penalty term. The penalty term is usually weighted by a tuning parameter and
encourages certain constraints on the parameters to be estimated. Particular
choices of constraints lead to the popular lasso, fused-lasso, and other
generalized penalized regression methods. Although there has been a lot
of research in this area, developing efficient optimization methods for many
nonseparable penalties remains a challenge. In this article we propose an exact
path solver based on ordinary differential equations (EPSODE) that works for
any convex loss function and can deal with generalized penalties as well
as more complicated regularization such as inequality constraints encountered
in shape-restricted regressions and nonparametric density estimation. In the
path following process, the solution path hits, exits, and slides along the
various constraints and vividly illustrates the tradeoffs between goodness of
fit and model parsimony. In practice, the EPSODE can be coupled with AIC, BIC,
or cross-validation to select an optimal tuning parameter. Our
applications to regularized generalized linear models,
shape-restricted regressions, Gaussian graphical models, and nonparametric
density estimation showcase the potential of the EPSODE algorithm.
Comment: 28 pages, 5 figures
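The loss-plus-penalty trade-off that the path traces out can be illustrated with a much cruder device than the exact ODE solver: a lasso solution path approximated on a decreasing grid of tuning parameters via proximal gradient descent (ISTA) with warm starts. This sketch is not EPSODE; it only shows what a "solution path over the tuning parameter" means.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.| (the l1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_path(X, y, lams, n_iter=500):
    """Approximate the lasso path min 0.5*||y - X b||^2 + lam*||b||_1
    on a decreasing grid `lams`, warm-starting each fit from the last."""
    n, p = len(X), len(X[0])
    # conservative step size: 1/trace(X'X) <= 1/lambda_max(X'X)
    step = 1.0 / sum(X[i][j] ** 2 for i in range(n) for j in range(p))
    beta = [0.0] * p
    path = []
    for lam in lams:
        for _ in range(n_iter):
            resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i]
                     for i in range(n)]
            grad = [sum(X[i][j] * resid[i] for i in range(n))
                    for j in range(p)]
            beta = [soft_threshold(beta[j] - step * grad[j], step * lam)
                    for j in range(p)]
        path.append(list(beta))
    return path
```

As `lam` decreases along the grid, coefficients enter the active set one by one, which is exactly the hitting/exiting behavior that EPSODE tracks continuously instead of on a grid.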
Maximum penalized quasi-likelihood estimation of the diffusion function
We develop a maximum penalized quasi-likelihood estimator for estimating in a
nonparametric way the diffusion function of a diffusion process, as an
alternative to more traditional kernel-based estimators. After developing a
numerical scheme for computing the maximizer of the penalized
quasi-likelihood function, we study the asymptotic properties of our estimator
by way of simulation. Under the assumption that overnight London Interbank
Offered Rates (LIBOR); the USD/EUR, USD/GBP, JPY/USD, and EUR/USD nominal
exchange rates; and 1-month, 3-month, and 30-year Treasury bond yields are
generated by diffusion processes, we use our numerical scheme to estimate the
diffusion function.
Comment: 17 pages, 4 figures, revised version
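The setting can be illustrated with a simulation plus the simplest possible nonparametric target: for dX = mu(X) dt + sigma(X) dW observed at step dt, the conditional mean of squared increments, E[(X_{t+dt} - X_t)^2 | X_t = x] / dt, recovers sigma^2(x). The sketch below uses Euler-Maruyama simulation and a crude binned version of that moment estimator; the paper's estimator is instead a penalized quasi-likelihood maximizer, which this stands in for only as illustration.

```python
import random

def euler_maruyama(x0, mu, sigma, dt, n_steps, seed=0):
    """Simulate dX = mu(X) dt + sigma(X) dW by the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        x = x + mu(x) * dt + sigma(x) * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def binned_diffusion_estimate(path, dt, edges):
    """Binned estimate of sigma^2(x): average squared increment of
    observations starting in each bin, divided by dt."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for x, x_next in zip(path, path[1:]):
        for b in range(len(edges) - 1):
            if edges[b] <= x < edges[b + 1]:
                sums[b] += (x_next - x) ** 2
                counts[b] += 1
                break
    return [s / c / dt if c > 0 else float('nan')
            for s, c in zip(sums, counts)]
```

For an Ornstein-Uhlenbeck process with constant sigma = 0.5, the binned estimates hover around sigma^2 = 0.25, up to sampling noise and discretization bias.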
Estimating and explaining efficiency in a multilevel setting: A robust two-stage approach
Various applications require multilevel settings (e.g., for estimating fixed and random effects). However, due to the curse of dimensionality, the literature on non-parametric efficiency analysis has not yet explored the estimation of performance drivers in highly multilevel settings; it lacks models designed specifically for multilevel estimation. This paper suggests a semiparametric two-stage framework in which, in a first stage, non-parametric efficiency estimators are determined, so that no a priori information on the production possibility set is required. In a second stage, a semiparametric Generalized Additive Mixed Model (GAMM) examines the sign and significance of both discrete and continuous background characteristics. The proper working of the procedure is illustrated with simulated data. Finally, the model is applied to real-life data. In particular, using the proposed robust two-stage approach, we examine a claim by the Dutch Ministry of Education that three of the twelve Dutch provinces provide lower-quality education. When properly controlling for abilities, background variables, peer group and ability track effects, we do not observe differences among the provinces in educational attainment.
Keywords: Productivity estimation; Multilevel setting; Generalized Additive Mixed Model; Education; Social segregation
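A representative first-stage estimator in this literature is the free disposal hull (FDH), which needs no prior structure on the production possibility set. The sketch below (an assumption for illustration; the abstract does not name the exact estimator used) computes input-oriented FDH efficiency scores for the one-input, one-output case: a unit's score is the smallest input, among all units producing at least its output, relative to its own input.

```python
def fdh_input_efficiency(x, y):
    """Input-oriented FDH efficiency scores for one input x and one
    output y per unit.  theta_i <= 1, with theta_i = 1 meaning no
    observed unit dominates unit i (free disposability assumed)."""
    scores = []
    for xi, yi in zip(x, y):
        # smallest input among units producing at least y_i
        theta = min(xj / xi for xj, yj in zip(x, y) if yj >= yi)
        scores.append(theta)
    return scores
```

These scores (or robust variants of them) would then serve as the response in the second-stage GAMM relating efficiency to discrete and continuous background characteristics.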
Estimation and variable selection for generalized additive partial linear models
We study generalized additive partial linear models, proposing the use of
polynomial spline smoothing for estimation of nonparametric functions, and
deriving quasi-likelihood based estimators for the linear parameters. We
establish asymptotic normality for the estimators of the parametric components.
The procedure avoids solving large systems of equations as in kernel-based
procedures and thus results in gains in computational simplicity. We further
develop a class of variable selection procedures for the linear parameters by
employing a nonconcave penalized quasi-likelihood, which is shown to have an
asymptotic oracle property. Monte Carlo simulations and an empirical example
are presented for illustration.
Comment: Published at http://dx.doi.org/10.1214/11-AOS885 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
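Polynomial spline smoothing of the kind used here reduces each nonparametric additive component to a finite linear basis expansion, after which the whole model is fit like a (quasi-likelihood) linear model, which is why large kernel-style equation systems are avoided. A minimal sketch of one standard choice, the truncated power basis (one possibility; the paper does not commit the reader to this particular basis):

```python
def truncated_power_basis(t, knots, degree=3):
    """Truncated power basis for a polynomial spline of given degree:
    1, t, ..., t^degree, plus (t - k)_+^degree for each interior knot.
    Evaluating this at each observation turns the nonparametric
    component f(t) into a linear combination of basis columns."""
    cols = [t ** j for j in range(degree + 1)]
    cols += [max(t - k, 0.0) ** degree for k in knots]
    return cols
```

Stacking these evaluations over observations alongside the parametric covariates yields one design matrix, so the linear parameters and the spline coefficients are estimated in a single quasi-likelihood fit, to which a nonconcave penalty (e.g. SCAD) on the linear part can then be attached.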
