6,901 research outputs found

    Variable selection in semiparametric regression modeling

    In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. Semiparametric variable selection is therefore much more challenging than parametric variable selection (e.g., for linear and generalized linear models), because traditional variable selection procedures, including stepwise regression and best subset selection, now require a separate model selection for the nonparametric components of each submodel, which imposes a very heavy computational burden. In this paper, we propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood. We establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we show the asymptotic normality of the resulting estimate and further demonstrate that the proposed procedures perform as well as an oracle procedure. A semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and demonstrate that its limiting null distribution follows a chi-square distribution that is independent of the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedures.

    Comment: Published at http://dx.doi.org/10.1214/009053607000000604 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
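    The abstract does not name the penalty, but "nonconcave penalized likelihood" in this literature typically refers to the SCAD penalty of Fan and Li; the sketch below shows that penalty function as an assumption, with the conventional default a = 3.7. Its shape (linear near zero, flat for large coefficients) is what avoids shrinking large effects and underlies the oracle property mentioned in the abstract.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty (assumed form; a = 3.7 is the conventional choice).
    Behaves like the lasso penalty lam*|theta| near zero, tapers off
    quadratically, and is constant for |theta| > a*lam, so large
    coefficients are left unshrunk."""
    t = np.abs(theta)
    return np.where(
        t <= lam,
        lam * t,                                            # lasso-like region
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),  # quadratic taper
            lam**2 * (a + 1) / 2,                           # flat region
        ),
    )
```

    The three branches join continuously: at |theta| = lam the penalty equals lam^2, and beyond a*lam it stays at lam^2 (a + 1)/2.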

    A Generic Path Algorithm for Regularized Statistical Estimation

    Regularization is widely used in statistics and machine learning to prevent overfitting and to steer solutions toward prior information. In general, a regularized estimation problem minimizes the sum of a loss function and a penalty term. The penalty term is usually weighted by a tuning parameter and encourages certain constraints on the parameters to be estimated. Particular choices of constraints lead to the popular lasso, fused lasso, and other generalized l_1 penalized regression methods. Although there has been a lot of research in this area, developing efficient optimization methods for many nonseparable penalties remains a challenge. In this article we propose an exact path solver based on ordinary differential equations (EPSODE) that works for any convex loss function and can handle generalized l_1 penalties as well as more complicated regularization such as the inequality constraints encountered in shape-restricted regressions and nonparametric density estimation. In the path following process, the solution path hits, exits, and slides along the various constraints, vividly illustrating the tradeoff between goodness of fit and model parsimony. In practice, EPSODE can be coupled with AIC, BIC, C_p, or cross-validation to select an optimal tuning parameter. Our applications to generalized l_1 regularized generalized linear models, shape-restricted regressions, Gaussian graphical models, and nonparametric density estimation showcase the potential of the EPSODE algorithm.

    Comment: 28 pages, 5 figures
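    EPSODE itself tracks the exact solution path by integrating an ODE; that machinery is beyond a short sketch. As a rough stand-in for what a "solution path" means, here is a plain grid-based lasso path computed by coordinate descent (not the authors' algorithm): as the tuning parameter decreases, coefficients enter the model one by one, tracing the fit-versus-parsimony tradeoff the abstract describes.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the l_1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_path(X, y, lams, n_iter=200):
    """Lasso solutions on a decreasing grid of tuning parameters, with
    warm starts. A true path algorithm such as EPSODE instead follows
    the path continuously; this grid version only approximates it."""
    n, p = X.shape
    col_ss = (X**2).sum(axis=0)
    beta = np.zeros(p)
    path = []
    for lam in lams:
        for _ in range(n_iter):
            for j in range(p):
                # partial residual excluding coordinate j
                r = y - X @ beta + X[:, j] * beta[j]
                beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_ss[j]
        path.append(beta.copy())
    return np.array(path)
```

    For a large tuning parameter every coefficient is thresholded to zero (maximum parsimony); as it shrinks toward zero the solution approaches the unpenalized fit.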

    Maximum penalized quasi-likelihood estimation of the diffusion function

    We develop a maximum penalized quasi-likelihood estimator for estimating, in a nonparametric way, the diffusion function of a diffusion process, as an alternative to more traditional kernel-based estimators. After developing a numerical scheme for computing the maximizer of the penalized quasi-likelihood function, we study the asymptotic properties of our estimator by way of simulation. Under the assumption that overnight London Interbank Offered Rates (LIBOR); the USD/EUR, USD/GBP, JPY/USD, and EUR/USD nominal exchange rates; and 1-month, 3-month, and 30-year Treasury bond yields are generated by diffusion processes, we use our numerical scheme to estimate the diffusion function.

    Comment: 17 pages, 4 figures, revised version
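    The paper's estimator maximizes a penalized quasi-likelihood; as a much simpler baseline of the kind such estimators are compared against, the squared diffusion function sigma^2(x) can be estimated by locally averaging normalized squared increments of the observed path. The sketch below is that crude binned estimator, not the authors' method.

```python
import numpy as np

def binned_diffusion_estimate(x, dt, n_bins=10):
    """Crude nonparametric estimate of sigma^2(x) for a diffusion
    dX = mu(X) dt + sigma(X) dW observed at spacing dt: over small dt,
    (X_{t+dt} - X_t)^2 / dt is approximately sigma^2(X_t), so we average
    these normalized squared increments within bins of the state space.
    Returns bin centers and the per-bin estimates (NaN for empty bins)."""
    dx2 = np.diff(x) ** 2 / dt          # normalized squared increments
    levels = x[:-1]                      # state at the start of each increment
    edges = np.linspace(levels.min(), levels.max(), n_bins + 1)
    idx = np.clip(np.digitize(levels, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    est = np.array([dx2[idx == b].mean() if np.any(idx == b) else np.nan
                    for b in range(n_bins)])
    return centers, est
```

    A penalized quasi-likelihood estimator replaces this hard binning with a smooth function chosen to balance fit against a roughness penalty.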

    Estimating and explaining efficiency in a multilevel setting: A robust two-stage approach

    Various applications require multilevel settings (e.g., for estimating fixed and random effects). However, due to the curse of dimensionality, the literature on non-parametric efficiency analysis has not yet explored the estimation of performance drivers in highly multilevel settings, and it lacks models specifically designed for multilevel estimation. This paper suggests a semi-parametric two-stage framework in which, in a first stage, non-parametric efficiency estimators are determined, so that no a priori information on the production possibility set is required. In a second stage, a semiparametric Generalized Additive Mixed Model (GAMM) examines the sign and significance of both discrete and continuous background characteristics. The proper working of the procedure is illustrated on simulated data. Finally, the model is applied to real-life data. In particular, using the proposed robust two-stage approach, we examine a claim by the Dutch Ministry of Education that three of the twelve Dutch provinces provide lower-quality education. When we properly control for abilities, background variables, peer group, and ability track effects, we do not observe differences among the provinces in educational attainment.

    Keywords: Productivity estimation; Multilevel setting; Generalized Additive Mixed Model; Education; Social segregation
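    The abstract does not spell out the first-stage estimator; a standard fully non-parametric choice that uses no a priori information on the production possibility set is the Free Disposal Hull (FDH). The sketch below is a plain output-oriented FDH score (an assumption about the first stage; the paper's "robust" approach would use a partial-frontier variant of this idea). The scores would then serve as the response in the second-stage GAMM.

```python
import numpy as np

def fdh_output_efficiency(X, y):
    """Output-oriented Free Disposal Hull efficiency scores for n units
    with input matrix X (n x d) and single output y (n,). For each unit,
    the score is the largest factor by which an observed unit using no
    more of every input exceeds its output. Score 1 = on the frontier;
    scores > 1 indicate inefficiency."""
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        # units (including i itself) dominated in every input dimension
        dominating = np.all(X <= X[i], axis=1)
        scores[i] = np.max(y[dominating]) / y[i]
    return scores
```

    Because the comparison set always contains the unit itself, every score is at least 1; no convexity of the production set is assumed, only free disposability.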

    Estimation and variable selection for generalized additive partial linear models

    We study generalized additive partial linear models, proposing the use of polynomial spline smoothing for estimation of nonparametric functions and deriving quasi-likelihood based estimators for the linear parameters. We establish asymptotic normality for the estimators of the parametric components. The procedure avoids solving large systems of equations as in kernel-based procedures and thus results in gains in computational simplicity. We further develop a class of variable selection procedures for the linear parameters by employing a nonconcave penalized quasi-likelihood, which is shown to have an asymptotic oracle property. Monte Carlo simulations and an empirical example are presented for illustration.

    Comment: Published at http://dx.doi.org/10.1214/11-AOS885 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
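    The core computational idea, fitting the nonparametric component with a polynomial spline basis so the whole model reduces to one finite-dimensional regression, can be sketched for the simplest case: a partial linear model with identity link, no penalty, and a truncated power basis (all simplifying assumptions relative to the paper, which uses quasi-likelihood and adds a nonconcave penalty on the linear part).

```python
import numpy as np

def spline_basis(t, knots, degree=3):
    """Truncated power basis for polynomial spline smoothing:
    1, t, ..., t^degree, plus (t - k)_+^degree for each interior knot."""
    cols = [t**d for d in range(degree + 1)]
    cols += [np.maximum(t - k, 0.0) ** degree for k in knots]
    return np.column_stack(cols)

def fit_partial_linear(X, t, y, knots):
    """Least-squares fit of y = X @ beta + f(t) + error, with f
    represented in a spline basis. Expanding f this way turns the
    semiparametric fit into one finite linear system, which is the
    computational advantage over kernel-based procedures noted above."""
    B = spline_basis(np.asarray(t), knots)
    Z = np.column_stack([X, B])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    beta = coef[:X.shape[1]]           # parametric part
    fhat = B @ coef[X.shape[1]:]       # fitted nonparametric part
    return beta, fhat
```

    In the full method, the least-squares criterion becomes a quasi-likelihood and a nonconcave penalty (e.g., SCAD) on beta performs the variable selection.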