19,210 research outputs found

    Variable selection in measurement error models

    Full text link
    Measurement error data or errors-in-variable data have been collected in many studies. Natural criterion functions are often unavailable for general functional measurement error models due to the lack of information on the distribution of the unobservable covariates. Typically, the parameter estimation is via solving estimating equations. In addition, the construction of such estimating equations routinely requires solving integral equations, hence the computation is often much more intensive compared with ordinary regression models. Because of these difficulties, traditional best subset variable selection procedures are not applicable, and in the measurement error model context, variable selection remains an unsolved issue. In this paper, we develop a framework for variable selection in measurement error models via penalized estimating equations. We first propose a class of selection procedures for general parametric measurement error models and for general semi-parametric measurement error models, and study the asymptotic properties of the proposed procedures. Then, under certain regularity conditions and with a properly chosen regularization parameter, we demonstrate that the proposed procedure performs as well as an oracle procedure. We assess the finite sample performance via Monte Carlo simulation studies and illustrate the proposed methodology through the empirical analysis of a familiar data set.Comment: Published in at http://dx.doi.org/10.3150/09-BEJ205 the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm

    Partially linear additive quantile regression in ultra-high dimension

    Get PDF
    We consider a flexible semiparametric quantile regression model for analyzing high dimensional heterogeneous data. This model has several appealing features: (1) By considering different conditional quantiles, we may obtain a more complete picture of the conditional distribution of a response variable given high dimensional covariates. (2) The sparsity level is allowed to be different at different quantile levels. (3) The partially linear additive structure accommodates nonlinearity and circumvents the curse of dimensionality. (4) It is naturally robust to heavy-tailed distributions. In this paper, we approximate the nonlinear components using B-spline basis functions. We first study estimation under this model when the nonzero components are known in advance and the number of covariates in the linear part diverges. We then investigate a nonconvex penalized estimator for simultaneous variable selection and estimation. We derive its oracle property for a general class of nonconvex penalty functions in the presence of ultra-high dimensional covariates under relaxed conditions. To tackle the challenges of nonsmooth loss function, nonconvex penalty function and the presence of nonlinear components, we combine a recently developed convex-differencing method with modern empirical process techniques. Monte Carlo simulations and an application to a microarray study demonstrate the effectiveness of the proposed method. We also discuss how the method for a single quantile of interest can be extended to simultaneous variable selection and estimation at multiple quantiles.Comment: Published at http://dx.doi.org/10.1214/15-AOS1367 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Efficient estimation of a semiparametric partially linear varying coefficient model

    Get PDF
    In this paper we propose a general series method to estimate a semiparametric partially linear varying coefficient model. We establish the consistency and \sqrtn-normality property of the estimator of the finite-dimensional parameters of the model. We further show that, when the error is conditionally homoskedastic, this estimator is semiparametrically efficient in the sense that the inverse of the asymptotic variance of the estimator of the finite-dimensional parameter reaches the semiparametric efficiency bound of this model. A small-scale simulation is reported to examine the finite sample performance of the proposed estimator, and an empirical application is presented to illustrate the usefulness of the proposed method in practice. We also discuss how to obtain an efficient estimation result when the error is conditional heteroskedastic.Comment: Published at http://dx.doi.org/10.1214/009053604000000931 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data

    Full text link
    We propose a novel approach for modeling multivariate longitudinal data in the presence of unobserved heterogeneity for the analysis of the Health and Retirement Study (HRS) data. Our proposal can be cast within the framework of linear mixed models with discrete individual random intercepts; however, differently from the standard formulation, the proposed Covariance Pattern Mixture Model (CPMM) does not require the usual local independence assumption. The model is thus able to simultaneously model the heterogeneity, the association among the responses and the temporal dependence structure. We focus on the investigation of temporal patterns related to the cognitive functioning in retired American respondents. In particular, we aim to understand whether it can be affected by some individual socio-economical characteristics and whether it is possible to identify some homogenous groups of respondents that share a similar cognitive profile. An accurate description of the detected groups allows government policy interventions to be opportunely addressed. Results identify three homogenous clusters of individuals with specific cognitive functioning, consistent with the class conditional distribution of the covariates. The flexibility of CPMM allows for a different contribution of each regressor on the responses according to group membership. In so doing, the identified groups receive a global and accurate phenomenological characterization.Comment: Published at http://dx.doi.org/10.1214/15-AOAS816 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A variational Bayesian method for inverse problems with impulsive noise

    Full text link
    We propose a novel numerical method for solving inverse problems subject to impulsive noises which possibly contain a large number of outliers. The approach is of Bayesian type, and it exploits a heavy-tailed t distribution for data noise to achieve robustness with respect to outliers. A hierarchical model with all hyper-parameters automatically determined from the given data is described. An algorithm of variational type by minimizing the Kullback-Leibler divergence between the true posteriori distribution and a separable approximation is developed. The numerical method is illustrated on several one- and two-dimensional linear and nonlinear inverse problems arising from heat conduction, including estimating boundary temperature, heat flux and heat transfer coefficient. The results show its robustness to outliers and the fast and steady convergence of the algorithm.Comment: 20 pages, to appear in J. Comput. Phy

    A loss function approach to model specification testing and its relative efficiency

    Full text link
    The generalized likelihood ratio (GLR) test proposed by Fan, Zhang and Zhang [Ann. Statist. 29 (2001) 153-193] and Fan and Yao [Nonlinear Time Series: Nonparametric and Parametric Methods (2003) Springer] is a generally applicable nonparametric inference procedure. In this paper, we show that although it inherits many advantages of the parametric maximum likelihood ratio (LR) test, the GLR test does not have the optimal power property. We propose a generally applicable test based on loss functions, which measure discrepancies between the null and nonparametric alternative models and are more relevant to decision-making under uncertainty. The new test is asymptotically more powerful than the GLR test in terms of Pitman's efficiency criterion. This efficiency gain holds no matter what smoothing parameter and kernel function are used and even when the true likelihood function is available for the GLR test.Comment: Published in at http://dx.doi.org/10.1214/13-AOS1099 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Empirical likelihood estimation of the spatial quantile regression

    Get PDF
    The spatial quantile regression model is a useful and flexible model for analysis of empirical problems with spatial dimension. This paper introduces an alternative estimator for this model. The properties of the proposed estimator are discussed in a comparative perspective with regard to the other available estimators. Simulation evidence on the small sample properties of the proposed estimator is provided. The proposed estimator is feasible and preferable when the model contains multiple spatial weighting matrices. Furthermore, a version of the proposed estimator based on the exponentially tilted empirical likelihood could be beneficial if model misspecification is suspect
    • …
    corecore