54 research outputs found

    New MM-estimators in semi-parametric regression with errors in variables

    In the regression model with errors in variables, we observe $n$ i.i.d. copies of $(Y,Z)$ satisfying $Y=f_{\theta^0}(X)+\xi$ and $Z=X+\epsilon$, involving independent and unobserved random variables $X,\xi,\epsilon$ plus a regression function $f_{\theta^0}$, known up to a finite dimensional $\theta^0$. The common densities of the $X_i$'s and of the $\xi_i$'s are unknown, whereas the distribution of $\epsilon$ is completely known. We aim at estimating the parameter $\theta^0$ by using the observations $(Y_1,Z_1),\dots,(Y_n,Z_n)$. We propose an estimation procedure based on the least squares criterion $\tilde{S}_{\theta^0,g}(\theta)=\mathbb{E}_{\theta^0,g}[(Y-f_{\theta}(X))^2w(X)]$, where $w$ is a weight function to be chosen. We propose an estimator and derive an upper bound for its risk that depends on the smoothness of the error density $p_{\epsilon}$ and on the smoothness properties of $w(x)f_{\theta}(x)$. Furthermore, we give sufficient conditions that ensure that the parametric rate of convergence is achieved. We provide practical recipes for the choice of $w$ in the case of nonlinear regression functions that are piecewise smooth, which allows a gain in the order of the rate of convergence, up to the parametric rate in some cases. We also consider extensions of the estimation procedure, in particular when a choice of $w_{\theta}$ depending on $\theta$ would be more appropriate. Comment: Published at http://dx.doi.org/10.1214/07-AIHP107 in the Annales de l'Institut Henri Poincaré - Probabilités et Statistiques (http://www.imstat.org/aihp/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Semi-parametric estimation of the hazard function in a model with covariate measurement error

    We consider a model where the failure hazard function, conditional on a covariate $Z$, is given by $R(t,\theta^0|Z)=\eta_{\gamma^0}(t)f_{\beta^0}(Z)$, with $\theta^0=(\beta^0,\gamma^0)^\top\in\mathbb{R}^{m+p}$. The baseline hazard function $\eta_{\gamma^0}$ and the relative risk $f_{\beta^0}$ both belong to parametric families. The covariate $Z$ is measured through the error model $U=Z+\epsilon$, where $\epsilon$ is independent of $Z$, with known density $f_\epsilon$. We observe an $n$-sample $(X_i,D_i,U_i)$, $i=1,\dots,n$, where $X_i$ is the minimum of the failure time and the censoring time, and $D_i$ is the censoring indicator. We aim at estimating $\theta^0$ in the presence of the unknown density $g$. Our estimation procedure, based on a least squares criterion, provides two estimators. The first one minimizes an estimate of the least squares criterion in which $g$ is estimated by density deconvolution. Its rate depends on the smoothness of $f_\epsilon$ and of $f_\beta(z)$ as a function of $z$. We derive sufficient conditions that ensure $\sqrt{n}$-consistency. The second estimator is constructed under conditions ensuring that the least squares criterion can be directly estimated at the parametric rate. These estimators, studied in depth through examples, are in particular $\sqrt{n}$-consistent and asymptotically Gaussian in the Cox model and in the excess risk model, whatever $f_\epsilon$ is

    Adaptive density deconvolution with dependent inputs

    In the convolution model $Z_i=X_i+\epsilon_i$, we give a model selection procedure to estimate the density of the unobserved variables $(X_i)_{1 \leq i \leq n}$, when the sequence $(X_i)_{i \geq 1}$ is strictly stationary but not necessarily independent. This procedure depends on whether the density of $\epsilon_i$ is super smooth or ordinary smooth. The rates of convergence of the penalized contrast estimators are the same as in the independent framework, and are minimax over most classes of regularity on $\mathbb{R}$. Our results apply to mixing sequences, but also to many other dependent sequences. When the errors are super smooth, the condition on the dependence coefficients is the minimal condition of that type ensuring that the sequence $(X_i)_{i \geq 1}$ is not a long-memory process

    Adaptive density estimation for general ARCH models

    We consider a model $Y_t=\sigma_t\eta_t$ in which $(\sigma_t)$ is not independent of the noise process $(\eta_t)$, but $\sigma_t$ is independent of $\eta_t$ for each $t$. We assume that $(\sigma_t)$ is stationary and we propose an adaptive estimator of the density of $\ln(\sigma_t^2)$ based on the observations $Y_t$. Under various dependence structures, the rates of this nonparametric estimator coincide with the minimax rates obtained in the i.i.d. case when $(\sigma_t)$ and $(\eta_t)$ are independent, in all cases where these minimax rates are known. The results apply to various linear and nonlinear ARCH processes
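
One way to see why deconvolution ideas are relevant to this model is sketched below. It assumes $Y_t \neq 0$ almost surely, and the abstract itself does not spell this step out. Taking logarithms of the squared observations turns the multiplicative model into an additive one:

```latex
\[
  \ln(Y_t^2) \;=\; \ln(\sigma_t^2) + \ln(\eta_t^2).
\]
```

Since $\sigma_t$ is independent of $\eta_t$ for each $t$, the density of $\ln(\sigma_t^2)$ then plays the role of the unknown density in a convolution model whose noise term is $\ln(\eta_t^2)$.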

    Adaptive kernel estimation of the baseline function in the Cox model, with high-dimensional covariates

    The aim of this article is to propose a novel kernel estimator of the baseline function in a general high-dimensional Cox model, for which we derive non-asymptotic rates of convergence. To construct our estimator, we first estimate the regression parameter in the Cox model via a Lasso procedure. We then plug this estimator into the classical kernel estimator of the baseline function, obtained by smoothing the so-called Breslow estimator of the cumulative baseline function. We propose and study an adaptive procedure for selecting the bandwidth, in the spirit of Goldenshluger and Lepski (2011). We state non-asymptotic oracle inequalities for the final estimator, which reveal the reduction of the rates of convergence when the dimension of the covariates grows
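
The plug-in step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the regression parameter is taken as given (the true value stands in for a Lasso estimate), the bandwidth is fixed by hand rather than selected adaptively, and the function name `smoothed_breslow_hazard`, the Epanechnikov kernel, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def smoothed_breslow_hazard(times, events, covariates, beta, t_grid, bandwidth):
    """Kernel-smooth the jumps of the Breslow cumulative baseline estimator."""
    # Relative risk exp(beta' Z_j) for every subject, using the plugged-in beta.
    risk_scores = np.exp(covariates @ beta)
    # at_risk[i, j] is 1 when subject j is still under observation at times[i].
    at_risk = (times[None, :] >= times[:, None]).astype(float)
    risk_sums = at_risk @ risk_scores
    # Jump of the Breslow estimator at each observed event time (0 if censored).
    increments = events / risk_sums
    # Epanechnikov kernel evaluated at (t - T_i) / h.
    u = (t_grid[:, None] - times[None, :]) / bandwidth
    kernel = 0.75 * np.clip(1.0 - u**2, 0.0, None)
    return (kernel @ increments) / bandwidth

# Simulated data (assumption): constant baseline hazard equal to 1.
rng = np.random.default_rng(0)
n = 2000
beta_true = np.array([0.5, -0.5])
covariates = rng.normal(size=(n, 2))
failure = rng.exponential(1.0 / np.exp(covariates @ beta_true))
censoring = rng.exponential(1.0 / 0.3, size=n)
times = np.minimum(failure, censoring)
events = (failure <= censoring).astype(float)

# Evaluate away from the boundary at 0, so the kernel window stays inside [0, inf).
t_grid = np.linspace(0.3, 1.0, 15)
hazard_hat = smoothed_breslow_hazard(times, events, covariates, beta_true, t_grid, bandwidth=0.3)
print(np.round(hazard_hat, 2))
```

In this simulation the estimated values should sit close to the constant baseline hazard of 1. In the high-dimensional setting of the abstract, `beta_true` would be replaced by an estimate and the bandwidth by a data-driven choice.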

    Penalized contrast estimator for adaptive density deconvolution

    The authors consider the problem of estimating the density $g$ of independent and identically distributed variables $X_i$, from a sample $Z_1,\dots,Z_n$ where $Z_i=X_i+\sigma\epsilon_i$, $i=1,\dots,n$, $\epsilon$ is a noise independent of $X$, with $\sigma\epsilon$ having known distribution. They present a model selection procedure that yields an adaptive estimator of $g$ and non-asymptotic bounds for its $\mathbb{L}_2(\mathbb{R})$-risk. The estimator achieves the minimax rate of convergence in most cases where lower bounds are available. A simulation study illustrates the good practical performance of the method
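
To make the convolution model concrete, here is a minimal sketch of a deconvolution density estimator with a fixed spectral cutoff. It is not the adaptive penalized contrast estimator of the abstract: the cutoff is chosen by hand, and the Laplace noise with scale 0.3, the standard normal density for $X$, and the names `deconvolution_density` and `laplace_cf` are assumptions made for illustration. The idea is to divide the empirical characteristic function of the $Z_i$ by the known characteristic function of the noise, then invert the Fourier transform over a bounded frequency band.

```python
import numpy as np

def deconvolution_density(z, x_grid, noise_cf, cutoff, n_freq=601):
    """Estimate the density of X from Z = X + noise with a spectral cutoff."""
    t = np.linspace(-cutoff, cutoff, n_freq)
    dt = t[1] - t[0]
    # Empirical characteristic function of the observations Z.
    ecf_z = np.exp(1j * np.outer(t, z)).mean(axis=1)
    # Division by the known noise characteristic function undoes the convolution.
    ratio = ecf_z / noise_cf(t)
    # Riemann-sum approximation of the inverse Fourier transform on [-cutoff, cutoff].
    integrand = np.exp(-1j * np.outer(x_grid, t)) * ratio
    return (integrand.sum(axis=1) * dt).real / (2.0 * np.pi)

def laplace_cf(t, scale=0.3):
    """Characteristic function of a centered Laplace variable with the given scale."""
    return 1.0 / (1.0 + (scale * t) ** 2)

# Simulated data (assumption): X standard normal, Laplace measurement noise.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(0.0, 1.0, n)
z = x + rng.laplace(0.0, 0.3, n)

x_grid = np.linspace(-4.0, 4.0, 161)
g_hat = deconvolution_density(z, x_grid, laplace_cf, cutoff=3.0)
print("estimated density near 0:", round(float(g_hat[80]), 3))
```

A larger cutoff reduces bias but amplifies the noise through the division by the noise characteristic function. Choosing that trade-off from the data is the role of the model selection procedure described in the abstract.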

    Model selection in logistic regression

    This paper is devoted to model selection in logistic regression. We extend the model selection principle introduced by Birgé and Massart (2001) to the logistic regression model. This selection is done by using penalized maximum likelihood criteria. We propose in this context a completely data-driven criterion based on the slope heuristics. We prove non-asymptotic oracle inequalities for the selected estimators. Theoretical results are illustrated through simulation studies

    Estimation of the hazard function in a semiparametric model with covariate measurement error

    We consider a failure hazard function, conditional on a time-independent covariate, in which the baseline hazard function and the relative risk both belong to parametric families. The covariate has an unknown density and is measured with error through an additive error model, where the error is a random variable independent of the covariate, with known density. We observe an $n$-sample where each observation includes the minimum of the failure time and the censoring time, and the censoring indicator. Using a least squares criterion and deconvolution methods, we propose a consistent estimator of the model parameter based on these observations. We give an upper bound for its risk which depends on the smoothness properties of the error density and of the relative risk as a function of the covariate, and we derive sufficient conditions for $\sqrt{n}$-consistency. We give detailed examples considering various types of relative risk and various types of error density. In particular, in the Cox model and in the excess risk model, the estimator is $\sqrt{n}$-consistent and asymptotically Gaussian regardless of the form of the error density