3 research outputs found

    Non-asymptotic approach to varying coefficient model

    In the present paper we consider the varying coefficient model, which is a useful tool for exploring dynamic patterns in many applications. Existing methods typically provide asymptotic evaluations of the precision of estimation procedures under the assumption that the number of observations tends to infinity. In practical applications, however, only a finite number of measurements is available. We therefore focus on a non-asymptotic approach to the problem and propose a novel estimation procedure based on recent developments in matrix estimation. In particular, for our estimator we obtain upper bounds on the mean squared and pointwise estimation errors. The resulting oracle inequalities are non-asymptotic and hold for finite sample sizes.
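    As context for the abstract above, the varying coefficient model in its standard form can be written as follows (a schematic statement; the paper's exact notation and assumptions may differ):

        Y_i = \sum_{j=1}^{p} \beta_j(U_i) X_{ij} + \varepsilon_i, \qquad i = 1, \dots, n,

    where U_i is the index variable, X_{i1}, \dots, X_{ip} are predictors, the \beta_j(\cdot) are unknown coefficient functions, and \varepsilon_i is noise. A non-asymptotic oracle inequality then bounds quantities such as E\|\hat{\beta}_j - \beta_j\|^2 explicitly for each finite n, rather than only in the limit as n \to \infty.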

    Spline estimator for simultaneous variable selection and constant coefficient identification in high-dimensional generalized varying-coefficient models

    In this paper we are concerned with two common and related problems for generalized varying-coefficient models: variable selection and constant coefficient identification. Starting with a specification of generalized varying-coefficient models that allows possible nonlinear interactions between the index variable and all other predictors, we propose a polynomial-spline-based procedure that simultaneously eliminates irrelevant predictors and identifies predictors that do not interact with the index variable. Our approach is based on a double-penalization strategy in which two penalty functions, one for each of these related purposes, are combined in a single functional. In a "large p, small n" setting, we establish the convergence rates of the estimator under suitable regularity assumptions. Based on its previous success for parametric models, we use the extended Bayesian information criterion (eBIC) to choose the regularization parameters automatically. Finally, a post-penalization estimator is proposed to further reduce the bias of the resulting estimator. Monte Carlo simulations are conducted to examine the finite-sample performance of the proposed procedures, and an application to a leukemia dataset is presented.
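    Schematically, after expanding each coefficient function in a polynomial spline basis with coefficient block \gamma_j, the double-penalization strategy described above amounts to minimizing a criterion of the form (illustrative notation, not the paper's exact functional):

        \min_{\gamma_1, \dots, \gamma_p} \; -\ell_n(\gamma_1, \dots, \gamma_p) + \lambda_1 \sum_{j=1}^{p} p_1(\|\gamma_j\|) + \lambda_2 \sum_{j=1}^{p} p_2(\|\gamma_j^{\mathrm{nl}}\|),

    where \ell_n is the log-likelihood of the generalized model, the first penalty shrinks entire blocks \gamma_j to zero (eliminating irrelevant predictors), and the second shrinks only the nonlinear part \gamma_j^{\mathrm{nl}} of each block (collapsing that coefficient to a constant, i.e., no interaction with the index variable).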

    Non/Semi-parametric learning from data with complex features

    This thesis focuses on developing novel and flexible non/semi-parametric statistical methods for data with complex features. In recent years, the advancement of high-throughput technologies has made it possible to collect sophisticated high-dimensional datasets, such as microarray data, genome-wide single nucleotide polymorphism (SNP) data, and RNA sequencing (RNA-seq) data. These advances have created an escalating demand for innovative dimension-reduction tools to extract useful information from the huge amounts of data, to visualize the underlying structure, and to facilitate the understanding and analysis of the data. The research undertaken in this thesis is described below.

    In Chapter 1, we consider a semiparametric additive partially linear regression model (APLM) for analyzing ultra-high-dimensional data where both the number of linear components and the number of nonlinear components can be much larger than the sample size. We propose a two-step approach for estimation, selection, and simultaneous inference of the components in the APLM. In the first step, the nonlinear additive components are approximated using polynomial spline basis functions, and a doubly penalized procedure based on the adaptive LASSO is proposed to select nonzero linear and nonlinear components. In the second step, local linear smoothing is applied to the data with the selected variables to obtain the asymptotic distribution of the estimators of the nonparametric functions of interest. The proposed method selects the correct model with probability approaching one under regularity conditions. The estimators of both the linear and nonlinear parts are consistent and asymptotically normal, which enables us to construct confidence intervals and make inferences about the regression coefficients and the component functions. The performance of the method is evaluated by simulation studies, and the method is also applied to a dataset on the Shoot Apical Meristem (SAM) of maize genotypes.

    In Chapter 2, we further consider the model identification problem, along with variable selection, estimation, and inference, simultaneously for the APLM. The APLM combines the flexibility of nonparametric regression with the parsimony of regression models, and has been widely used as a popular tool in multivariate nonparametric regression to alleviate the curse of dimensionality. A natural question raised in practice is the choice of structure in the nonparametric part, that is, whether the continuous covariates enter the model in linear or nonparametric form. In this chapter we present a comprehensive framework for simultaneous sparse model identification and learning for ultra-high-dimensional APLMs where both the linear and nonparametric components can be larger than the sample size. We propose a fast and efficient two-stage procedure. In the first stage, we decompose the nonparametric functions into a linear part and a nonlinear part; the nonlinear functions are approximated by constant spline bases, and a triple-penalization procedure is proposed to select nonzero components using the adaptive group LASSO. In the second stage, we refit the data with the selected covariates using higher-order polynomial splines and apply spline-backfitted local linear smoothing to obtain asymptotic normality for the estimators. The procedure is shown to be consistent for model structure identification: it can identify zero, linear, and nonlinear components correctly and efficiently. Inference can be made on both the linear coefficients and the nonparametric functions. We conduct simulation studies to evaluate the performance of the method and apply it to a dataset on the Shoot Apical Meristem (SAM) of maize genotypes for illustration.

    In Chapter 3, motivated by recent advances in brain-imaging technology and high-throughput genotyping, we take an imaging-genetics approach to discover relationships between the interplay of genetic variation and environmental factors and measurements from imaging phenotypes. We propose an image-on-scalar regression method in which the spatial heterogeneity of gene-environment interactions on imaging responses is investigated via an ultra-high-dimensional spatially varying coefficient model (SVCM). Bivariate splines on triangulations are used to represent the coefficient functions over an irregular two-dimensional domain of interest. When using the image-on-scalar regression method, a natural question raised in practice is whether a coefficient function really varies over space. In this chapter we present a unified approach for simultaneous sparse learning and model structure identification (i.e., separation of varying and constant coefficients). Our method can identify zero, nonzero constant, and spatially varying components correctly and efficiently, and the estimators of the constant coefficients and varying coefficient functions are consistent. The performance of the method is evaluated by simulation examples and a brain-mapping study based on the Alzheimer's Disease Neuroimaging Initiative data.
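    The two-stage strategy of Chapters 1 and 2 can be illustrated with a minimal sketch. The code below is a simplified stand-in, not the thesis's implementation: it uses a plain LASSO on a hand-rolled truncated-power spline expansion in place of the adaptive (group) LASSO, and an ordinary least-squares refit in place of spline-backfitted local linear smoothing; all names and tuning values are assumptions for illustration.

        # Simplified sketch of a two-stage select-then-refit procedure.
        # Stand-ins: plain LASSO instead of adaptive group LASSO (stage 1),
        # OLS refit instead of spline-backfitted local linear smoothing (stage 2).
        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(0)
        n, p = 200, 10

        X = rng.uniform(0.0, 1.0, size=(n, p))
        # True model: covariate 0 enters nonlinearly, covariate 1 linearly,
        # and the remaining eight covariates are irrelevant.
        y = np.sin(2 * np.pi * X[:, 0]) + 1.5 * X[:, 1] + rng.normal(0.0, 0.3, n)

        def spline_basis(x, n_knots=5):
            """Truncated-power cubic spline basis: x, x^2, x^3, (x - k)_+^3."""
            knots = np.quantile(x, np.linspace(0.1, 0.9, n_knots))
            cols = [x, x**2, x**3] + [np.maximum(x - k, 0.0) ** 3 for k in knots]
            return np.column_stack(cols)

        # Stage 1: spline-expand every covariate and run a sparse selector.
        blocks = [spline_basis(X[:, j]) for j in range(p)]
        Z = np.column_stack(blocks)
        Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
        sel = Lasso(alpha=0.02, max_iter=50_000).fit(Z, y)

        # A covariate survives if any coefficient in its spline block is nonzero.
        k = blocks[0].shape[1]
        kept = [j for j in range(p) if np.any(sel.coef_[j * k:(j + 1) * k] != 0.0)]
        print("selected covariates:", kept)

        # Stage 2: refit only the selected covariates by least squares.
        Z2 = np.column_stack([np.ones(n)] + [spline_basis(X[:, j]) for j in kept])
        beta, *_ = np.linalg.lstsq(Z2, y, rcond=None)
        print("stage-2 residual SD:", float(np.std(y - Z2 @ beta)))

    In the thesis, the second stage additionally delivers asymptotic normality of the estimators, enabling confidence intervals; the OLS refit above does not attempt to reproduce that inference step.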