Robust variable selection in partially varying coefficient single-index model
By combining basis function approximations and the smoothly clipped absolute deviation (SCAD) penalty, this paper proposes a robust variable selection procedure for a partially varying coefficient single-index model based on modal regression. The proposed procedure simultaneously selects significant variables in the parametric components and the nonparametric components. With appropriate selection of the tuning parameters, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. Furthermore, we also discuss the bandwidth selection and propose a modified expectation-maximization (EM)-type algorithm for the proposed estimation procedure. The finite sample properties of the proposed estimators are illustrated by some simulation examples. The research of Zhu is partially supported by National Natural Science Foundation of China (NNSFC) under Grants 71171075, 71221001 and 71031004. The research of Yu is supported by NNSFC under Grant 11261048.
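As a rough illustration of the SCAD penalty named in the abstract above, the sketch below evaluates the standard piecewise form of the penalty for a single coefficient. The function name `scad_penalty` and the default `a = 3.7` (a commonly used value) are assumptions made for this example; the paper's modal regression objective and EM-type algorithm are not reproduced here.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Evaluate the SCAD penalty at a single coefficient value.

    Hypothetical helper for illustration only: `lam` is the tuning
    parameter and `a` (> 2) controls where the penalty levels off.
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same linear penalty as the LASSO.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no additional shrinkage.
    return lam * lam * (a + 1) / 2
```

The penalty is continuous at both breakpoints and flat beyond `a * lam`, which is the feature usually credited with reducing the bias of large coefficients relative to an L1 penalty.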
Functional Regression
Functional data analysis (FDA) involves the analysis of data whose ideal
units of observation are functions defined on some continuous domain, and the
observed data consist of a sample of functions taken from some population,
sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the
development of this field, which has accelerated in the past 10 years to become
one of the fastest growing areas of statistics, fueled by the growing number of
applications yielding this type of data. One unique characteristic of FDA is
the need to combine information both across and within functions, which Ramsay
and Silverman called replication and regularization, respectively. This article
will focus on functional regression, the area of FDA that has received the most
attention in applications and methodological development. First will be an
introduction to basis functions, key building blocks for regularization in
functional regression methods, followed by an overview of functional regression
methods, split into three types: [1] functional predictor regression
(scalar-on-function), [2] functional response regression (function-on-scalar)
and [3] function-on-function regression. For each, the role of replication and
regularization will be discussed and the methodological development described
in a roughly chronological manner, at times deviating from the historical
timeline to group together similar methods. The primary focus is on modeling
and methodology, highlighting the modeling structures that have been developed
and the various regularization approaches employed. At the end is a brief
discussion describing potential areas of future development in this field.
A Selective Review of Group Selection in High-Dimensional Models
Grouping structures arise naturally in many statistical modeling problems.
Several methods have been proposed for variable selection that respect grouping
structure in variables. Examples include the group LASSO and several concave
group selection methods. In this article, we give a selective review of group
selection concerning methodological developments, theoretical properties and
computational algorithms. We pay particular attention to group selection
methods involving concave penalties. We address both group selection and
bi-level selection methods. We describe several applications of these methods
in nonparametric additive models, semiparametric regression, seemingly
unrelated regressions, genomic data analysis and genome wide association
studies. We also highlight some issues that require further study.
Comment: Published at http://dx.doi.org/10.1214/12-STS392 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
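For readers unfamiliar with the group LASSO mentioned in this abstract, the following minimal sketch shows the block soft-thresholding step that many group LASSO algorithms apply to one group of coefficients at a time. The function name `group_soft_threshold` is hypothetical, and the sketch is not taken from the review; it ignores group-size weights and the concave penalties the article emphasizes.

```python
import math
from typing import List


def group_soft_threshold(v: List[float], lam: float) -> List[float]:
    """Shrink a whole coefficient group toward zero by `lam` in Euclidean norm.

    Illustrative helper: returns the minimizer over b of
    0.5 * ||b - v||^2 + lam * ||b||, where ||.|| is the Euclidean norm.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        # The entire group is set to zero: this is the group-level selection.
        return [0.0 for _ in v]
    # Otherwise every coefficient in the group is scaled by the same factor.
    scale = 1.0 - lam / norm
    return [scale * x for x in v]
```

Because the group is either zeroed as a unit or shrunk as a unit, selection happens at the group level rather than coefficient by coefficient, which is the behavior that bi-level methods then refine.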
Partially linear additive quantile regression in ultra-high dimension
We consider a flexible semiparametric quantile regression model for analyzing
high dimensional heterogeneous data. This model has several appealing features:
(1) By considering different conditional quantiles, we may obtain a more
complete picture of the conditional distribution of a response variable given
high dimensional covariates. (2) The sparsity level is allowed to be different
at different quantile levels. (3) The partially linear additive structure
accommodates nonlinearity and circumvents the curse of dimensionality. (4) It
is naturally robust to heavy-tailed distributions. In this paper, we
approximate the nonlinear components using B-spline basis functions. We first
study estimation under this model when the nonzero components are known in
advance and the number of covariates in the linear part diverges. We then
investigate a nonconvex penalized estimator for simultaneous variable selection
and estimation. We derive its oracle property for a general class of nonconvex
penalty functions in the presence of ultra-high dimensional covariates under
relaxed conditions. To tackle the challenges of a nonsmooth loss function, a
nonconvex penalty function and the presence of nonlinear components, we combine
a recently developed convex-differencing method with modern empirical process
techniques. Monte Carlo simulations and an application to a microarray study
demonstrate the effectiveness of the proposed method. We also discuss how the
method for a single quantile of interest can be extended to simultaneous
variable selection and estimation at multiple quantiles.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1367 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
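The quantile regression framework in this abstract rests on the check (pinball) loss, which the sketch below illustrates in the simplest setting: estimating a single unconditional quantile with no covariates. The helper names `check_loss` and `sample_quantile_by_check_loss` are assumptions for this example, and nothing here reproduces the paper's penalized B-spline estimator.

```python
from typing import Sequence


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(y: Sequence[float], tau: float) -> float:
    """Return the sample value that minimizes the total check loss.

    Brute-force search over the observed values, adequate for a small
    illustration because a minimizer always exists among the data points.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


# The extreme value 100.0 does not pull the median estimate away from 3.0.
median = sample_quantile_by_check_loss([1.0, 2.0, 3.0, 4.0, 100.0], 0.5)
```

The loss grows only linearly in the residual, which is one way to see the robustness to heavy-tailed distributions noted in point (4) of the abstract.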
Estimation of Single-Index Models Based on Boosting Techniques
In single-index models the link or response function is not considered as fixed. The data determine the form of the unknown link function. In order to obtain a flexible form of the link function we specify the link function as an expansion in basis functions and propose to estimate parameters as well as the link function by weak learners within a boosting framework. It is shown that the method is a strong competitor to existing methods. The method is investigated in simulation studies and applied to real data.
Variable selection in semiparametric regression modeling
In this paper, we are concerned with how to select significant variables in
semiparametric modeling. Variable selection for semiparametric regression
models consists of two components: model selection for nonparametric components
and selection of significant variables for the parametric portion. Thus,
semiparametric variable selection is much more challenging than parametric
variable selection (e.g., linear and generalized linear models) because
traditional variable selection procedures including stepwise regression and the
best subset selection now require separate model selection for the
nonparametric components for each submodel. This leads to a very heavy
computational burden. In this paper, we propose a class of variable selection
procedures for semiparametric regression models using nonconcave penalized
likelihood. We establish the rate of convergence of the resulting estimate.
With proper choices of penalty functions and regularization parameters, we show
the asymptotic normality of the resulting estimate and further demonstrate that
the proposed procedures perform as well as an oracle procedure. A
semiparametric generalized likelihood ratio test is proposed to select
significant variables in the nonparametric component. We investigate the
asymptotic behavior of the proposed test and demonstrate that its limiting null
distribution follows a chi-square distribution which is independent of the
nuisance parameters. Extensive Monte Carlo simulation studies are conducted to
examine the finite sample performance of the proposed variable selection
procedures.
Comment: Published at http://dx.doi.org/10.1214/009053607000000604 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Interaction Analysis of Repeated Measure Data
Many penalized variable selection methods have been developed in the past two decades for analyzing high dimensional omics data, such as gene expressions, single nucleotide polymorphisms (SNPs), copy number variations (CNVs) and others. However, lipidomics data have rarely been investigated using high dimensional variable selection methods. This package incorporates our recently developed penalization procedures to conduct interaction analysis for high dimensional lipidomics data with repeated measurements. The core module of this package is developed in C++. The development of this software package and the associated statistical methods has been partially supported by an Innovative Research Award from Johnson Cancer Research Center, Kansas State University.