Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates
We propose generalized additive partial linear models for complex data which
allow one to capture nonlinear patterns of some covariates, in the presence of
linear components. The proposed method improves estimation efficiency and
increases statistical power for correlated data through incorporating the
correlation information. A unique feature of the proposed method is its
capability of handling model selection in cases where it is difficult to
specify the likelihood function. We derive the quadratic inference
function-based estimators for the linear coefficients and the nonparametric
functions when the dimension of covariates diverges, and establish asymptotic
normality for the linear coefficient estimators and the rates of convergence
for the nonparametric functions estimators for both finite and high-dimensional
cases. The proposed method and theoretical development are quite challenging
since the numbers of linear covariates and nonlinear components both increase
as the sample size increases. We also propose a doubly penalized procedure for
variable selection which can simultaneously identify nonzero linear and
nonparametric components, and which has an asymptotic oracle property.
Extensive Monte Carlo studies have been conducted and show that the proposed
procedure works effectively even with moderate sample sizes. The proposed
method is illustrated with a pharmacokinetics study on renal cancer data.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1194 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
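The quadratic inference function (QIF) machinery underlying these estimators can be sketched in the simplest case of a purely linear marginal model with clustered data. Everything below — the simulated exchangeable errors, the basis matrices `M0`/`M1`, and the optimizer choice — is an illustrative assumption, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, m, p = 100, 4, 2                 # clusters, cluster size, covariates
beta_true = np.array([1.0, -0.5])
rho = 0.5                           # exchangeable within-cluster correlation

X = rng.normal(size=(N, m, p))
shared = rng.normal(size=(N, 1))    # cluster-level random effect
eps = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.normal(size=(N, m))
y = X @ beta_true + eps             # responses, shape (N, m)

# Basis matrices spanning the inverse of an exchangeable working correlation
M0 = np.eye(m)
M1 = np.ones((m, m)) - np.eye(m)

def qif(beta):
    """Quadratic inference function Q(beta) = N * gbar' C^+ gbar."""
    g = np.empty((N, 2 * p))
    for i in range(N):
        r = y[i] - X[i] @ beta              # cluster residual vector
        g[i, :p] = X[i].T @ (M0 @ r)        # extended score pieces
        g[i, p:] = X[i].T @ (M1 @ r)
    gbar = g.mean(axis=0)
    C = g.T @ g / N                         # sample covariance of g_i
    return N * gbar @ np.linalg.pinv(C) @ gbar

# Start from pooled OLS, then minimize the QIF
Xf, yf = X.reshape(-1, p), y.ravel()
b_ols = np.linalg.lstsq(Xf, yf, rcond=None)[0]
beta_hat = minimize(qif, b_ols, method="Nelder-Mead").x
```

Because `M0` and `M1` span the inverse of the working correlation, the within-cluster correlation is exploited without estimating a correlation parameter or specifying a likelihood, which is the feature the abstract emphasizes.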
Partially linear additive quantile regression in ultra-high dimension
We consider a flexible semiparametric quantile regression model for analyzing
high dimensional heterogeneous data. This model has several appealing features:
(1) By considering different conditional quantiles, we may obtain a more
complete picture of the conditional distribution of a response variable given
high dimensional covariates. (2) The sparsity level is allowed to be different
at different quantile levels. (3) The partially linear additive structure
accommodates nonlinearity and circumvents the curse of dimensionality. (4) It
is naturally robust to heavy-tailed distributions. In this paper, we
approximate the nonlinear components using B-spline basis functions. We first
study estimation under this model when the nonzero components are known in
advance and the number of covariates in the linear part diverges. We then
investigate a nonconvex penalized estimator for simultaneous variable selection
and estimation. We derive its oracle property for a general class of nonconvex
penalty functions in the presence of ultra-high dimensional covariates under
relaxed conditions. To tackle the challenges of a nonsmooth loss function, a
nonconvex penalty function and the presence of nonlinear components, we combine
a recently developed convex-differencing method with modern empirical process
techniques. Monte Carlo simulations and an application to a microarray study
demonstrate the effectiveness of the proposed method. We also discuss how the
method for a single quantile of interest can be extended to simultaneous
variable selection and estimation at multiple quantiles.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1367 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
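The model structure can be illustrated with a minimal median-regression (tau = 0.5) sketch: one linear covariate plus one nonlinear component expanded in a spline basis, fit by casting the check-loss minimization as a linear program. The truncated-power basis, simulated data, and all names below are stand-ins for the paper's B-spline construction and penalized estimator, not its actual procedure:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, tau = 400, 0.5
x = rng.normal(size=n)                      # linear covariate
z = rng.uniform(size=n)                     # nonlinear covariate
g = np.sin(2 * np.pi * z)                   # true nonlinear component
y = 0.5 * x + g + rng.standard_t(df=3, size=n) * 0.3  # heavy-tailed noise

# Cubic truncated-power basis for z (simple stand-in for B-splines)
knots = np.quantile(z, [0.25, 0.5, 0.75])
S = np.column_stack([z, z**2, z**3] +
                    [np.maximum(z - k, 0.0) ** 3 for k in knots])
D = np.column_stack([np.ones(n), x, S])     # intercept | linear | spline
p = D.shape[1]

# Quantile regression as an LP:  min  tau*1'u + (1-tau)*1'v
#   s.t.  y = D b + u - v,   u, v >= 0,   b free
c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
A_eq = np.hstack([D, np.eye(n), -np.eye(n)])
res = linprog(c, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * p + [(0, None)] * (2 * n),
              method="highs")
beta_x = res.x[1]                           # estimated linear coefficient
```

The Student-t noise mimics the heavy-tailed setting in which the check loss is naturally robust, one of the appealing features listed above.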
Variable selection in measurement error models
Measurement error data, or errors-in-variables data, have been collected in
many studies. Natural criterion functions are often unavailable for general
functional measurement error models due to the lack of information on the
distribution of the unobservable covariates. Typically, the parameter
estimation is via solving estimating equations. In addition, the construction
of such estimating equations routinely requires solving integral equations,
hence the computation is often much more intensive compared with ordinary
regression models. Because of these difficulties, traditional best subset
variable selection procedures are not applicable, and in the measurement error
model context, variable selection remains an unsolved issue. In this paper, we
develop a framework for variable selection in measurement error models via
penalized estimating equations. We first propose a class of selection
procedures for general parametric measurement error models and for general
semi-parametric measurement error models, and study the asymptotic properties
of the proposed procedures. Then, under certain regularity conditions and with
a properly chosen regularization parameter, we demonstrate that the proposed
procedure performs as well as an oracle procedure. We assess the finite sample
performance via Monte Carlo simulation studies and illustrate the proposed
methodology through the empirical analysis of a familiar data set.
Comment: Published at http://dx.doi.org/10.3150/09-BEJ205 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
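The penalized estimating-equation idea can be sketched in the simplest special case — a linear model with no measurement error — using the SCAD penalty and the local quadratic approximation of Fan and Li. The simulated data, the tuning constant `lam`, and the zero threshold are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 8
beta_true = np.array([2.0, 1.5, 0, 0, 0, 0, 0, 0])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan & Li's standard choice a = 3.7)."""
    t = np.abs(t)
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1) * lam))

# Local quadratic approximation: repeatedly solve a ridge-type system whose
# penalty weight p'(|b_j|)/|b_j| grows without bound as b_j -> 0, so
# irrelevant coefficients are driven to (numerically) zero while large
# coefficients are left essentially unpenalized.
lam, eps = 0.3, 1e-8
b = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
G = X.T @ X / n                               # estimating-equation "slope"
r = X.T @ y / n
for _ in range(100):
    W = np.diag(scad_deriv(b, lam) / (np.abs(b) + eps))
    b = np.linalg.solve(G + W, r)
b[np.abs(b) < 1e-4] = 0.0                     # declare tiny coefficients zero
```

In the measurement-error setting the normal equations `G b = r` would be replaced by the corrected estimating equations, but the penalization step is the same, which is why no likelihood or criterion function is needed.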
Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score
Mendelian randomization (MR) is a method of exploiting genetic variation to
unbiasedly estimate a causal effect in the presence of unmeasured confounding.
MR
is being widely used in epidemiology and other related areas of population
science. In this paper, we study statistical inference in the increasingly
popular two-sample summary-data MR design. We show that a linear model for the
observed associations approximately holds in a wide variety of settings when
all the genetic variants satisfy the exclusion restriction assumption, or in
genetic terms, when there is no pleiotropy. In this scenario, we derive a
maximum profile likelihood estimator with provable consistency and asymptotic
normality. However, through analyzing real datasets, we find strong evidence of
both systematic and idiosyncratic pleiotropy in MR, echoing the omnigenic model
of complex traits recently proposed in genetics. We model the
systematic pleiotropy by a random effects model, where no genetic variant
satisfies the exclusion restriction condition exactly. In this case we propose
a consistent and asymptotically normal estimator by adjusting the profile
score. We then tackle the idiosyncratic pleiotropy by robustifying the adjusted
profile score. We demonstrate the robustness and efficiency of the proposed
methods using several simulated and real datasets.
Comment: 59 pages, 5 figures, 6 tables
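Under the no-pleiotropy linear model, the profile likelihood for the causal effect has a closed form once the SNP-exposure effects are profiled out, and maximizing it is a one-dimensional problem. The sketch below simulates summary data and maximizes that profile likelihood; the effect sizes, standard errors, and variable names are all invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
J, beta_true = 100, 0.4
gamma = rng.normal(0, 0.1, size=J)       # true SNP-exposure effects
s_g = np.full(J, 0.01)                   # SEs of exposure estimates
s_G = np.full(J, 0.01)                   # SEs of outcome estimates
gamma_hat = gamma + rng.normal(0, s_g)   # sample 1 summary statistics
Gamma_hat = beta_true * gamma + rng.normal(0, s_G)   # sample 2, no pleiotropy

def neg_profile_loglik(beta):
    """Negative profile log-likelihood with the nuisance gamma_j profiled
    out:  l(beta) = -0.5 * sum_j (Gamma_hat_j - beta*gamma_hat_j)^2
                              / (s_G_j^2 + beta^2 * s_g_j^2)."""
    num = (Gamma_hat - beta * gamma_hat) ** 2
    den = s_G**2 + beta**2 * s_g**2
    return 0.5 * np.sum(num / den)

res = minimize_scalar(neg_profile_loglik, bounds=(-2, 2), method="bounded")
beta_hat = res.x
```

The `beta**2 * s_g**2` term in the denominator is what distinguishes this from a naive weighted regression of `Gamma_hat` on `gamma_hat`: it accounts for the sampling error in the exposure estimates, which would otherwise bias the slope.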