A Selective Review of Group Selection in High-Dimensional Models
Grouping structures arise naturally in many statistical modeling problems.
Several methods have been proposed for variable selection that respect grouping
structure in variables. Examples include the group LASSO and several concave
group selection methods. In this article, we give a selective review of group
selection concerning methodological developments, theoretical properties and
computational algorithms. We pay particular attention to group selection
methods involving concave penalties. We address both group selection and
bi-level selection methods. We describe several applications of these methods
in nonparametric additive models, semiparametric regression, seemingly
unrelated regressions, genomic data analysis and genome wide association
studies. We also highlight some issues that require further study.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/12-STS392 by the Institute of Mathematical Statistics (http://www.imstat.org)
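The group LASSO discussed in this review penalizes whole blocks of coefficients, so a group enters or leaves the model together. As a concrete illustration, here is a minimal proximal-gradient (ISTA) sketch of a group-LASSO solver; the function names, step-size choice, and toy setup are our own illustrations, not taken from the review.

```python
import numpy as np

def group_soft_threshold(beta, lam):
    """Proximal operator of the group penalty lam * ||beta||_2:
    shrinks the whole block toward zero, setting it exactly to zero
    when its norm falls below lam (group-level sparsity)."""
    norm = np.linalg.norm(beta)
    if norm <= lam:
        return np.zeros_like(beta)
    return (1.0 - lam / norm) * beta

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient solver for
        (1/2n) ||y - X beta||^2 + lam * sum_g ||beta_g||_2,
    where `groups` is a list of index arrays, one per group."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L for the squared-error loss
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n   # gradient of the smooth part
        z = beta - step * grad
        for g in groups:                  # blockwise proximal step
            beta[g] = group_soft_threshold(z[g], step * lam)
    return beta
```

The block-level soft threshold is what distinguishes this from the ordinary LASSO: an inactive group is removed in its entirety rather than coefficient by coefficient.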
Sparse reduced-rank regression for imaging genetics studies: models and applications
We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model,
which is a strategy for multivariate modelling of high-dimensional imaging responses and
genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity
in the regression coefficients, identifying subsets of genetic markers that best explain
the variability observed in subsets of the phenotypes. To properly exploit the rich structure
present in each of the imaging and genetics domains, we additionally propose the use of
several structured penalties within the sRRR model. Using simulation procedures that accurately
reflect realistic imaging genetics data, we present detailed evaluations of the sRRR
method in comparison with the more traditional univariate linear modelling approach. In
all settings considered, we show that sRRR possesses better power to detect the deleterious
genetic variants. Moreover, using a simple genetic model, we demonstrate the potential
benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to
extracting averages over regions of interest in the brain. Since this entails the use of phenotypic
vectors of enormous dimensionality, we suggest the use of a sparse classification
model as a de-noising step, prior to the imaging genetics study. Finally, we present the
application of a data re-sampling technique within the sRRR model for model selection.
Using this approach we are able to rank the genetic markers in order of importance of association
to the phenotypes, and similarly rank the phenotypes in order of importance to
the genetic markers. Lastly, we illustrate the use of the proposed
statistical models in three real imaging genetics datasets and highlight some potential
associations.
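The sRRR model described above regresses high-dimensional phenotypes on genetic predictors through a low-rank, sparse coefficient matrix. A didactic rank-1 sketch of this idea, alternating a lasso step for the genetic loadings with a least-squares step for the phenotype direction, is shown below; this is our own simplified illustration under an l1 penalty, not the authors' exact algorithm or structured penalties.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding (l1 proximal operator)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def srrr_rank1(X, Y, lam, n_iter=200):
    """Illustrative rank-1 sparse reduced-rank regression:
    fit Y ~ X a b^T with an l1 penalty on the genetic loadings a,
    alternating a proximal-gradient lasso update for a with a
    least-squares update for the unit-norm phenotype direction b."""
    n, p = X.shape
    b = np.linalg.svd(Y, full_matrices=False)[2][0]  # leading right singular vector
    a = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        z = Y @ b                       # project phenotypes onto direction b
        grad = X.T @ (X @ a - z) / n    # lasso step for a on regression z ~ X a
        a = soft(a - step * grad, step * lam)
        u = X @ a
        denom = u @ u
        if denom > 0:                   # least-squares update for b given scores X a
            b = Y.T @ u / denom
            b /= np.linalg.norm(b)      # keep b on the unit sphere
    return a, b
```

Sparsity in `a` picks out the genetic markers, while `b` captures which phenotype combination they explain, mirroring the subset-of-markers / subset-of-phenotypes interpretation in the abstract.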
Nonconcave penalized composite conditional likelihood estimation of sparse Ising models
The Ising model is a useful tool for studying complex interactions within a
system. The estimation of such a model, however, is rather challenging,
especially in the presence of high-dimensional parameters. In this work, we
propose efficient procedures for learning a sparse Ising model based on a
penalized composite conditional likelihood with nonconcave penalties.
Nonconcave penalized likelihood estimation has received a lot of attention in
recent years. However, such an approach is computationally prohibitive under
high-dimensional Ising models. To overcome such difficulties, we extend the
methodology and theory of nonconcave penalized likelihood to penalized
composite conditional likelihood estimation. The proposed method can be
efficiently implemented by taking advantage of coordinate-ascent and
minorization--maximization principles. Asymptotic oracle properties of the
proposed method are established with NP-dimensionality. Optimality of the
computed local solution is discussed. We demonstrate its finite sample
performance via simulation studies and further illustrate our proposal by
studying the Human Immunodeficiency Virus type 1 protease structure based on
data from the Stanford HIV drug resistance database. Our statistical learning
results match the known biological findings very well, although no prior
biological information is used in the data analysis procedure.Comment: Published in at http://dx.doi.org/10.1214/12-AOS1017 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
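The composite conditional likelihood above builds on the node-conditional distributions of the Ising model. As a minimal sketch of this idea, the snippet below estimates the interaction graph by fitting one penalized logistic regression per node (a pseudo-likelihood approach); we use an l1 penalty for simplicity, whereas the paper employs nonconcave penalties with coordinate-ascent and minorization-maximization, so treat this purely as an illustration.

```python
import numpy as np

def logistic_lasso(X, y, lam, n_iter=300):
    """l1-penalized logistic regression via proximal gradient (ISTA);
    y in {0, 1}, no intercept (zero external field assumed)."""
    n, p = X.shape
    w = np.zeros(p)
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2  # 1/L for the logistic loss
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p_hat - y) / n
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def ising_neighborhoods(S, lam):
    """Estimate the sparse interaction graph of an Ising model by
    regressing each spin on all the others with a penalized logistic
    regression. S is an (n, d) matrix of +/-1 spins."""
    n, d = S.shape
    W = np.zeros((d, d))
    for j in range(d):
        y = (S[:, j] + 1) / 2            # recode spin j to {0, 1}
        X = np.delete(S, j, axis=1)      # remaining spins as predictors
        W[j, np.arange(d) != j] = logistic_lasso(X, y, lam)
    # symmetrise by the AND rule: keep an edge only if both regressions agree
    A = (np.abs(W) > 1e-8) & (np.abs(W.T) > 1e-8)
    return W, A
```

Each row of `W` is one node-conditional fit; combining all of them is what a composite (pseudo-)likelihood does jointly rather than node by node.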
Mixture model and subgroup analysis in nationwide kidney transplant center evaluation
The five-year post-transplant survival rate is an important indicator of the quality of care delivered by kidney transplant centers in the United States.
To provide a fair assessment of each transplant center, an effect that represents the center-specific care quality, along with patient level risk factors, is often included in the risk adjustment model.
In the past, the center effects have been modeled as either fixed effects or Gaussian random effects, each with its own merits and drawbacks.
We propose two new methods that allow flexible random effects distributions.
The first one is a Generalized Linear Mixed Model (GLMM) with normal mixture random effects.
By allowing the random effects to be non-homogeneous, shrinkage is reduced and the predicted random effects are much closer to the truth.
In addition, modeling the random effects as a normal mixture essentially clusters them into different groups, which provides a natural way of evaluating performance in the transplant-center setting.
To decide the number of components, we perform sequential hypothesis tests.
In the second method, we propose a subgroup analysis on the random effects under the framework of GLMM.
Each level of the random effect is allowed to be a cluster by itself, but clusters that are close to each other are merged into larger ones.
This method provides more precise and stable estimation than a fixed-effects model, while allowing a much more flexible random-effects distribution than a GLMM with a Gaussian assumption.
In addition, the other effects in the model are selected via a lasso-type penalty.
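The normal-mixture idea behind the first method can be illustrated with a plain EM fit: clustering estimated center effects into performance groups via a univariate Gaussian mixture. The sketch below is our own stand-in for the GLMM machinery (it treats the center effects as directly observed), so the function and the toy data are illustrative assumptions.

```python
import numpy as np

def normal_mixture_em(x, k=2, n_iter=200):
    """EM for a k-component univariate normal mixture, used here to
    cluster estimated center effects into performance groups
    (an illustrative stand-in for the normal-mixture random effects)."""
    n = len(x)
    pi = np.full(k, 1.0 / k)
    mu = np.quantile(x, np.linspace(0.0, 1.0, k))  # spread the initial means
    sig2 = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibility of each component for each center
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        sig2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, sig2, r.argmax(axis=1)
```

The final responsibilities give each center a group label, which is the "natural way of evaluating performance" mentioned in the abstract; choosing `k` would correspond to the sequential hypothesis tests described there.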