A Selective Review of Group Selection in High-Dimensional Models
Grouping structures arise naturally in many statistical modeling problems.
Several methods have been proposed for variable selection that respect grouping
structure in variables. Examples include the group LASSO and several concave
group selection methods. In this article, we give a selective review of group
selection concerning methodological developments, theoretical properties and
computational algorithms. We pay particular attention to group selection
methods involving concave penalties. We address both group selection and
bi-level selection methods. We describe several applications of these methods
in nonparametric additive models, semiparametric regression, seemingly
unrelated regressions, genomic data analysis and genome-wide association
studies. We also highlight some issues that require further study. Comment: Published at http://dx.doi.org/10.1214/12-STS392 in
Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Bayesian semiparametric analysis for two-phase studies of gene-environment interaction
The two-phase sampling design is a cost-efficient way of collecting expensive
covariate information on a judiciously selected subsample. It is natural to
apply such a strategy for collecting genetic data in a subsample enriched for
exposure to environmental factors for gene-environment interaction (G x E)
analysis. In this paper, we consider two-phase studies of G x E interaction
where phase I data are available on exposure, covariates and disease status.
Stratified sampling is done to prioritize individuals for genotyping at phase
II conditional on disease and exposure. We consider a Bayesian analysis based
on the joint retrospective likelihood of phases I and II data. We address
several important statistical issues: (i) we consider a model with multiple
genes, environmental factors and their pairwise interactions. We employ a
Bayesian variable selection algorithm to reduce the dimensionality of this
potentially high-dimensional model; (ii) we use the assumption of gene-gene and
gene-environment independence to trade off between bias and efficiency for
estimating the interaction parameters through use of hierarchical priors
reflecting this assumption; (iii) we posit a flexible model for the joint
distribution of the phase I categorical variables using the nonparametric Bayes
construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009)
1042-1051]. Comment: Published at http://dx.doi.org/10.1214/12-AOAS599 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Evaluating Markers for Treatment Selection Based on Survival Time
For many medical conditions there are several treatment options available to patients. We consider evaluating markers based on a simple treatment selection policy that incorporates information on the patient's marker value exceeding a threshold. Although traditional regression methods may assess the effect of the marker and treatment on outcomes, it is appealing to quantify more directly the potential impact on the population of using the marker to select treatment. A useful tool is the selection impact (SI) curve proposed by Song and Pepe (2004, Biometrics 60, 874-883) for binary outcomes. However, this approach does not deal with continuous outcomes, nor does it adjust for other covariates that are important for treatment selection. In this paper, we propose the SI curve for general outcomes, with a specific focus on survival time. We further propose the covariate-specific SI curve to incorporate covariate information in treatment selection. Nonparametric and semiparametric estimators are developed accordingly. We show that the proposed estimators are consistent and asymptotically normal. Simulation studies demonstrate that these estimators work well with realistic sample sizes. We illustrate the SI curve and the statistical inference for it with data from an AIDS clinical trial.
Evaluating Markers for Treatment Selection Based on Survival Time
For many medical conditions several treatment options may be available for treating patients. We consider evaluating markers based on a simple treatment selection policy that incorporates information on the patient's marker value exceeding a threshold. For example, colon cancer patients may be treated by surgery alone or surgery plus chemotherapy. The c-myc gene expression level may be used as a biomarker for treatment selection. Although traditional regression methods may assess the effect of the marker and treatment on outcomes, it is appealing to quantify more directly the potential impact on the population of using the marker to select treatment. A useful tool is the selection impact (SI) curve proposed by Song and Pepe (2004, Biometrics 60, 874-883) for binary outcomes. However, the current SI method does not deal with continuous outcomes, nor does it allow adjustment for other covariates that are important for treatment selection. In this paper, we extend the SI curve to general outcomes, with a specific focus on survival time. We further propose the covariate-specific SI curve to incorporate covariate information in treatment selection. Nonparametric and semiparametric estimators are developed accordingly. We show that the proposed estimators are consistent and asymptotically normal. The performance is illustrated by simulation studies and through an application to data from a cancer clinical trial.
Targeted Methods for Finding Quantitative Trait Loci
Conventional genetic mapping methods typically assume parametric models with Gaussian errors, and obtain parameter estimates through maximum likelihood estimation. We propose a general semiparametric model to map quantitative trait loci (QTL) in experimental crosses. In contrast with widely used methods derived from interval mapping (IM), our model requires fewer assumptions and also accommodates various machine learning algorithms. Estimation using both targeted maximum likelihood and collaborative targeted maximum likelihood methods is compared to a composite interval mapping (CIM) approach. We demonstrate with simulations and real data analyses that, on average, our semiparametric targeted learning approach produces less biased QTL effect estimates than those from parametric models.