A Selective Review of Group Selection in High-Dimensional Models

Author: Breheny Patrick
Huang Jian
Ma Shuangge
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. Comment: Published in at http://dx.doi.org/10.1214/12-STS392 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bayesian semiparametric analysis for two-phase studies of gene-environment interaction

Author: Ahn Jaeil
Ghosh Malay
Gruber Stephen B.
Mukherjee Bhramar
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2013

The two-phase sampling design is a cost-efficient way of collecting expensive covariate information on a judiciously selected subsample. It is natural to apply such a strategy for collecting genetic data in a subsample enriched for exposure to environmental factors for gene-environment interaction (G x E) analysis. In this paper, we consider two-phase studies of G x E interaction where phase I data are available on exposure, covariates and disease status. Stratified sampling is done to prioritize individuals for genotyping at phase II conditional on disease and exposure. We consider a Bayesian analysis based on the joint retrospective likelihood of phases I and II data. We address several important statistical issues: (i) we consider a model with multiple genes, environmental factors and their pairwise interactions. We employ a Bayesian variable selection algorithm to reduce the dimensionality of this potentially high-dimensional model; (ii) we use the assumption of gene-gene and gene-environment independence to trade off between bias and efficiency for estimating the interaction parameters through use of hierarchical priors reflecting this assumption; (iii) we posit a flexible model for the joint distribution of the phase I categorical variables using the nonparametric Bayes construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009) 1042-1051]. Comment: Published in at http://dx.doi.org/10.1214/12-AOAS599 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Evaluating Markers for Treatment Selection Based on Survival Time

Author: Song Xiao
Zhou Xiao-Hua
Publication venue: Collection of Biostatistics Research Archive
Publication date: 21/01/2011

For many medical conditions there are several treatment options available to patients. We consider evaluating markers based on a simple treatment selection policy that incorporates information on the patient's marker value exceeding a threshold. Although traditional regression methods may assess the effect of the marker and treatment on outcomes, it is appealing to quantify more directly the potential impact on the population of using the marker to select treatment. A useful tool is the selection impact (SI) curve proposed by Song and Pepe (2004, Biometrics 60, 874-883) for binary outcomes. However, this approach does not deal with continuous outcomes, nor does it adjust for other covariates that are important for treatment selection. In this paper, we propose the SI curve for general outcomes, with specific focus on the survival time. We further propose the covariate specific SI curve to incorporate covariate information in treatment selection. Nonparametric and semiparametric estimators are developed accordingly. We show that the proposed estimators are consistent and asymptotically normal. Simulation studies demonstrate that these estimators work well with realistic sample sizes. We illustrate the SI curve and the statistical inference for it with data from an AIDS clinical trial.

Crossref

Collection Of Biostatistics Research Archive

Evaluating Markers for Treatment Selection Based on Survival Time

Author: Song Xiao
Zhou Xiao-Hua
Publication venue: Collection of Biostatistics Research Archive
Publication date: 09/06/2009

For many medical conditions several treatment options may be available to patients. We consider evaluating markers based on a simple treatment selection policy that incorporates information on the patient's marker value exceeding a threshold. For example, colon cancer patients may be treated by surgery alone or surgery plus chemotherapy. The c-myc gene expression level may be used as a biomarker for treatment selection. Although traditional regression methods may assess the effect of the marker and treatment on outcomes, it is appealing to quantify more directly the potential impact on the population of using the marker to select treatment. A useful tool is the selection impact (SI) curve proposed by Song and Pepe (2004, Biometrics 60, 874-883) for binary outcomes. However, the current SI method does not deal with continuous outcomes, nor does it allow adjustment for other covariates that are important for treatment selection. In this paper, we extend the SI curve for general outcomes, with a specific focus on survival time. We further propose the covariate specific SI curve to incorporate covariate information in treatment selection. Nonparametric and semiparametric estimators are developed accordingly. We show that the proposed estimators are consistent and asymptotically normal. The performance is illustrated by simulation studies and through an application to data from a cancer clinical trial.

Collection Of Biostatistics Research Archive

Targeted Methods for Finding Quantitative Trait Loci

Author: Rose Sherri
van der Laan Mark J.
Wang Hui
Publication venue: Collection of Biostatistics Research Archive
Publication date: 06/07/2011

Conventional genetic mapping methods typically assume parametric models with Gaussian errors, and obtain parameter estimates through maximum likelihood estimation. We propose a general semiparametric model to map quantitative trait loci (QTL) in experimental crosses. In contrast with widely used methods derived from interval mapping (IM), our model requires fewer assumptions and also accommodates various machine learning algorithms. Estimation using both targeted maximum likelihood and collaborative targeted maximum likelihood methods is compared to a composite interval mapping (CIM) approach. We demonstrate with simulations and real data analyses that, on average, our semiparametric targeted learning approach produces less biased QTL effect estimates than those from parametric models.

Collection Of Biostatistics Research Archive