    A Selective Review of Group Selection in High-Dimensional Models

    Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome-wide association studies. We also highlight some issues that require further study. Comment: Published at http://dx.doi.org/10.1214/12-STS392 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
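As a rough illustration of what selection at the group level means computationally, the sketch below implements block soft-thresholding, the proximal operator of an unweighted group LASSO penalty of the form lam * sum_g ||beta_g||_2. It is not taken from the review: the function name, the index-list layout of `groups`, and the omission of group-size weights are all assumptions made for illustration.

```python
import math


def group_soft_threshold(beta, groups, lam):
    """Block soft-thresholding for an unweighted group LASSO penalty.

    beta   : list of floats, the coefficient vector
    groups : list of index lists, one per group (hypothetical layout)
    lam    : nonnegative threshold applied to each group's Euclidean norm
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        # A group whose norm does not exceed lam is zeroed out as a whole;
        # otherwise every coefficient in it is shrunk by the same factor.
        scale = 0.0 if norm <= lam else 1.0 - lam / norm
        for j in idx:
            out[j] = scale * beta[j]
    return out


# Two groups: the first (norm 5) survives, the second (norm 0.5) is dropped.
shrunk = group_soft_threshold([3.0, 4.0, 0.3, -0.4], [[0, 1], [2, 3]], lam=1.0)
```

Because the whole group shares one shrinkage factor, coefficients enter or leave the model together, which is the behavior that distinguishes group selection from coordinate-wise thresholding.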

    Bayesian semiparametric analysis for two-phase studies of gene-environment interaction

    The two-phase sampling design is a cost-efficient way of collecting expensive covariate information on a judiciously selected subsample. It is natural to apply such a strategy for collecting genetic data in a subsample enriched for exposure to environmental factors for gene-environment interaction (G x E) analysis. In this paper, we consider two-phase studies of G x E interaction where phase I data are available on exposure, covariates and disease status. Stratified sampling is done to prioritize individuals for genotyping at phase II conditional on disease and exposure. We consider a Bayesian analysis based on the joint retrospective likelihood of phase I and II data. We address several important statistical issues: (i) we consider a model with multiple genes, environmental factors and their pairwise interactions, and employ a Bayesian variable selection algorithm to reduce the dimensionality of this potentially high-dimensional model; (ii) we use the assumption of gene-gene and gene-environment independence to trade off between bias and efficiency for estimating the interaction parameters through use of hierarchical priors reflecting this assumption; (iii) we posit a flexible model for the joint distribution of the phase I categorical variables using the nonparametric Bayes construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009) 1042-1051]. Comment: Published at http://dx.doi.org/10.1214/12-AOAS599 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Evaluating Markers for Treatment Selection Based on Survival Time

    For many medical conditions there are several treatment options available to patients. We consider evaluating markers based on a simple treatment selection policy that incorporates information on the patient's marker value exceeding a threshold. Although traditional regression methods may assess the effect of the marker and treatment on outcomes, it is appealing to quantify more directly the potential impact on the population of using the marker to select treatment. A useful tool is the selection impact (SI) curve proposed by Song and Pepe (2004, Biometrics 60, 874-883) for binary outcomes. However, this approach does not deal with continuous outcomes, nor does it adjust for other covariates that are important for treatment selection. In this paper, we propose the SI curve for general outcomes, with specific focus on the survival time. We further propose the covariate specific SI curve to incorporate covariate information in treatment selection. Nonparametric and semiparametric estimators are developed accordingly. We show that the proposed estimators are consistent and asymptotically normal. Simulation studies demonstrate that these estimators work well with realistic sample sizes. We illustrate the SI curve and the statistical inference for it with data from an AIDS clinical trial.

    Evaluating Markers for Treatment Selection Based on Survival Time

    For many medical conditions several treatment options may be available for treating patients. We consider evaluating markers based on a simple treatment selection policy that incorporates information on the patient's marker value exceeding a threshold. For example, colon cancer patients may be treated by surgery alone or surgery plus chemotherapy. The c-myc gene expression level may be used as a biomarker for treatment selection. Although traditional regression methods may assess the effect of the marker and treatment on outcomes, it is appealing to quantify more directly the potential impact on the population of using the marker to select treatment. A useful tool is the selection impact (SI) curve proposed by Song and Pepe (2004, Biometrics 60, 874-883) for binary outcomes. However, the current SI method does not deal with continuous outcomes, nor does it allow adjustment for other covariates that are important for treatment selection. In this paper, we extend the SI curve for general outcomes, with a specific focus on survival time. We further propose the covariate specific SI curve to incorporate covariate information in treatment selection. Nonparametric and semiparametric estimators are developed accordingly. We show that the proposed estimators are consistent and asymptotically normal. The performance is illustrated by simulation studies and through an application to data from a cancer clinical trial.

    Targeted Methods for Finding Quantitative Trait Loci

    Conventional genetic mapping methods typically assume parametric models with Gaussian errors, and obtain parameter estimates through maximum likelihood estimation. We propose a general semiparametric model to map quantitative trait loci (QTL) in experimental crosses. In contrast with widely used methods derived from interval mapping (IM), our model requires fewer assumptions and also accommodates various machine learning algorithms. Estimation using both targeted maximum likelihood and collaborative targeted maximum likelihood methods is compared to a composite interval mapping (CIM) approach. We demonstrate with simulations and real data analyses that, on average, our semiparametric targeted learning approach produces less biased QTL effect estimates than those from parametric models.