1,514 research outputs found

    Semiparametric estimation exploiting covariate independence in two-phase randomized trials.

    Recent results for case-control sampling suggest that when the covariate distribution is constrained by gene-environment independence, semiparametric estimation exploiting such independence yields substantial efficiency gains. We consider the efficient estimation of the treatment-biomarker interaction in two-phase sampling nested within randomized clinical trials, incorporating the independence between a randomized treatment and the baseline markers. We develop a Newton-Raphson algorithm based on the profile likelihood to compute the semiparametric maximum likelihood estimate (SPMLE). Our algorithm accommodates both continuous phase-one outcomes and continuous phase-two biomarkers. The profile information matrix is computed explicitly via numerical differentiation. In certain situations where computing the SPMLE is slow, we propose a maximum estimated likelihood estimator (MELE), which is also capable of incorporating the covariate independence. This estimated likelihood approach uses a one-step empirical covariate distribution and is thus straightforward to maximize. It offers a closed-form variance estimate with a limited increase in variance relative to the fully efficient SPMLE. Our results suggest that exploiting the covariate independence in two-phase sampling increases the efficiency substantially, particularly for estimating treatment-biomarker interactions.

    Identifying Target Populations for Screening or Not Screening Using Logic Regression

    Colorectal cancer remains a significant public health concern even though effective screening procedures exist and the disease is treatable when detected at early stages. Numerous risk factors for colon cancer have been identified, but none are very predictive alone. We sought to determine whether there are certain combinations of risk factors that distinguish well between cases and controls, and that could be used to identify subjects at particularly high or low risk of the disease to target screening. Using data from the Seattle site of the Colorectal Cancer Family Registry (C-CFR), we fit logic regression models to combine risk factor information. Logic regression is a methodology that identifies subsets of the population, described by Boolean combinations of binary coded risk factors. This method is well suited to situations in which interactions between many variables result in differences in disease risk. Neither the logic regression models nor stepwise logistic regression models fit for comparison resulted in criteria that could be used to direct subjects to screening. However, we believe that our novel statistical approach could be useful in settings where risk factors do discriminate between cases and controls, and illustrate this with a simulated dataset.
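
To make "Boolean combinations of binary coded risk factors" concrete, here is a minimal Python sketch of how one such rule selects a subgroup. The risk-factor names and the rule itself are hypothetical illustrations, not findings from the study, and the sketch only evaluates a fixed rule; it does not perform the search over rules that logic regression carries out.

```python
from typing import Dict, List

Subject = Dict[str, bool]

def high_risk_rule(s: Subject) -> bool:
    """Hypothetical rule: (family_history AND NOT regular_nsaid_use) OR (smoker AND obese)."""
    return (s["family_history"] and not s["regular_nsaid_use"]) or (
        s["smoker"] and s["obese"]
    )

def select_subgroup(subjects: List[Subject]) -> List[int]:
    """Return the indices of subjects for whom the Boolean rule is true."""
    return [i for i, s in enumerate(subjects) if high_risk_rule(s)]

# Made-up subjects with binary coded risk factors (illustration only).
subjects: List[Subject] = [
    {"family_history": True, "regular_nsaid_use": False, "smoker": False, "obese": False},
    {"family_history": True, "regular_nsaid_use": True, "smoker": False, "obese": True},
    {"family_history": False, "regular_nsaid_use": False, "smoker": True, "obese": True},
    {"family_history": False, "regular_nsaid_use": True, "smoker": True, "obese": False},
]

selected = select_subgroup(subjects)
print(selected)
```

A fitted logic regression model would use one or more such Boolean expressions as predictors in a regression model; the sketch shows only the subgroup-definition step.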

    On Two-Stage Hypothesis Testing Procedures Via Asymptotically Independent Statistics

    Kooperberg and LeBlanc (2008) proposed a two-stage testing procedure to screen for significant interactions in genome-wide association (GWA) studies by a soft threshold on marginal associations (MA), though its theoretical properties and generalization have not been elaborated. In this article, we discuss conditions that are required to achieve strong control of the Family-Wise Error Rate (FWER) by such procedures for low or high-dimensional hypothesis testing. We provide proof of asymptotic independence of marginal association statistics and interaction statistics in linear regression, logistic regression, and Cox proportional hazard models in a randomized clinical trial (RCT) with a rare event. In case-control studies nested within an RCT, a complementary criterion, namely deviation from baseline independence (DBI) in the case-control sample, is advocated as a screening tool for discovering significant interactions or main effects. Simulations and an application to a GWA study in the Women's Health Initiative (WHI) are presented to show the utility of the proposed two-stage testing procedures in pharmacogenetic studies.
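
The two-stage idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: p-values are taken as given, stage one keeps hypotheses whose marginal-association p-value is at most a hypothetical threshold `alpha1`, and stage two applies a Bonferroni correction over only the hypotheses that passed. The function name, thresholds, and p-values are invented for the example, and the FWER guarantee depends on the independence conditions discussed in the abstract.

```python
from typing import List, Sequence

def two_stage_test(
    marginal_p: Sequence[float],
    interaction_p: Sequence[float],
    alpha1: float,
    alpha: float,
) -> List[int]:
    """Screen on marginal p-values, then Bonferroni-test interactions among survivors."""
    if len(marginal_p) != len(interaction_p):
        raise ValueError("p-value sequences must have the same length")
    # Stage 1: keep hypotheses with marginal association p-value <= alpha1.
    passed = [i for i, p in enumerate(marginal_p) if p <= alpha1]
    if not passed:
        return []
    # Stage 2: Bonferroni correction over the screened set only.
    threshold = alpha / len(passed)
    return [i for i in passed if interaction_p[i] <= threshold]

# Made-up p-values for four SNPs (illustration only).
marginal_p = [0.001, 0.2, 0.03, 0.6]
interaction_p = [0.01, 0.0001, 0.04, 0.5]

significant = two_stage_test(marginal_p, interaction_p, alpha1=0.05, alpha=0.05)
print(significant)
```

In this toy input the second SNP has a very small interaction p-value but is never tested because it fails the marginal screen, which is the trade-off of screening: fewer tests in stage two at the cost of possibly missing interactions without marginal signal.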

    Comparison of Haplotype-based and Tree-based SNP Imputation in Association Studies

    Missing single nucleotide polymorphisms (SNPs) are quite common in genetic association studies. Subjects with missing SNPs are often discarded in analyses, which may seriously undermine the inference of SNP-disease association. In this article, we compare two haplotype-based imputation approaches and one regression tree-based imputation approach for association studies. The goal is to assess the imputation accuracy, and to evaluate the impact of imputation on parameter estimation. Haplotype-based approaches build on haplotype reconstruction by the expectation-maximization (EM) algorithm or a weighted EM (WEM) algorithm, depending on whether case-control status is taken into account. The tree-based approach uses a Gibbs sampler to iteratively sample from a full conditional distribution, which is obtained from the classification and regression tree (CART) algorithm. We employ a standard multiple imputation procedure to account for the uncertainty of imputation. We apply the methods to simulated data as well as a case-control study on developmental dyslexia. Our results suggest that imputation generally improves over the standard practice of ignoring missing data in terms of bias and efficiency. The haplotype-based approaches slightly outperform the tree-based approach when there are a small number of SNPs in linkage disequilibrium (LD), but the latter has a computational advantage. Finally, we demonstrate that utilizing the disease status in imputation helps to reduce the bias in the subsequent parameter estimation.

    Stability and aggregation of ranked gene lists

    Ranked gene lists are highly unstable in the sense that similar measures of differential gene expression may yield very different rankings, and that a small change in the data set usually affects the obtained gene list considerably. Stability issues have long been under-considered in the literature, but they have become a hot topic in the last few years, perhaps as a consequence of the increasing skepticism about the reproducibility and clinical applicability of molecular research findings. In this article, we review existing approaches for the assessment of stability of ranked gene lists and the related problem of aggregation, give some practical recommendations, and warn against potential misuse of these methods. This overview is illustrated through an application to a recent leukemia data set using the freely available Bioconductor package GeneSelector.
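
One simple way to quantify how much a ranked list changes is the overlap between the top-k genes of two rankings. The Python sketch below is a generic illustration, not the GeneSelector API or a method taken from the article; the gene names, scores, and function names are made up.

```python
from typing import Dict, List

def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Rank genes by absolute score (ties broken by name) and return the top k."""
    return sorted(scores, key=lambda g: (-abs(scores[g]), g))[:k]

def top_k_overlap(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Fraction of genes shared by the top-k lists of two rankings."""
    return len(set(top_k(a, k)) & set(top_k(b, k))) / k

# Made-up differential expression scores from an original and a perturbed data set.
original = {"g1": 3.1, "g2": -2.8, "g3": 2.5, "g4": 0.4, "g5": -0.2}
perturbed = {"g1": 2.9, "g2": -1.0, "g3": 2.6, "g4": 1.8, "g5": -0.1}

overlap = top_k_overlap(original, perturbed, k=3)
print(round(overlap, 3))
```

Repeating this over many perturbed versions of a data set (for example, resamples) and averaging the overlap gives a rough stability score for the list.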

    Heterogeneity-aware integrative analyses for ancestry-specific association studies

    Ancestry-specific proteome-wide association studies (PWAS) based on genetically predicted protein expression can reveal complex disease etiology specific to certain ancestral groups. These studies require ancestry-specific models for protein expression as a function of SNP genotypes. In order to improve protein expression prediction in ancestral populations historically underrepresented in genomic studies, we propose a new penalized maximum likelihood estimator for fitting ancestry-specific joint protein quantitative trait loci models. Our estimator borrows information across ancestral groups, while simultaneously allowing for heterogeneous error variances and regression coefficients. We propose an alternative parameterization of our model which makes the objective function convex and the penalty scale invariant. To improve computational efficiency, we propose an approximate version of our method and study its theoretical properties. Our method provides a substantial improvement in protein expression prediction accuracy in individuals of African ancestry, and in a downstream PWAS analysis, leads to the discovery of multiple associations between protein expression and blood lipid traits in the African ancestry population.

    Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching

    Yeast Isw1p forms two separable complexes in vivo - Supplementary Materials Only

    There are several classes of ATP-dependent chromatin remodeling complexes, which modulate the structure of chromatin to regulate a variety of cellular processes. The budding yeast, Saccharomyces cerevisiae, encodes two ATPases of the ISWI class, Isw1p and Isw2p. Previously Isw1p was shown to copurify with three other proteins. Here we identify these associated proteins and show that Isw1p forms two separable complexes in vivo (designated Isw1a and Isw1b). Biochemical assays revealed that while both have equivalent nucleosome-stimulated ATPase activities, Isw1a and Isw1b differ in their abilities to bind to DNA and nucleosomal substrates, which possibly accounts for differences in specific activities in nucleosomal spacing and sliding. In vivo, the two Isw1 complexes have overlapping functions in transcriptional regulation of some genes yet distinct functions at others. In addition, these complexes show different contributions to cell growth at elevated temperatures.

    Testing significance relative to a fold-change threshold is a TREAT

    Motivation: Statistical methods are used to test for the differential expression of genes in microarray experiments. The most widely used methods successfully test whether the true differential expression is different from zero, but give no assurance that the differences found are large enough to be biologically meaningful.