56 research outputs found

    Methodological Issues in Multistage Genome-Wide Association Studies

    Full text link
    Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of ``promising'' SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design to obtain comparable levels of overall type I error and power by using about half the sample for stage I and carrying about 0.1% of SNPs forward to the second stage, the optimal design depending primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed ORs of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a ``replication'' panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly-significant associations has been discovered, a truly independent ``exact replication'' study is needed in a similar population of the same promising SNPs using similar methods.Comment: Published in at http://dx.doi.org/10.1214/09-STS288 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Comparison of family-based association tests in chromosome regions selected by linkage-based confidence intervals

    Get PDF
    We use the Genetic Analysis Workshop 14 simulated data to explore the effectiveness of a two-stage strategy for mapping complex disease loci consisting of an initial genome scan with confidence interval construction for gene location, followed by fine mapping with family-based tests of association on a dense set of single-nucleotide polymorphisms. We considered four types of intervals: the 1-LOD interval, a basic percentile bootstrap confidence interval based on the position of the maximum Zlr score, and asymptotic and bootstrap confidence intervals based on a generalized estimating equations method. For fine mapping we considered two family-based tests of association: a test based on a likelihood ratio statistic and a transmission-disequilibrium-type test implemented in the software FBAT. In two of the simulation replicates, we found that the bootstrap confidence intervals based on the peak Zlr and the 1-LOD support interval always contained the true disease loci and that the likelihood ratio test provided further strong confirmatory evidence of the presence of disease loci in these regions

    The association of polymorphisms in hormone metabolism pathway genes, menopausal hormone therapy, and breast cancer risk: a nested case-control study in the California Teachers Study cohort

    Get PDF
    Abstract Introduction The female sex steroids estrogen and progesterone are important in breast cancer etiology. It therefore seems plausible that variation in genes involved in metabolism of these hormones may affect breast cancer risk, and that these associations may vary depending on menopausal status and use of hormone therapy. Methods We conducted a nested case-control study of breast cancer in the California Teachers Study cohort. We analyzed 317 tagging single nucleotide polymorphisms (SNPs) in 24 hormone pathway genes in 2746 non-Hispanic white women: 1351 cases and 1395 controls. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated by fitting conditional logistic regression models using all women or subgroups of women defined by menopausal status and hormone therapy use. P values were adjusted for multiple correlated tests (P ACT). Results The strongest associations were observed for SNPs in SLCO1B1, a solute carrier organic anion transporter gene, which transports estradiol-17β-glucuronide and estrone-3-sulfate from the blood into hepatocytes. Ten of 38 tagging SNPs of SLCO1B1 showed significant associations with postmenopausal breast cancer risk; 5 SNPs (rs11045777, rs11045773, rs16923519, rs4149057, rs11045884) remained statistically significant after adjusting for multiple testing within this gene (P ACT = 0.019-0.046). In postmenopausal women who were using combined estrogen-progestin therapy (EPT) at cohort enrollment, the OR of breast cancer was 2.31 (95% CI = 1.47-3.62) per minor allele of rs4149013 in SLCO1B1 (P = 0.0003; within-gene P ACT = 0.002; overall P ACT = 0.023). SNPs in other hormone pathway genes evaluated in this study were not associated with breast cancer risk in premenopausal or postmenopausal women. Conclusions We found evidence that genetic variation in SLCO1B1 is associated with breast cancer risk in postmenopausal women, particularly among those using EPT

    A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk

    Full text link
    Colorectal cancer risk can be impacted by genetic, environmental, and lifestyle factors, including diet and obesity. Geneenvironment interactions (G x E) can provide biological insights into the effects of obesity on colorectal cancer risk. Here, we assessed potential genome-wide G x E interactions between body mass index (BMI) and common SNPs for colorectal cancer risk using data from 36,415 colorectal cancer cases and 48,451 controls from three international colorectal cancer consortia (CCFR, CORECT, and GECCO). The G x E tests included the conventional logistic regression using multiplicative terms (one degree of freedom, 1DF test), the two-step EDGE method, and the joint 3DF test, each of which is powerful for detecting G x E interactions under specific conditions. BMI was associated with higher colorectal cancer risk. The two-step approach revealed a statistically significant GxBMI interaction located within the Formin 1/Gremlin 1 (FMN1/GREM1) gene region (rs58349661). This SNP was also identified by the 3DF test, with a suggestive statistical significance in the 1DF test. Among participants with the CC genotype of rs58349661, overweight and obesity categories were associated with higher colorectal cancer risk, whereas null associations were observed across BMI categories in those with the TT genotype. Using data from three large international consortia, this study discovered a locus in the FMN1/GREM1 gene region that interacts with BMI on the association with colorectal cancer risk. Further studies should examine the potential mechanisms through which this locus modifies the etiologic link between obesity and colorectal cancer

    Data integration by multi-tuning parameter elastic net regression

    No full text
    Abstract Background To integrate molecular features from multiple high-throughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlations structures. Subtle but important features may be missed by shrinking all features equally. Results We propose an Elastic net (EN) model with separate tuning parameter penalties for each platform that is fit using standard software. In a comprehensive simulation study, we evaluated the performance of EN logistic regression with multiple tuning penalties. We found that when the number of informative features differs among the platforms, and when there is no notable correlation between the features from different platforms, the multi-tuning parameter EN yields more predictive models. Moreover, the multi-tuning parameter EN is robust, in the sense that there is no loss of predictivity relative to a single tuning parameter EN when features across all platforms have similar effects. We also investigated the performance of multi-tuning parameter EN using real cancer datasets. Conclusion The proposed multi-tuning parameter EN model, fit using standard penalized regression software, can achieve better prediction in sample classification when integrating multiple genomic platforms, compared to the traditional method where a single penalty parameter is used for all features in different platforms

    PEACOCK: a machine learning approach to assess the validity of cell type-specific enhancer-gene regulatory relationships

    No full text
    Abstract The vast majority of disease-associated variants identified in genome-wide association studies map to enhancers, powerful regulatory elements which orchestrate the recruitment of transcriptional complexes to their target genes’ promoters to upregulate transcription in a cell type- and timing-dependent manner. These variants have implicated thousands of enhancers in many common genetic diseases, including nearly all cancers. However, the etiology of most of these diseases remains unknown because the regulatory target genes of the vast majority of enhancers are unknown. Thus, identifying the target genes of as many enhancers as possible is crucial for learning how enhancer regulatory activities function and contribute to disease. Based on experimental results curated from scientific publications coupled with machine learning methods, we developed a cell type-specific score predictive of an enhancer targeting a gene. We computed the score genome-wide for every possible cis enhancer-gene pair and validated its predictive ability in four widely used cell lines. Using a pooled final model trained across multiple cell types, all possible gene-enhancer regulatory links in cis (~17 M) were scored and added to the publicly available PEREGRINE database ( www.peregrineproj.org ). These scores provide a quantitative framework for the enhancer-gene regulatory prediction that can be incorporated into downstream statistical analyses
    • …
    corecore