
    The cost of large numbers of hypothesis tests on power, effect size and sample size

    Advances in high-throughput biology and computer science are driving an exponential increase in the number of hypothesis tests in genomics and other scientific disciplines. Studies using current genotyping platforms frequently include a million or more tests. In addition to the monetary cost, this increase imposes a statistical cost owing to the multiple testing corrections needed to avoid large numbers of false-positive results. To safeguard against the resulting loss of power, some have suggested sample sizes on the order of tens of thousands, which can be impractical for many diseases or may lower the quality of phenotypic measurements. This study examines the relationship between the number of tests on the one hand and power, detectable effect size or required sample size on the other. We show that once the number of tests is large, power can be maintained at a constant level with comparatively small increases in the effect size or sample size. For example, at the 0.05 significance level, a 13% increase in sample size is needed to maintain 80% power for ten million tests compared with one million tests, whereas a 70% increase in sample size is needed for 10 tests compared with a single test. Relative costs are smaller when measured by increases in the detectable effect size. We provide an interactive Excel calculator to compute power, effect size or sample size when comparing study designs or genome platforms involving different numbers of hypothesis tests. The results are reassuring in an era of extreme multiple testing.
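    The sample-size figures quoted above can be approximated with a simple model. The following is a minimal sketch, not the authors' Excel calculator, assuming a two-sided z-test, a Bonferroni correction (per-test level alpha/m), and a required sample size proportional to (z_crit + z_power)^2. The function names `critical_z` and `relative_sample_size` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def critical_z(alpha: float, n_tests: int) -> float:
    """Two-sided critical value after Bonferroni correction (per-test level alpha / n_tests)."""
    per_test = alpha / n_tests
    # Work in the lower tail for numerical accuracy at very small levels.
    return -_STD_NORMAL.inv_cdf(per_test / 2)


def relative_sample_size(m_new: int, m_old: int,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Ratio n(m_new) / n(m_old) needed to keep the same power for a fixed effect size.

    Assumes n is proportional to (z_crit + z_power)**2, as for a Wald/z test.
    """
    z_power = _STD_NORMAL.inv_cdf(power)
    new = (critical_z(alpha, m_new) + z_power) ** 2
    old = (critical_z(alpha, m_old) + z_power) ** 2
    return new / old


if __name__ == "__main__":
    print(f"10 vs 1 test:     {relative_sample_size(10, 1):.3f}")
    print(f"1e7 vs 1e6 tests: {relative_sample_size(10**7, 10**6):.3f}")
```

    Under these assumptions, the two ratios come out near 1.70 and 1.13, which is consistent with the 70% and 13% increases in the abstract. In this model, the detectable effect size scales with the square root of the same ratio, so the relative cost measured in effect size is smaller.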

    Genetic Variation in Selenoprotein Genes, Lifestyle, and Risk of Colon and Rectal Cancer

    BACKGROUND: Associations between selenium and cancer have directed attention to the role of selenoproteins in the carcinogenic process. METHODS: We used data from two population-based case-control studies of colon (n = 1555 cases, 1956 controls) and rectal (n = 754 cases, 959 controls) cancer. We evaluated the association between genetic variation in TXNRD1, TXNRD2, TXNRD3, C11orf31 (SelH), SelW, SelN1, SelS, SepX, and SeP15 and colorectal cancer risk. RESULTS: After adjustment for multiple comparisons, several associations were observed. Two SNPs in TXNRD3 were associated with rectal cancer (rs11718498 dominant OR 1.42, 95% CI 1.16-1.74, pACT 0.0036; rs9637365 recessive OR 0.70, 95% CI 0.55-0.90, pACT 0.0208). Four SNPs in SepN1 were associated with rectal cancer (rs11247735 recessive OR 1.30, 95% CI 1.04-1.63, pACT 0.0410; rs2072749 GG vs AA OR 0.53, 95% CI 0.36-0.80, pACT 0.0159; rs4659382 recessive OR 0.58, 95% CI 0.39-0.86, pACT 0.0247; rs718391 dominant OR 0.76, 95% CI 0.62-0.94, pACT 0.0300). Interactions between these genes and exposures that could influence them showed numerous significant associations after adjustment for multiple comparisons. Two SNPs in TXNRD1 and four SNPs in TXNRD2 interacted with aspirin/NSAIDs to influence colon cancer risk; one SNP in TXNRD1, two SNPs in TXNRD2, and one SNP in TXNRD3 interacted with aspirin/NSAIDs to influence rectal cancer risk. Five SNPs in TXNRD2 and one SNP each in SelS, SeP15, and SelW1 interacted with estrogen to modify colon cancer risk; one SNP in SelW1 interacted with estrogen to alter rectal cancer risk. Several SNPs in this candidate pathway influenced survival after diagnosis with colon cancer (SeP15 and SepX1 increased HRR) and rectal cancer (SepX1 increased HRR). CONCLUSIONS: Findings support an association of selenoprotein genes with colon and rectal cancer development and with survival after diagnosis. Given the interactions observed, the effect of genotype on cancer susceptibility is likely modified by lifestyle.
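    The dominant, recessive and genotype-contrast models named above differ only in how each genotype is encoded before the logistic regression is fitted. The following is a minimal sketch, not the study's code. It assumes genotypes are counted as copies of the minor allele (0, 1 or 2), and the function `encode_genotype` and its model names are illustrative.

```python
def encode_genotype(minor_allele_count: int, model: str) -> int:
    """Encode a biallelic SNP genotype as a regression covariate.

    minor_allele_count: number of minor-allele copies (0, 1 or 2).
    model: "additive" (0/1/2), "dominant" (any copy vs none)
           or "recessive" (two copies vs fewer).
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return minor_allele_count
    if model == "dominant":
        return 1 if minor_allele_count >= 1 else 0
    if model == "recessive":
        return 1 if minor_allele_count == 2 else 0
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for g in (0, 1, 2):
        print(g, [encode_genotype(g, m) for m in ("additive", "dominant", "recessive")])
```

    A contrast such as GG vs AA compares the two homozygote groups directly. It can be obtained from a codominant (genotypic) model that uses a separate indicator for each non-reference genotype.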

    Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternatives to the permutation test based on the multivariate normal distribution (MVN). However, these methods cannot accurately correct for multiple testing in genome-wide association studies, for two reasons. First, they require partitioning the genome into many disjoint blocks and ignore all correlations between markers in different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution in its tails. We propose SLIDE, an accurate and efficient method for multiple testing correction in genome-wide association studies. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than that of the previous MVN-based methods, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose SLIP, an efficient and accurate power estimation method. SLIP and SLIDE are available at http://slide.cs.ucla.edu
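    The core idea behind MVN-based correction can be shown without the sliding-window machinery. Null test statistics are simulated with the markers' correlation structure, and the corrected p-value is the fraction of simulations in which the most extreme statistic reaches the observed one. The following is a minimal Monte Carlo sketch, not SLIDE itself. It assumes a hypothetical first-order (AR(1)) correlation rho**|i-j| between markers, which can be sampled sequentially. SLIDE instead uses the observed correlations within a sliding window and adjusts for tail departures from the asymptotic distribution. The function name `corrected_p` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def corrected_p(nominal_p: float, n_markers: int, rho: float,
                n_sims: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo family-wise corrected p-value for the best of n_markers two-sided tests.

    Null z-statistics follow a multivariate normal with an assumed AR(1)
    correlation rho**|i - j|, generated sequentially marker by marker.
    """
    rng = random.Random(seed)
    z_obs = -NormalDist().inv_cdf(nominal_p / 2)  # |z| matching the nominal p
    innovation_sd = math.sqrt(1.0 - rho * rho)
    exceed = 0
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)
        max_abs = abs(z)
        for _ in range(n_markers - 1):
            # Unit variance is preserved; correlation with the previous marker is rho.
            z = rho * z + innovation_sd * rng.gauss(0.0, 1.0)
            max_abs = max(max_abs, abs(z))
        if max_abs >= z_obs:
            exceed += 1
    return exceed / n_sims


if __name__ == "__main__":
    p = 1e-3
    print("independent markers:", corrected_p(p, 100, rho=0.0))
    print("rho = 0.9:          ", corrected_p(p, 100, rho=0.9))
    print("Bonferroni:         ", min(1.0, 100 * p))
```

    With correlated markers, the corrected p-value falls below the Bonferroni bound, and this is the power that correlation-aware methods recover. With rho = 0, it approaches the Šidák value 1 - (1 - p)^m.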

    Common Polymorphisms in MTNR1B, G6PC2 and GCK Are Associated with Increased Fasting Plasma Glucose and Impaired Beta-Cell Function in Chinese Subjects

    BACKGROUND: Previous studies identified melatonin receptor 1B (MTNR1B), islet-specific glucose-6-phosphatase catalytic subunit-related protein (G6PC2), glucokinase (GCK) and glucokinase regulatory protein (GCKR) as candidate genes for type 2 diabetes (T2D) acting through elevated fasting plasma glucose (FPG). We examined the associations of the reported common variants of these genes with T2D and glucose homeostasis in three independent Chinese cohorts. METHODOLOGY/PRINCIPAL FINDINGS: Five single nucleotide polymorphisms (SNPs), MTNR1B rs10830963, G6PC2 rs16856187 and rs478333, GCK rs1799884 and GCKR rs780094, were genotyped in 1644 controls (583 adults and 1061 adolescents) and 1342 T2D patients. In healthy controls, the G-allele of MTNR1B rs10830963 and the C-alleles of both G6PC2 rs16856187 and rs478333 were associated with higher FPG (6.6×10⁻⁵ < P < 0.0034). In addition to the association with FPG we reported previously, the A-allele of GCK rs1799884 was also associated with reduced homeostasis model assessment of beta-cell function (HOMA-B) (P = 0.0015). Together with GCKR rs780094, the risk alleles of these SNPs showed a dosage effect in their associations with increased FPG (P = 2.9×10⁻⁹) and reduced HOMA-B (P = 1.1×10⁻³). Meta-analyses strongly supported additive effects of MTNR1B rs10830963 and G6PC2 rs16856187 on FPG. CONCLUSIONS/SIGNIFICANCE: Common variants of MTNR1B, G6PC2 and GCK are associated with elevated FPG and impaired insulin secretion, both individually and jointly, suggesting that these risk alleles may precipitate or perpetuate hyperglycemia in predisposed individuals.
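    The two quantities behind the dosage analysis above are simple to compute. The following is a minimal sketch that assumes the simplified HOMA1 formula, HOMA-B = 20 × fasting insulin (µU/mL) / (fasting glucose (mmol/L) - 3.5), and an unweighted count of risk alleles across the genotyped SNPs. The study may have used a different HOMA version or a weighted score. The helper names and the genotypes in the demo are hypothetical. Only the SNP identifiers come from the abstract.

```python
def homa_b(fasting_insulin_uU_per_ml: float, fasting_glucose_mmol_per_l: float) -> float:
    """HOMA1 estimate of beta-cell function (%), using the simplified Matthews formula."""
    if fasting_glucose_mmol_per_l <= 3.5:
        raise ValueError("formula is undefined for fasting glucose <= 3.5 mmol/L")
    return 20.0 * fasting_insulin_uU_per_ml / (fasting_glucose_mmol_per_l - 3.5)


def risk_allele_count(genotypes: dict[str, int]) -> int:
    """Unweighted genetic risk score: total copies of the risk allele across SNPs."""
    for snp, copies in genotypes.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"{snp}: expected 0, 1 or 2 risk-allele copies")
    return sum(genotypes.values())


if __name__ == "__main__":
    # Hypothetical individual: SNP IDs are from the abstract, genotypes are made up.
    person = {"rs10830963": 1, "rs16856187": 2, "rs478333": 1,
              "rs1799884": 0, "rs780094": 1}
    print("risk alleles:", risk_allele_count(person))
    print("HOMA-B:", homa_b(10.0, 5.5))
```

    One way to test for a per-allele (dosage) trend is to regress FPG or HOMA-B on this count.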

    DNA methylation signature of chronic low-grade inflammation and its role in cardio-respiratory diseases

    We performed a multi-ethnic epigenome-wide association study on 22,774 individuals to describe the DNA methylation signature of chronic low-grade inflammation as measured by C-reactive protein (CRP). We find 1,511 independent differentially methylated loci associated with CRP. These CpG sites show correlation structures across chromosomes, are primarily located in euchromatin, and are depleted in CpG islands. They are predominantly situated in transcription factor binding sites and genomic enhancer regions. Mendelian randomization analysis suggests that altered CpG methylation is a consequence of increased blood CRP levels. Mediation analysis identifies obesity and smoking as important underlying drivers of changed CpG methylation. Finally, we find that an activated CpG signature significantly increases the risk of cardiometabolic diseases and COPD.
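    The abstract does not name the Mendelian randomization estimator. The following is a minimal sketch of one common choice, the fixed-effect inverse-variance-weighted (IVW) estimate. It assumes per-variant summary statistics are available: effects on the exposure (for example, CRP), effects on the outcome (for example, methylation at one CpG), and the outcome standard errors. The function name and all numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted MR estimate and its standard error.

    Combines per-variant Wald ratios beta_outcome / beta_exposure using
    first-order weights beta_exposure**2 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have equal length")
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


if __name__ == "__main__":
    # Hypothetical instruments: effects on CRP and on one CpG's methylation.
    est, se = ivw_estimate([0.10, 0.20, 0.30], [0.05, 0.10, 0.15], [0.01, 0.02, 0.01])
    print(f"causal estimate = {est:.3f} (SE {se:.4f})")
```

    Running the analysis in the reverse direction, with methylation as the exposure and CRP as the outcome, is how the direction of effect would be compared.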

    The association of polymorphisms in hormone metabolism pathway genes, menopausal hormone therapy, and breast cancer risk: a nested case-control study in the California Teachers Study cohort

    Introduction: The female sex steroids estrogen and progesterone are important in breast cancer etiology. It therefore seems plausible that variation in genes involved in the metabolism of these hormones may affect breast cancer risk, and that these associations may vary by menopausal status and use of hormone therapy. Methods: We conducted a nested case-control study of breast cancer in the California Teachers Study cohort. We analyzed 317 tagging single nucleotide polymorphisms (SNPs) in 24 hormone pathway genes in 2746 non-Hispanic white women: 1351 cases and 1395 controls. Odds ratios (ORs) and 95% confidence intervals (CIs) were estimated by fitting conditional logistic regression models using all women or subgroups of women defined by menopausal status and hormone therapy use. P values were adjusted for multiple correlated tests (P_ACT). Results: The strongest associations were observed for SNPs in SLCO1B1, a solute carrier organic anion transporter gene whose product transports estradiol-17β-glucuronide and estrone-3-sulfate from the blood into hepatocytes. Ten of 38 tagging SNPs in SLCO1B1 showed significant associations with postmenopausal breast cancer risk; 5 SNPs (rs11045777, rs11045773, rs16923519, rs4149057, rs11045884) remained statistically significant after adjustment for multiple testing within this gene (P_ACT = 0.019-0.046). In postmenopausal women who were using combined estrogen-progestin therapy (EPT) at cohort enrollment, the OR of breast cancer was 2.31 (95% CI = 1.47-3.62) per minor allele of rs4149013 in SLCO1B1 (P = 0.0003; within-gene P_ACT = 0.002; overall P_ACT = 0.023). SNPs in other hormone pathway genes evaluated in this study were not associated with breast cancer risk in premenopausal or postmenopausal women. Conclusions: We found evidence that genetic variation in SLCO1B1 is associated with breast cancer risk in postmenopausal women, particularly among those using EPT.
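    A reported odds ratio and its confidence interval can be cross-checked against the quoted nominal P value. The following is a minimal sketch that assumes the 95% CI is a Wald interval on the log-odds scale, exp(log OR ± 1.96·SE). The function name is hypothetical. For the rs4149013 result (OR 2.31, 95% CI 1.47-3.62), it gives a two-sided P of about 0.0003, in line with the reported nominal P. It does not reproduce the P_ACT values, which also account for correlated tests.

```python
import math
from statistics import NormalDist


def wald_from_ci(odds_ratio: float, lower: float, upper: float, level: float = 0.95):
    """Recover log-OR, its SE, the Wald z and the two-sided P from an OR and its CI.

    Assumes the interval was computed as exp(log(OR) +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return log_or, se, z, p_two_sided


if __name__ == "__main__":
    log_or, se, z, p = wald_from_ci(2.31, 1.47, 3.62)
    print(f"log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```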

    Genome-wide association studies identify 137 genetic loci for DNA methylation biomarkers of aging

    BACKGROUND: Biological aging estimators derived from DNA methylation data are heritable and correlate with morbidity and mortality. Consequently, identifying genetic and environmental contributors to the variation in these measures in populations has become a major goal in the field. RESULTS: Leveraging DNA methylation and SNP data from more than 40,000 individuals, we identify 137 genome-wide significant loci, of which 113 are novel, from genome-wide association study (GWAS) meta-analyses of four epigenetic clocks and of epigenetic surrogate markers for granulocyte proportions and plasminogen activator inhibitor 1 levels. We find evidence for shared genetic loci associated with the Horvath clock and with expression of transcripts encoding genes linked to lipid metabolism and immune function. Notably, these loci are independent of those reported to regulate DNA methylation levels at constituent clock CpGs. A polygenic score for GrimAge acceleration showed strong associations with adiposity-related traits, educational attainment, parental longevity, and C-reactive protein levels. CONCLUSION: This study illuminates the genetic architecture underlying epigenetic aging and its shared genetic contributions with lifestyle factors and longevity.
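    In its simplest form, a polygenic score such as the one for GrimAge acceleration is a weighted sum of allele dosages, with GWAS effect sizes as the weights. The following is a minimal sketch that assumes dosages are coded as 0-2 copies of the effect allele and that no LD clumping or weight shrinkage is applied, although practical scores often use one or both. The function name, SNP identifiers and weights are hypothetical.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Sum of effect-allele dosage times GWAS effect size over SNPs present in both maps.

    SNPs without a weight are skipped; a real pipeline would also align alleles and strands.
    """
    return sum(weights[snp] * dose for snp, dose in dosages.items() if snp in weights)


if __name__ == "__main__":
    weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}  # hypothetical effect sizes
    person = {"snp_a": 2, "snp_b": 1, "snp_c": 0.8}          # imputed dosages allowed
    print(round(polygenic_score(person, weights), 3))
```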

    Multiple testing correction in linear mixed models

    BACKGROUND: Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered the gold standard in multiple testing correction because it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches is applicable to LMM. RESULTS: We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate its accuracy and efficiency. CONCLUSIONS: We provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-016-0903-6) contains supplementary material, which is available to authorized users.
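    In the permutation framework, the per-marker threshold is the alpha-quantile of the smallest P value across all markers under the null. The following is a minimal sketch of that final step only. It assumes null replicates are already available. Under an LMM, generating valid replicates is the hard part, because naive phenotype permutation does not respect relatedness between samples, and this is the problem the paper addresses. The helper name is hypothetical. The demo uses independent uniform null P values, for which the result should be close to the Šidák threshold 1 - (1 - alpha)^(1/m).

```python
import random


def per_marker_threshold(null_min_p: list[float], alpha: float = 0.05) -> float:
    """Empirical alpha-quantile of the genome-wide minimum P under the null.

    Declaring a marker significant when its P value is at or below this
    threshold controls the family-wise error rate at roughly alpha.
    """
    ordered = sorted(null_min_p)
    k = max(0, int(alpha * len(ordered)) - 1)  # index of the empirical alpha-quantile
    return ordered[k]


if __name__ == "__main__":
    rng = random.Random(7)
    m, reps, alpha = 50, 20_000, 0.05
    # Independent markers: each null replicate is the minimum of m uniform P values.
    null_min_p = [min(rng.random() for _ in range(m)) for _ in range(reps)]
    print("empirical:", per_marker_threshold(null_min_p, alpha))
    print("Sidak:    ", 1 - (1 - alpha) ** (1 / m))
```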

    Epigenetic Signatures of Cigarette Smoking

    BACKGROUND: DNA methylation leaves a long-term signature of smoking exposure and is one potential mechanism by which tobacco exposure predisposes to adverse health outcomes, such as cancers, osteoporosis, and lung and cardiovascular disorders. METHODS AND RESULTS: To comprehensively determine the association between cigarette smoking and DNA methylation, we conducted a meta-analysis of genome-wide DNA methylation assessed using the Illumina BeadChip 450K array on 15 907 blood-derived DNA samples from participants in 16 cohorts (including 2433 current, 6518 former, and 6956 never smokers). Comparing current versus never smokers, 2623 cytosine-phosphate-guanine sites (CpGs), annotated to 1405 genes, were statistically significantly differentially methylated at a Bonferroni threshold of P<1×10⁻⁷ (18 760 CpGs at false discovery rate <0.05). Genes annotated to these CpGs were enriched for associations with several smoking-related traits in genome-wide studies, including pulmonary function, cancers, inflammatory diseases, and heart disease. Comparing former versus never smokers, 185 of the CpGs that differed between current and never smokers were significant at P<1×10⁻⁷ (2623 CpGs at false discovery rate <0.05), indicating a pattern of persistent altered methylation, with attenuation, after smoking cessation. Transcriptomic integration identified effects on gene expression at many differentially methylated CpGs. CONCLUSIONS: Cigarette smoking has a broad impact on genome-wide methylation that, at many loci, persists many years after smoking cessation. Many of the differentially methylated genes were novel with respect to the biological effects of smoking and might represent therapeutic targets for the prevention or treatment of tobacco-related diseases. Methylation at these sites could also serve as a sensitive and stable biomarker of lifetime exposure to tobacco smoke.
    Funding: Biotechnology and Biological Sciences Research Council, British Heart Foundation, Cancer Research UK, Medical Research Council, National Institutes of Health, Royal Society, Wellcome Trust.
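    The smoking analysis reports results at two thresholds: a Bonferroni cut-off of P<1×10⁻⁷ and a false discovery rate below 0.05. As a rough check, 0.05 divided by about 485,000 probes, an approximate count assumed here for the 450K array, is about 1.03×10⁻⁷. The sketch below compares the two rules on a toy vector of P values. It assumes the standard Benjamini-Hochberg step-up procedure for FDR, because the abstract does not state which FDR method was used. The function names are hypothetical.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> set[int]:
    """Indices rejected when each of the m tests is held to alpha / m."""
    m = len(p_values)
    return {i for i, p in enumerate(p_values) if p <= alpha / m}


def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> set[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank  # step-up: keep the largest rank that passes
    return set(order[:cutoff_rank])


if __name__ == "__main__":
    print("Bonferroni threshold for ~485,000 tests:", 0.05 / 485_000)
    p = [0.3, 0.001, 0.9, 0.025, 0.012]
    print("Bonferroni rejects:", sorted(bonferroni(p)))
    print("BH rejects:        ", sorted(benjamini_hochberg(p)))
```

    On the toy vector, BH rejects three hypotheses while Bonferroni rejects one. This mirrors why the FDR counts in the abstract are larger than the Bonferroni counts.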