
    The cost of large numbers of hypothesis tests on power, effect size and sample size

    Advances in high-throughput biology and computer science are driving an exponential increase in the number of hypothesis tests in genomics and other scientific disciplines. Studies using current genotyping platforms frequently include a million or more tests. In addition to the monetary cost, this increase imposes a statistical cost owing to the multiple testing corrections needed to avoid large numbers of false-positive results. To safeguard against the resulting loss of power, some have suggested sample sizes on the order of tens of thousands, which can be impractical for many diseases or may lower the quality of phenotypic measurements. This study examines the relationship between the number of tests on the one hand and power, detectable effect size or required sample size on the other. We show that once the number of tests is large, power can be maintained at a constant level with comparatively small increases in the effect size or sample size. For example, at the 0.05 significance level, a 13% increase in sample size is needed to maintain 80% power for ten million tests compared with one million tests, whereas a 70% increase in sample size is needed for 10 tests compared with a single test. Relative costs are smaller when measured by increases in the detectable effect size. We provide an interactive Excel calculator to compute power, effect size or sample size when comparing study designs or genome platforms involving different numbers of hypothesis tests. The results are reassuring in an era of extreme multiple testing.
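
    The sample-size figures quoted above can be approximately reproduced with a short calculation. This is a minimal sketch, not the authors' Excel calculator: it assumes a two-sided test, a Bonferroni-corrected threshold of alpha/m for m tests, and the usual normal approximation in which the required sample size scales with (z_{1-alpha/(2m)} + z_{power})^2. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def critical_z(alpha, n_tests):
    """Two-sided critical value after Bonferroni correction: z_{1 - alpha / (2 m)}."""
    return -STD_NORMAL.inv_cdf(alpha / (2 * n_tests))


def sample_size_ratio(n_tests_new, n_tests_ref, alpha=0.05, power=0.80):
    """Factor by which n must grow to keep `power` when moving from n_tests_ref
    to n_tests_new tests, assuming n is proportional to (z_crit + z_power)^2."""
    z_power = STD_NORMAL.inv_cdf(power)
    new = (critical_z(alpha, n_tests_new) + z_power) ** 2
    ref = (critical_z(alpha, n_tests_ref) + z_power) ** 2
    return new / ref


def effect_size_ratio(n_tests_new, n_tests_ref, alpha=0.05, power=0.80):
    """At fixed n the detectable effect scales with (z_crit + z_power),
    i.e. with the square root of the sample-size ratio."""
    return sample_size_ratio(n_tests_new, n_tests_ref, alpha, power) ** 0.5


if __name__ == "__main__":
    for new, ref in [(10, 1), (10_000_000, 1_000_000)]:
        print(f"{ref:,} -> {new:,} tests: "
              f"sample size x{sample_size_ratio(new, ref):.2f}, "
              f"effect size x{effect_size_ratio(new, ref):.2f}")
```

    Under these assumptions the sample-size ratios come out near 1.70 (1 to 10 tests) and 1.13 (one million to ten million tests), in line with the 70% and 13% quoted above. The effect-size ratios (about 1.30 and 1.06) are smaller because they are square roots of the sample-size ratios, which matches the remark that relative costs are lower when measured by detectable effect size.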

    A First Generation Microsatellite- and SNP-Based Linkage Map of Jatropha

    Jatropha curcas is a potential plant species for biodiesel production. However, its seed yield is too low for profitable biodiesel production. To improve productivity, genetic improvement through breeding is essential, and a linkage map is an important component of molecular breeding. We established a first-generation linkage map using a mapping panel containing two backcross populations with 93 progeny. We mapped 506 markers (216 microsatellites and 290 SNPs from ESTs) onto 11 linkage groups. The total length of the map was 1440.9 cM, with an average marker spacing of 2.8 cM. BLAST searches of 222 Jatropha ESTs containing polymorphic SSR or SNP markers against EST databases revealed that 91.0%, 86.5% and 79.2% of these ESTs were homologous to counterparts in castor bean, poplar and Arabidopsis, respectively. Mapping 192 orthologous markers to the assembled whole-genome sequence of Arabidopsis thaliana identified 38 syntenic blocks and revealed that small linkage blocks were well conserved but often shuffled. The first-generation linkage map and the comparative mapping data could lay a solid foundation for QTL mapping of agronomic traits, marker-assisted breeding and cloning of genes responsible for phenotypic variation.

    phenosim - A software to simulate phenotypes for testing in genome-wide association studies

    Background: There is great interest in understanding the genetic architecture of complex traits in natural populations. Genome-wide association studies (GWAS) are becoming routine in human, animal and plant genetics for understanding the connection between naturally occurring genotypic and phenotypic variation. Coalescent simulations are commonly used in population genetics to simulate genotypes under different parameters and demographic models. Results: Here, we present phenosim, a software tool that adds a phenotype to genotypes generated in time-efficient coalescent simulations. Both qualitative and quantitative phenotypes can be generated, and phenotypic variation can be partitioned between additive effects and epistatic interactions between causal variants. The output formats of phenosim are directly usable as input for different GWAS tools. The applicability of phenosim is shown by simulating a genome-wide association study in Arabidopsis thaliana. Conclusions: By using the coalescent approach to generate genotypes and phenosim to add phenotypes, the data sets can be used to assess the influence of various factors such as demography, genetic architecture or selection on the statistical power of association methods to detect causal genetic variants under a wide variety of population genetic scenarios. The phenosim software is freely available from the authors' website, http://evoplant.uni-hohenheim.de.
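
    To illustrate what adding a phenotype to simulated genotypes involves, the sketch below combines additive effects of causal variants, a pairwise epistatic term and Gaussian environmental noise into a quantitative trait, then converts it to a case/control trait with a simple liability threshold. It is not phenosim's algorithm or interface: the function names, the 0/1 haplotype encoding, the multiplicative epistasis model and the threshold rule are all assumptions made for illustration.

```python
import random
from statistics import pvariance


def simulate_phenotypes(genotypes, additive, epistatic, noise_sd, seed=None):
    """Add a quantitative phenotype to simulated genotypes (hypothetical helper).

    genotypes: one list of 0/1 alleles per individual (e.g. coalescent haplotypes)
    additive:  {snp_index: effect} for causal variants acting additively
    epistatic: {(snp_i, snp_j): effect}, applied only when both SNPs carry allele 1
    noise_sd:  standard deviation of the Gaussian environmental term
    Returns (phenotypes, additive_part, epistatic_part), one value per individual.
    """
    rng = random.Random(seed)
    phenos, add_part, epi_part = [], [], []
    for g in genotypes:
        a = sum(beta * g[j] for j, beta in additive.items())
        e = sum(gamma * g[i] * g[j] for (i, j), gamma in epistatic.items())
        add_part.append(a)
        epi_part.append(e)
        phenos.append(a + e + rng.gauss(0.0, noise_sd))
    return phenos, add_part, epi_part


def to_case_control(phenos, prevalence):
    """Liability-threshold sketch: the highest `prevalence` fraction become cases (1)."""
    n_controls = int(round(len(phenos) * (1 - prevalence)))
    order = sorted(range(len(phenos)), key=phenos.__getitem__)  # ascending liability
    cases = set(order[n_controls:])
    return [1 if idx in cases else 0 for idx in range(len(phenos))]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy stand-in for coalescent output: 200 haplotypes x 50 SNPs, allele 1 at ~30%
    genos = [[int(rng.random() < 0.3) for _ in range(50)] for _ in range(200)]
    phenos, a, e = simulate_phenotypes(genos, {3: 0.5, 17: -0.3}, {(3, 40): 0.8},
                                       noise_sd=1.0, seed=2)
    # The two genetic parts can covary, so their variances need not sum to the genetic variance
    print("additive variance:", round(pvariance(a), 3))
    print("epistatic variance:", round(pvariance(e), 3))
    print("cases:", sum(to_case_control(phenos, prevalence=0.2)))
```

    Returning the additive and epistatic parts separately makes it easy to check how phenotypic variation is split between them before the phenotypes are written out for a GWAS tool.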

    Gene-Centric Characteristics of Genome-Wide Association Studies

    BACKGROUND: High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies that identify novel disease-susceptibility single nucleotide polymorphisms (SNPs). High-density chips are designed using two different SNP selection approaches: the direct gene-centric approach, and the indirect approaches, which use quasi-random SNPs or linkage disequilibrium (LD)-based tagSNPs. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent they capture common genic variants. It is also important to characterize and compare the differences between these approaches. METHODOLOGY/PRINCIPAL FINDINGS: In our study, using both the Phase II HapMap data and disease variants extracted from OMIM, a gene-centric evaluation was first performed to assess the ability of the approaches to capture disease variants in the Caucasian population. The distribution patterns of SNPs were then characterized in genic regions, evolutionarily conserved introns and nongenic regions, ontologies and pathways. The results show that, no matter which SNP selection approach is used, current high-density SNP chips provide very high coverage of genic regions and capture most known common disease variants within the HapMap framework. The results also show that the differences between the direct and indirect approaches are relatively small; both have similar SNP distribution patterns for these gene-centric characteristics. CONCLUSIONS/SIGNIFICANCE: This study suggests that the indirect approaches not only have the advantage of high coverage but are also useful for studies focusing on various functional SNPs, either in genes or in the conserved regions that the direct approach supports. The study and the annotation of these characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.
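
    How well a chip "captures" a variant is often summarized by whether the variant is in strong LD with at least one genotyped SNP. The sketch below computes pairwise r^2 from phased 0/1 haplotypes and reports the fraction of target variants tagged at r^2 >= 0.8. The cutoff, the data layout and the function names are assumptions for illustration, not necessarily the criterion used in this study.

```python
def r_squared(hap_x, hap_y):
    """LD r^2 between two biallelic SNPs, each given as phased 0/1 alleles per haplotype."""
    n = len(hap_x)
    p_x = sum(hap_x) / n                                  # frequency of allele 1 at SNP x
    p_y = sum(hap_y) / n                                  # frequency of allele 1 at SNP y
    p_xy = sum(a & b for a, b in zip(hap_x, hap_y)) / n   # frequency of the 1-1 haplotype
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:                                        # a monomorphic SNP cannot tag anything
        return 0.0
    d = p_xy - p_x * p_y                                  # disequilibrium coefficient D
    return d * d / denom


def coverage(targets, chip_snps, threshold=0.8):
    """Fraction of target variants with r^2 >= threshold to at least one chip SNP."""
    if not targets:
        return 0.0
    captured = sum(
        1 for t in targets if any(r_squared(t, s) >= threshold for s in chip_snps)
    )
    return captured / len(targets)


if __name__ == "__main__":
    # Toy data: each row is one SNP, each column one of 8 haplotypes
    disease_variants = [[0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
    chip = [[0, 0, 1, 1, 0, 0, 1, 1], [1, 1, 0, 0, 1, 0, 0, 0]]
    print("coverage:", coverage(disease_variants, chip))
```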

    Disease-associated alleles in genome-wide association studies are enriched for derived low frequency alleles relative to HapMap and neutral expectations

    Background: Genome-wide association studies give insight into the genetic basis of common diseases. An open question is whether the allele frequency distributions and ancestral vs. derived states of disease-associated alleles differ from those of the rest of the genome. Characteristics of disease-associated alleles can be used to increase the yield of future studies. Methods: The set of all common disease-associated alleles found in genome-wide association studies prior to January 2010 was analyzed and compared with HapMap and theoretical null expectations. In addition, allele frequency distributions of different disease classes were assessed. Ages of HapMap and disease-associated alleles were also estimated. Results: The allele frequency distribution of HapMap alleles was qualitatively similar to neutral expectations. However, disease-associated alleles were more likely to be low-frequency derived alleles relative to null expectations. Of the disease-associated alleles, 43.7% were ancestral. The mean frequency of disease-associated alleles was lower than that of randomly chosen CEU HapMap alleles (0.394 vs. 0.610, after accounting for probability of detection). Similar patterns were observed for the subset of disease-associated alleles that have been verified in multiple studies. SNPs implicated in genome-wide association studies were enriched for young SNPs compared with randomly selected HapMap loci. Odds ratios of disease-associated alleles tended to be less than 1.5 and varied by frequency, confirming previous studies. Conclusions: Alleles associated with genetic disease differ from randomly selected HapMap alleles and neutral expectations. The evolutionary history of alleles (frequency and ancestral vs. derived state) influences whether they are implicated in genome-wide association studies.
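
    One common theoretical null for derived allele frequencies is the standard neutral model (constant population size, no selection), under which the expected number of segregating sites carrying i derived copies in a sample of n chromosomes is proportional to 1/i. The sketch below computes that expected spectrum and two summaries of it. The study's exact null model, including any correction for SNP ascertainment, may differ, so treat the model, the sample size in the demo and the function names as assumptions.

```python
def neutral_sfs(n):
    """Expected share of segregating sites with i derived copies (i = 1..n-1) in a
    sample of n chromosomes under the standard neutral model: proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)                  # harmonic number H_{n-1}
    return [w / total for w in weights]   # list index 0 corresponds to i = 1


def expected_fraction_at_or_below(n, max_freq):
    """Expected fraction of segregating sites whose derived allele frequency i/n <= max_freq."""
    return sum(p for i, p in enumerate(neutral_sfs(n), start=1) if i / n <= max_freq)


def expected_mean_derived_frequency(n):
    """Mean derived allele frequency across segregating sites, equal to ((n-1)/n) / H_{n-1}."""
    return sum((i / n) * p for i, p in enumerate(neutral_sfs(n), start=1))


if __name__ == "__main__":
    n = 120  # hypothetical sample: 60 diploid individuals = 120 chromosomes
    print("P(derived freq <= 0.1):", round(expected_fraction_at_or_below(n, 0.1), 3))
    print("mean derived freq:", round(expected_mean_derived_frequency(n), 3))
```

    An observed set of derived allele frequencies can then be compared against these expectations, for example by checking whether a larger share than expected falls in the low-frequency classes.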

    Genome-Wide Analysis Reveals a Complex Pattern of Genomic Imprinting in Mice

    Parent-of-origin–dependent gene expression resulting from genomic imprinting plays an important role in modulating complex traits ranging from developmental processes to cognitive abilities and associated disorders. However, while gene-targeting techniques have allowed for the identification of imprinted loci, very little is known about the contribution of imprinting to quantitative variation in complex traits. Furthermore, most studies assume a simple pattern of imprinting, resulting in either paternal or maternal gene expression, yet more complex patterns of effects also exist. As a result, the distribution and number of different imprinting patterns across the genome remain largely unexplored. We address these unresolved issues with a genome-wide scan for imprinted quantitative trait loci (iQTL) affecting body weight and growth in mice, using a novel three-generation design. We identified ten iQTL that display much more complex and diverse effect patterns than previously assumed, including four loci with effects similar to the callipyge mutation found in sheep. Three loci display a new phenotypic pattern that we refer to as bipolar dominance, in which the two heterozygotes differ from each other while the two homozygotes are identical. Our study also detected a paternally expressed iQTL on Chromosome 7 in a region containing a known imprinting cluster with many paternally expressed genes. Surprisingly, the effects of the iQTL were mostly restricted to traits expressed after weaning. Our results imply that the quantitative effects of an imprinted allele at a locus depend both on its parent of origin and on the allele with which it is paired. Our findings also show that the imprinting pattern of a locus can vary over ontogenetic time and, in contrast to current views, may often be stronger at later stages in life.
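
    The effect patterns described above can be expressed with a common parameterization of the four ordered genotypes at an imprinted locus: additive (a), dominance (d) and imprinting (i) effects, with AA = mu + a, BB = mu - a, and the two reciprocal heterozygotes equal to mu + d + i and mu + d - i. The sketch below recovers these parameters from four genotype means and flags the bipolar-dominance pattern as defined in the text. The maternal-first ordering of heterozygotes, the function names and the fixed tolerance are assumptions; a real iQTL scan would test these effects statistically rather than compare point estimates.

```python
from typing import NamedTuple


class LocusEffects(NamedTuple):
    mu: float          # midpoint of the two homozygotes
    additive: float    # a: half the difference between homozygotes
    dominance: float   # d: mean heterozygote deviation from mu
    imprinting: float  # i: half the difference between reciprocal heterozygotes


def decompose(aa, ab, ba, bb):
    """Genotype means -> (mu, a, d, i), with AA = mu + a, BB = mu - a,
    AB = mu + d + i and BA = mu + d - i.
    Assumed convention: in 'ab' the maternal allele is A and the paternal allele is B."""
    mu = (aa + bb) / 2
    return LocusEffects(mu, (aa - bb) / 2, (ab + ba) / 2 - mu, (ab - ba) / 2)


def is_bipolar_dominance(aa, ab, ba, bb, tol=1e-9):
    """Homozygotes identical (a ~ 0) while the two heterozygotes differ (i != 0)."""
    eff = decompose(aa, ab, ba, bb)
    return abs(eff.additive) <= tol and abs(eff.imprinting) > tol


if __name__ == "__main__":
    print(decompose(10, 12, 8, 10))
    print(is_bipolar_dominance(10, 12, 8, 10))
```

    For example, genotype means of 10, 12, 8 and 10 (AA, AB, BA, BB) give a = 0, d = 0 and i = 2, which the sketch classifies as bipolar dominance.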

    The Complexity of Vascular and Non-Vascular Complications of Diabetes: The Hong Kong Diabetes Registry

    Diabetes is a complex disease characterized by chronic hyperglycemia and multiple phenotypes. In 1995, we used a doctor-nurse-clerk team and a structured protocol to establish the Hong Kong Diabetes Registry as part of a quality improvement program. By 2009, we had accrued 2616 clinical events in 9588 Chinese patients with type 2 diabetes over a follow-up duration of 6 years. Detailed phenotyping at enrollment and data on follow-up medications have allowed us to develop a series of risk equations that predict multiple endpoints with high sensitivity and specificity. Using this prospective database, we were able to validate findings from clinical trials in real-world practice, confirm close links between cardiovascular and renal disease, and demonstrate the emerging importance of cancer as a leading cause of death. In addition to serving as a tool for risk stratification and quality assurance, ongoing analysis of registry data also reveals secular changes in disease patterns and identifies unmet needs.

    TNF-α is involved in activating DNA fragmentation in skeletal muscle

    Intraperitoneal administration of 100 μg kg−1 (body weight) of tumour necrosis factor-α to rats for 8 consecutive days resulted in a significant decrease in protein content, which was concomitant with a reduction in DNA content. Interestingly, the protein/DNA ratio in the skeletal muscle of tumour necrosis factor-α-treated animals was unchanged compared with non-treated controls. Analysis of muscle DNA fragmentation clearly showed enhanced laddering in the skeletal muscle of tumour necrosis factor-α-treated animals, suggesting an apoptotic phenomenon. In a separate set of experiments, mice bearing a cachexia-inducing tumour (the Lewis lung carcinoma) showed a 9.8-fold increase in muscle DNA fragmentation compared with their non-tumour-bearing controls, as previously described. When mice deficient in the gene for tumour necrosis factor-α receptor protein I were inoculated with Lewis lung carcinoma, they also showed increased DNA fragmentation; however, the increase was only 2.1-fold. These results suggest that tumour necrosis factor-α partly mediates DNA fragmentation during experimental cancer-associated cachexia.