151 research outputs found
Polyploidization increases meiotic recombination frequency in Arabidopsis: a close look at statistical modeling and data analysis
This paper is a response to Pecinka A, Fang W, Rehmsmeier M, Levy AA, Mittelsten Scheid, O: Polyploidization increases meiotic recombination frequency in Arabidopsis. BMC Biology 2011, 9:24
A robust and efficient statistical method for genetic association study using case and control samples from multiple cohorts
BACKGROUND: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of caseācontrol samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. RESULTS: We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinsonās disease (PD) caseācontrol samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of sizeā<ā1Ā Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18Ā Mb and 0.18Ā Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. CONCLUSIONS: We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS
A highly robust and optimized sequence-based approach for genetic polymorphism discovery and genotyping in large plant populations
KEY MESSAGE: This optimized approach provides both a computational tool and a library construction protocol, which can maximize the number of genomic sequence reads that uniformly cover a plant genome and minimize the number of sequence reads representing chloroplast DNA and rRNA genes. One can implement the developed computational tool to feasibly design their own RAD-seq experiment to achieve expected coverage of sequence variant markers for large plant populations using information of the genome sequence and ideally, though not necessarily, information of the sequence polymorphism distribution in the genome. ABSTRACT: Advent of the next generation sequencing techniques motivates recent interest in developing sequence-based identification and genotyping of genome-wide genetic variants in large populations, with RAD-seq being a typical example. Without taking proper account for the fact that chloroplast and rRNA genes may occupy up to 60Ā % of the resulting sequence reads, the current RAD-seq design could be very inefficient for plant and crop species. We presented here a generic computational tool to optimize RAD-seq design in any plant species and experimentally tested the optimized design by implementing it to screen for and genotype sequence variants in four plant populations of diploid and autotetraploid Arabidopsis and potato Solanum tuberosum. Sequence data from the optimized RAD-seq experiments shows that the undesirable chloroplast and rRNA contributed sequence reads can be controlled at 3ā10Ā %. Additionally, the optimized RAD-seq method enables pre-design of the required uniformity and density in coverage of the high quality sequence polymorphic markers over the genome of interest and genotyping of large plant or crop populations at a competitive cost in comparison to other mainstream rivals in the literature. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-016-2736-9) contains supplementary material, which is available to authorized users
Methods for evaluating gene expression from Affymetrix microarray datasets
<p>Abstract</p> <p>Background</p> <p>Affymetrix high density oligonucleotide expression arrays are widely used across all fields of biological research for measuring genome-wide gene expression. An important step in processing oligonucleotide microarray data is to produce a single value for the gene expression level of an RNA transcript using one of a growing number of statistical methods. The challenge for the researcher is to decide on the most appropriate method to use to address a specific biological question with a given dataset. Although several research efforts have focused on assessing performance of a few methods in evaluating gene expression from RNA hybridization experiments with different datasets, the relative merits of the methods currently available in the literature for evaluating genome-wide gene expression from Affymetrix microarray data collected from real biological experiments remain actively debated.</p> <p>Results</p> <p>The present study reports a comprehensive survey of the performance of all seven commonly used methods in evaluating genome-wide gene expression from a well-designed experiment using Affymetrix microarrays. The experiment profiled eight genetically divergent barley cultivars each with three biological replicates. The dataset so obtained confers a balanced and idealized structure for the present analysis. The methods were evaluated on their sensitivity for detecting differentially expressed genes, reproducibility of expression values across replicates, and consistency in calling differentially expressed genes. The number of genes detected as differentially expressed among methods differed by a factor of two or more at a given false discovery rate (FDR) level. Moreover, we propose the use of genes containing single feature polymorphisms (SFPs) as an empirical test for comparison among methods for the ability to detect true differential gene expression on the basis that SFPs largely correspond to <it>cis</it>-acting expression regulators. The PDNN method demonstrated superiority over all other methods in every comparison, whilst the default Affymetrix MAS5.0 method was clearly inferior.</p> <p>Conclusion</p> <p>A comprehensive assessment of seven commonly used data extraction methods based on an extensive barley Affymetrix gene expression dataset has shown that the PDNN method has superior performance for the detection of differentially expressed genes.</p
Conserved and divergent patterns of DNA methylation in higher vertebrates
DNA methylation in the genome plays a fundamental role in the regulation of gene expression and is widespread in the genome of eukaryotic species. For example, in higher vertebrates, there is a āglobalā methylation pattern involving complete methylation of CpG sites genome-wide, except in promoter regions that are typically enriched for CpG dinucleotides, or so called āCpG islands.ā Here, we comprehensively examined and compared the distribution of CpG sites within ten model eukaryotic species and linked the observed patterns to the role of DNA methylation in controlling gene transcription. The analysis revealed two distinct but conserved methylation patterns for gene promoters in human and mouse genomes, involving genes with distinct distributions of promoter CpGs and gene expression patterns. Comparative analysis with four other higher vertebrates revealed that the primary regulatory role of the DNA methylation system is highly conserved in higher vertebrates
Genome-wide eQTLs and heritability for gene expression traits in unrelated individuals
BACKGROUND: While the possible sources underlying the so-called āmissing heritabilityā evident in current genome-wide association studies (GWAS) of complex traits have been actively pursued in recent years, resolving this mystery remains a challenging task. Studying heritability of genome-wide gene expression traits can shed light on the goal of understanding the relationship between phenotype and genotype. Here we used microarray gene expression measurements of lymphoblastoid cell lines and genome-wide SNP genotype data from 210 HapMap individuals to examine the heritability of gene expression traits. RESULTS: Heritability levels for expression of 10,720 genes were estimated by applying variance component model analyses and 1,043 expression quantitative loci (eQTLs) were detected. Our results indicate that gene expression traits display a bimodal distribution of heritability, one peak close to 0% and the other summit approaching 100%. Such a pattern of the within-population variability of gene expression heritability is common among different HapMap populations of unrelated individuals but different from that obtained in the CEU and YRI trio samples. Higher heritability levels are shown by housekeeping genes and genes associated with cis eQTLs. Both cis and trans eQTLs make comparable cumulative contributions to the heritability. Finally, we modelled gene-gene interactions (epistasis) for genes with multiple eQTLs and revealed that epistasis was not prevailing in all genes but made a substantial contribution in explaining total heritability for some genes analysed. CONCLUSIONS: We utilised a mixed effect model analysis for estimating genetic components from population based samples. On basis of analyses of genome-wide gene expression from four HapMap populations, we demonstrated detailed exploitation of the distribution of genetic heritabilities for expression traits from different populations, and highlighted the importance of studying interaction at the gene expression level as an important source of variation underlying missing heritability. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-13) contains supplementary material, which is available to authorized users
- ā¦