
    Genotype calling in tetraploid species from bi-allelic marker data using mixture models

    Background: Automated genotype calling in tetraploid species was until recently not possible, which hampered genetic analysis. Modern genotyping assays often produce two signals, one for each allele of a bi-allelic marker. While ample software is available to obtain genotypes (homozygous for either allele, or heterozygous) for diploid species from these signals, such software is not available for tetraploid species, which may be scored as five alternative genotypes (aaaa, baaa, bbaa, bbba and bbbb; nulliplex to quadruplex). Results: We present a novel algorithm, implemented in the R package fitTetra, to assign genotypes for bi-allelic markers to tetraploid samples from genotyping assays that produce intensity signals for both alleles. The algorithm is based on the fitting of several mixture models with five components, one for each of the five possible genotypes. The models have different numbers of parameters specifying the relation between the five component means, and some of them impose a constraint on the mixing proportions to conform to Hardy-Weinberg equilibrium (HWE) ratios. The software rejects markers that do not allow a reliable genotyping for the majority of the samples, and it assigns a missing score to samples that cannot be scored into one of the five possible genotypes with sufficient confidence. Conclusions: We have validated the software with data of a collection of 224 potato varieties assayed with an Illumina GoldenGate™ 384 SNP array and shown that all SNPs with informative ratio distributions are fitted. Almost all fitted models appear to be correct based on visual inspection and comparison with diploid samples. When the collection of potato varieties is analyzed as if it were a population, almost all markers seem to be in Hardy-Weinberg equilibrium. The R package fitTetra is freely available under the GNU Public License from http://www.plantbreeding.wur.nl/UK/software_fitTetra.html and as Additional files with this article.
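
    The HWE constraint on the mixing proportions mentioned in the fitTetra abstract can be illustrated with a short sketch. It is not fitTetra's code (the package is written in R); it is a minimal Python illustration that assumes polysomic inheritance with random chromosome segregation and no double reduction, so that the number of b alleles per individual follows a Binomial(4, p) distribution. The function name and the allele frequency used below are hypothetical.

```python
from math import comb


def hwe_tetraploid_proportions(p_b: float) -> list[float]:
    """Expected proportions of the five tetraploid genotypes under HWE.

    Assumption (not taken from the abstract): the dosage of allele b is
    Binomial(4, p_b), i.e. random union of gametes without double reduction.
    Index k of the returned list is the number of b alleles (0..4).
    """
    if not 0.0 <= p_b <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return [comb(4, k) * p_b**k * (1.0 - p_b) ** (4 - k) for k in range(5)]


if __name__ == "__main__":
    labels = ["aaaa", "baaa", "bbaa", "bbba", "bbbb"]  # nulliplex to quadruplex
    # p_b = 0.5 is an arbitrary illustrative allele frequency.
    for label, proportion in zip(labels, hwe_tetraploid_proportions(0.5)):
        print(f"{label}: {proportion:.4f}")
```

    In a mixture model constrained in this way, the five mixing proportions are tied to a single allele-frequency parameter instead of being estimated freely.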

    Another tool in the genome-wide association study arsenal: population-based detection of somatic gene conversion

    The hunt for the genetic contributors to complex disease has used a number of strategies, resulting in the identification of variants associated with many of the common diseases affecting society. However, most of the genetic variants detected to date are single nucleotide polymorphisms (SNPs) and copy number variants (CNVs), and they fall far short of explaining the full genetic component of any given disease. An as yet untapped genomic mechanism is somatic gene conversion and deletion, which could be complicit in disease risk but has been challenging to detect in genome-wide datasets. In a recent publication in BMC Medicine, Kenneth Ross uses existing datasets to look at somatic gene conversion and deletion in human disease. Here, we describe how Ross's recent efforts to detect such occurrences could impact the field going forward.

    SAQC: SNP Array Quality Control

    Background: Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses. However, good indices for assessing data quality of SNP arrays have not yet been developed. Results: We developed new quality indices to measure the quality of SNP arrays and/or DNA samples and investigated their statistical properties. The indices quantify a departure of estimated individual-level allele frequencies (AFs) from expected frequencies via standardized distances. The proposed quality indices followed lognormal distributions in several large genomic studies that we empirically evaluated. AF reference data and quality index reference data for different SNP array platforms were established based on samples from various reference populations. Furthermore, a confidence interval method based on the underlying empirical distributions of quality indices was developed to identify poor-quality SNP arrays and/or DNA samples. Analyses of authentic biological data and simulated data show that this new method is sensitive and specific for the detection of poor-quality SNP arrays and/or DNA samples. Conclusions: This study introduces new quality indices, establishes references for AFs and quality indices, and develops a detection method for poor-quality SNP arrays and/or DNA samples. We have developed a new computer program that utilizes these methods called SNP Array Quality Control (SAQC). SAQC software is written in R and R-GUI and was developed as a user-friendly tool for the visualization and evaluation of data quality of genome-wide SNP arrays. The program is available online (http://www.stat.sinica.edu.tw/hsinchou/genetics/quality/SAQC.htm).

    Estimates of array and pool-construction variance for planning efficient DNA-pooling genome wide association studies

    Background: Until recently, genome-wide association studies (GWAS) have been restricted to research groups with the budget necessary to genotype hundreds, if not thousands, of samples. Replacing individual genotyping with genotyping of DNA pools in Phase I of a GWAS has proven successful, and dramatically altered the financial feasibility of this approach. When conducting a pool-based GWAS, how well SNP allele frequency is estimated from a DNA pool will influence a study's power to detect associations. Here we address how to control the variance in allele frequency estimation when DNAs are pooled, and how to plan and conduct the most efficient well-powered pool-based GWAS. Methods: By examining the variation in allele frequency estimation on SNP arrays between and within DNA pools we determine how array variance [var(e_array)] and pool-construction variance [var(e_construction)] contribute to the total variance of allele frequency estimation. This information is useful in deciding whether replicate arrays or replicate pools are most useful in reducing variance. Our analysis is based on 27 DNA pools ranging in size from 74 to 446 individual samples, genotyped on a collective total of 128 Illumina beadarrays: 24 1M-Single, 32 1M-Duo, and 72 660-Quad. Results: For all three Illumina SNP array types our estimates of var(e_array) were similar, between 3 × 10^-4 and 4 × 10^-4 for normalized data. Var(e_construction) accounted for between 20% and 40% of pooling variance across 27 pools in normalized data. Conclusions: We conclude that relative to var(e_array), var(e_construction) is of less importance in reducing the variance in allele frequency estimation from DNA pools; however, our data suggest that on average it may be more important than previously thought. We have prepared a simple online tool, PoolingPlanner (available at http://www.kchew.ca/PoolingPlanner/), which calculates the effective sample size (ESS) of a DNA pool given a range of replicate array values. ESS can be used in a power calculator to perform pool-adjusted calculations. This allows one to quickly calculate the loss of power associated with a pooling experiment to make an informed decision on whether a pool-based GWAS is worth pursuing.
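
    The trade-off between replicate arrays and replicate pools described in the pooling abstract can be sketched under a standard nested variance-components model. This is an assumption made for illustration, not the paper's stated formula or the PoolingPlanner calculation: each pool contributes one construction error, each array adds an independent array error, and the estimate is the mean over all arrays. The function name and the numeric values are hypothetical; the array variance is only chosen to lie in the range the abstract reports.

```python
def pooled_estimate_variance(
    var_construction: float,
    var_array: float,
    n_pools: int,
    arrays_per_pool: int,
) -> float:
    """Variance of the mean allele-frequency estimate (assumed nested model).

    Construction error is shared by all arrays of a pool, so it is averaged
    only over pools; array error is averaged over every array run.
    """
    if n_pools < 1 or arrays_per_pool < 1:
        raise ValueError("need at least one pool and one array per pool")
    return var_construction / n_pools + var_array / (n_pools * arrays_per_pool)


if __name__ == "__main__":
    var_array = 3.5e-4         # within the 3-4 x 10^-4 range in the abstract
    var_construction = 1.5e-4  # illustrative value, not taken from the paper

    # Same total number of arrays (two), allocated in two different ways.
    two_pools = pooled_estimate_variance(var_construction, var_array, 2, 1)
    two_arrays = pooled_estimate_variance(var_construction, var_array, 1, 2)
    print(f"2 pools x 1 array : {two_pools:.2e}")
    print(f"1 pool  x 2 arrays: {two_arrays:.2e}")
```

    Under this assumed model, replicate arrays reduce only the array term, whereas replicate pools reduce both terms, which is why the relative size of the two variance components matters when planning replicates.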

    Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay

    Single nucleotide polymorphisms (SNPs) are indispensable in applications such as association mapping and construction of high-density genetic maps. These applications usually require genotyping of thousands of SNPs in a large number of individuals. Although a number of SNP genotyping assays are available, most of them are designed for SNP genotyping in diploid individuals. Here, we demonstrate that the Illumina GoldenGate assay can be used for SNP genotyping of homozygous tetraploid and hexaploid wheat lines. Genotyping reactions could be carried out directly on genomic DNA without preliminary PCR amplification. A total of 53 tetraploid and 38 hexaploid homozygous wheat lines were genotyped at 96 SNP loci. The genotyping error rate estimated after removal of low-quality data was 0 and 1% for tetraploid and hexaploid wheat, respectively. The SNP genotyping assays developed were shown to be useful for genotyping wheat cultivars. This study demonstrated that the GoldenGate assay is a very efficient tool for high-throughput genotyping of polyploid wheat, opening new possibilities for the analysis of genetic variation in wheat and dissection of the genetic basis of complex traits using an association mapping approach.

    Complex nature of SNP genotype effects on gene expression in primary human leucocytes

    Background: Genome wide association studies have been hugely successful in identifying disease risk variants, yet most variants do not lead to coding changes and how variants influence biological function is usually unknown. Methods: We correlated gene expression and genetic variation in untouched primary leucocytes (n = 110) from individuals with celiac disease, a common condition with multiple risk variants identified. We compared our observations with an EBV-transformed HapMap B cell line dataset (n = 90), and performed a meta-analysis to increase power to detect non-tissue specific effects. Results: In celiac peripheral blood, 2,315 SNP variants influenced gene expression at 765 different transcripts (< 250 kb from SNP, at FDR = 0.05, cis expression quantitative trait loci, eQTLs). 135 of the detected SNP-probe effects (reflecting 51 unique probes) were also detected in a HapMap B cell line published dataset, all with effects in the same allelic direction. Overall gene expression differences within the two datasets predominantly explain the limited overlap in observed cis-eQTLs. Celiac associated risk variants from two regions, containing genes IL18RAP and CCR3, showed significant cis genotype-expression correlations in the peripheral blood but not in the B cell line datasets. We identified 14 genes where a SNP affected the expression of different probes within the same gene, but in opposite allelic directions. By incorporating genetic variation in co-expression analyses, functional relationships between genes can be more significantly detected. Conclusion: In conclusion, the complex nature of genotypic effects in human populations makes the use of a relevant tissue, large datasets, and analysis of different exons essential to enable the identification of the function for many genetic risk variants in common diseases.

    Differential Analysis of Ovarian and Endometrial Cancers Identifies a Methylator Phenotype

    Despite improved outcomes in the past 30 years, fewer than half of all women diagnosed with epithelial ovarian cancer live five years beyond their diagnosis. Although typically treated as a single disease, epithelial ovarian cancer includes several distinct histological subtypes, such as papillary serous and endometrioid carcinomas. To address whether the morphological differences seen in these carcinomas represent distinct characteristics at the molecular level, we analyzed DNA methylation patterns in 11 papillary serous tumors, 9 endometrioid ovarian tumors, 4 normal fallopian tube samples and 6 normal endometrial tissues, plus 8 normal fallopian tube and 4 serous samples from TCGA. For comparison within the endometrioid subtype we added 6 primary uterine endometrioid tumors and 5 endometrioid metastases from uterus to ovary. Data were obtained from 27,578 CpG dinucleotides occurring in or near promoter regions of 14,495 genes. We identified 36 locations with significant increases or decreases in methylation in comparisons of serous tumors and normal fallopian tube samples. Moreover, unsupervised clustering techniques applied to all samples showed three major profiles comprising mostly normal samples, serous tumors, and endometrioid tumors including ovarian, uterine and metastatic origins. The clustering analysis identified 60 differentially methylated sites between the serous group and the normal group. An unrelated set of 25 serous tumors validated the reproducibility of the methylation patterns. In contrast, >1,000 genes were differentially methylated between endometrioid tumors and normal samples. This finding is consistent with a generalized regulatory disruption caused by a methylator phenotype. Through DNA methylation analyses we have identified genes with known roles in ovarian carcinoma etiology, whereas pathway analyses provided biological insight into the role of novel genes. Our finding of differences between serous and endometrioid ovarian tumors indicates that intervention strategies could be developed to specifically address subtypes of epithelial ovarian cancer.

    Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma

    Asthma is caused by a combination of poorly understood genetic and environmental factors (1,2). We have systematically mapped the effects of single nucleotide polymorphisms (SNPs) on the presence of childhood onset asthma by genome-wide association. We characterized more than 317,000 SNPs in DNA from 994 patients with childhood onset asthma and 1,243 non-asthmatics, using family and case-referent panels. Here we show multiple markers on chromosome 17q21 to be strongly and reproducibly associated with childhood onset asthma in family and case-referent panels with a combined P value of P < 10^-12. In independent replication studies the 17q21 locus showed strong association with diagnosis of childhood asthma in 2,320 subjects from a cohort of German children (P = 0.0003) and in 3,301 subjects from the British 1958 Birth Cohort (P = 0.0005). We systematically evaluated the relationships between markers of the 17q21 locus and transcript levels of genes in Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines from children in the asthma family panel used in our association study. The SNPs associated with childhood asthma were consistently and strongly associated (P < 10^-22) in cis with transcript levels of ORMDL3, a member of a gene family that encodes transmembrane proteins anchored in the endoplasmic reticulum (3). The results indicate that genetic variants regulating ORMDL3 expression are determinants of susceptibility to childhood asthma. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/62682/1/nature06014.pd

    Effect of population structure corrections on the results of association mapping tests in complex maize diversity panels

    Association mapping of sequence polymorphisms underlying the phenotypic variability of quantitative agronomical traits is now a widely used method in plant genetics. However, due to the common presence of a complex genetic structure within plant diversity panels, spurious associations are expected to be highly frequent. Several methods have thus been suggested to control for panel structure. They mainly rely on ad hoc criteria for selecting the number of ancestral groups, which is often not evident for the complex panels that are commonly used in maize. It was thus necessary to evaluate the effect of the selected structure models on the association mapping results. A real maize data set (342 maize inbred lines and 12,000 SNPs) was used for this study. The panel structure was estimated using both Bayesian and dimension-reduction methods, considering an increasing number of ancestral groups. The effect on association tests depends in particular on the number of ancestral groups and on the trait analyzed. The results also show that using a high number of ancestral groups leads to an over-corrected model in which all causal loci vanish. Finally, the results of all models tested were combined in a meta-analysis approach. In this way, robust associations were highlighted for each analyzed trait.