47 research outputs found

    Differential Gene Expression and Epiregulation of Alpha Zein Gene Copies in Maize Haplotypes

    Multigenic traits are very common in plants and cause diversity. Nutritional quality is such a trait, and one of its factors is the composition and relative expression of storage protein genes. In maize, they represent a medium-size gene family distributed over several chromosomes and unlinked locations. Two inbreds, B73 and BSSS53, both from the Iowa Stiff Stock Synthetic collection, have been selected to analyze allelic and non-allelic variability in these regions, which span 80–500 kb of chromosomal DNA. Genes were copied to unlinked sites before and after allotetraploidization of maize, but before transposition enlarged intergenic regions in a haplotype-specific manner. Once genes are copied, expression of donor genes is reduced relative to new copies. Epigenetic regulation seems to contribute to silencing older copies, because some of them can be reactivated when endosperm is maintained as cultured cells, indicating that copy number variation might contribute to a reserve of gene copies. Bisulfite sequencing of the promoter region also shows different methylation patterns among gene clusters as well as differences between tissues, suggesting a possible position effect on regulatory mechanisms as a result of inserting copies at unlinked locations. The observations offer a potential paradigm for how different gene families evolve and the impact this has on the expression and regulation of their members.

    A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency

    Background: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. Results: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency, with more than 25,000 variants having less than 20% allele frequency, with 1653 variants in COSMIC-related genes. This is 5-100x more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. Conclusion: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
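
    The admixture idea in this abstract can be illustrated with simple dilution arithmetic. The sketch below is not taken from the paper: the function name and the mixing fraction are hypothetical, and it assumes the variant is absent from Sample B and that both samples contribute DNA in proportion to their mixing fractions.

```python
def admixed_allele_frequency(af_in_a: float, fraction_a: float) -> float:
    """Expected variant allele frequency after mixing Sample A into Sample B.

    Assumptions (illustrative, not from the source): the variant is absent
    from Sample B, and each sample contributes DNA in proportion to its
    mixing fraction.
    """
    if not 0.0 <= af_in_a <= 1.0:
        raise ValueError("af_in_a must be between 0 and 1")
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    # Only Sample A carries the variant, so its frequency scales linearly
    # with the share of Sample A in the mixture.
    return af_in_a * fraction_a


# Hypothetical example: a variant at 1% in Sample A, with Sample A making up
# 2% of the mixture, is expected at about 0.02% in the admixed sample.
diluted = admixed_allele_frequency(0.01, 0.02)
print(f"{diluted:.4%}")
```

    Under these assumptions the relationship is linear, so any target frequency below the native one can be reached by choosing the mixing fraction accordingly.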

    Assessing sources of inconsistencies in genotypes and their effects on genome-wide association studies with HapMap samples

    The discordance in results of independent genome-wide association studies (GWAS) indicates the potential for Type I and Type II errors. We assessed the repeatability of current Affymetrix technologies that support GWAS. Reasonable reproducibility was observed for both raw intensity and the genotypes/copy number variants. We also assessed consistencies between different SNP arrays and between genotype calling algorithms. We observed that the inconsistency in genotypes was generally small at the specimen level. To further examine whether the differences from genotyping and genotype calling are possible sources of variation in GWAS results, an association analysis was applied to compare the associated SNPs. We observed that the inconsistency in genotypes not only propagated to the association analysis, but was amplified in the associated SNPs. Our studies show that inconsistencies between SNP arrays and between genotype calling algorithms are potential sources for the lack of reproducibility in GWAS results.
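
    One simple way to quantify inconsistency between two sets of genotype calls is a per-specimen discordance rate. The sketch below is an assumption for illustration only: the abstract does not state the exact metric used, and the function name and the convention of skipping no-calls are hypothetical.

```python
def genotype_discordance(calls_a, calls_b, no_call=None):
    """Fraction of SNPs whose genotype calls differ between two call sets.

    SNPs where either call set has a no-call are skipped (an illustrative
    convention, not one confirmed by the source).
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("call sets must cover the same SNPs in the same order")
    compared = 0
    mismatched = 0
    for a, b in zip(calls_a, calls_b):
        if a == no_call or b == no_call:
            continue  # cannot compare a missing call
        compared += 1
        if a != b:
            mismatched += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both sets")
    return mismatched / compared


# Hypothetical calls for one specimen from two algorithms (or two arrays).
algorithm_1 = ["AA", "AB", "BB", None]
algorithm_2 = ["AA", "BB", "BB", "AB"]
rate = genotype_discordance(algorithm_1, algorithm_2)
```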

    An interactive effect of batch size and composition contributes to discordant results in GWAS with the CHIAMO genotyping algorithm.

    The discordance in results between independent genome-wide association studies (GWAS) indicates the potential for Type I and Type II errors. To identify the causes of variability underlying lack of reproducibility, here we present the results of a repeatability experiment on GWAS on a cohort of 1991 individuals with coronary artery disease and 1500 controls (National Blood Service) provided by the Wellcome Trust Case Control Consortium. As part of the MicroArray Quality Control project, we identified quality control (QC) and association analysis steps with a major impact on the identification of candidate markers for possible classifiers. Different experimental conditions were used with the CHIAMO calling algorithm to assess the effects of batch size and case-control composition on downstream association analysis. Results showed that both composition and size create discordant single-nucleotide polymorphism (SNP) results for QC and statistical analysis and may contribute to the lack of reproducibility in GWAS. An interactive effect of batch size and composition contributes to discordant results in significantly associated loci. About 800 significant SNPs (Cochran-Armitage trend test, P < 5.0 × 10^-7) were found for batches of 2000 samples with separated cases and controls, whereas only 14 significant markers were found with one batch of all samples.
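
    For readers unfamiliar with the Cochran-Armitage trend test named in this abstract, a minimal sketch of the standard textbook statistic for a single SNP follows. It is not the authors' pipeline: the function name, the additive weights (0, 1, 2), and the example counts are assumptions, and the two-sided p-value uses the normal approximation.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts for one SNP.

    cases, controls: counts per genotype class (e.g. AA, AB, BB).
    weights: scores per class; (0, 1, 2) is the common additive coding
    (an assumption here, not stated in the source).
    Returns (z, two_sided_p) under the normal approximation.
    """
    k = len(weights)
    if not (len(cases) == len(controls) == k):
        raise ValueError("cases, controls and weights must have equal length")
    total_cases = sum(cases)
    total_controls = sum(controls)
    n = total_cases + total_controls
    cols = [r + s for r, s in zip(cases, controls)]

    # Trend statistic: weighted difference between case and control counts.
    t = sum(w * (r * total_controls - s * total_cases)
            for w, r, s in zip(weights, cases, controls))

    # Variance of the statistic under the null hypothesis of no association.
    var = sum(w * w * c * (n - c) for w, c in zip(weights, cols))
    var -= 2 * sum(weights[i] * weights[j] * cols[i] * cols[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= total_cases * total_controls / n
    if var <= 0:
        raise ValueError("variance is zero; the test is undefined")

    z = t / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical genotype counts (AA, AB, BB) for one SNP.
z, p = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
is_significant = p < 5.0e-7  # threshold quoted in the abstract
```

    Because the test is run once per SNP, small systematic shifts in genotype calls between batches can move individual markers across a fixed threshold such as this one.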

    Batch Effects in the BRLMM Genotype Calling Algorithm Influence GWAS Results for the Affymetrix 500K Array.

    The Affymetrix GeneChip Human Mapping 500K array is common for genome-wide association studies (GWASs). Recent findings highlight the importance of accurate genotype calling algorithms to reduce the inflation in Type I and Type II error rates. Differential results due to genotype calling errors can introduce severe bias in case-control association study results. Using data from the Wellcome Trust Case Control Consortium, 1991 individuals with coronary artery disease (CAD) and 1500 controls from the UK Blood Services (NBS) were genotyped on the Affymetrix 500K array. Different batch sizes and compositions were used in the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) genotype calling algorithm to assess the batch effect on downstream association analysis. Results show that composition (cases and controls genotyped simultaneously or separately) and size (number of individuals processed by BRLMM at a time) can create 2-3% discordance in the results for quality control and statistical analysis and may contribute to the lack of reproducibility between GWASs. The changes in batch size are largely responsible for differential single-nucleotide polymorphism results, yet we observe evidence of an interactive effect of batch size and composition that contributes to discordant results in the list of significantly associated loci.

    Variability in GWAS Analysis: the Impact of Genotype Calling Algorithm Inconsistencies.

    The Genome-Wide Association Working Group (GWAWG) is part of a large-scale effort by the MicroArray Quality Consortium (MAQC) to assess the quality of genomic experiments, technologies and analyses for genome-wide association studies (GWASs). One of the aims of the working group is to assess the variability of genotype calls within and between different genotype calling algorithms using data for coronary artery disease from the Wellcome Trust Case Control Consortium (WTCCC) and the University of Ottawa Heart Institute. Our results show that the choice of genotyping algorithm (for example, Bayesian robust linear model with Mahalanobis distance classifier (BRLMM), the corrected robust linear model with maximum-likelihood-based distances (CRLMM) and CHIAMO (developed and implemented by the WTCCC)) can introduce marked variability in the results of downstream case-control association analysis for the Affymetrix 500K array. The amount of discordance between results is influenced by how samples are combined and processed through the respective genotype calling algorithm, indicating that systematic genotype errors due to computational batch effects are propagated to the list of single-nucleotide polymorphisms found to be significantly associated with the trait of interest. Further work using HapMap samples shows that inconsistencies between Affymetrix arrays and calling algorithms can lead to genotyping errors that influence downstream analysis

    Assessment of variability in GWAS with CRLMM genotyping algorithm on WTCCC Coronary Artery Disease.

    The robustness of genome-wide association study (GWAS) results depends on the genotyping algorithms used to establish the association. This paper initiated the assessment of the impact of the Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) genotyping quality on identifying real significant genes in a GWAS with large sample sizes. With microarray image data from the Wellcome Trust Case-Control Consortium (WTCCC), 1991 individuals with coronary artery disease (CAD) and 1500 controls, genetic associations were evaluated under various batch sizes and compositions. Experimental designs included batch sizes of 250, 350, 500, and 2000 samples with different distributions of cases and controls in each batch (randomized, simply combined at a 4:3 case-control ratio, or separate case-control samples), as well as the whole set of 3491 samples. The separate composition could create 2-3% discordance in the single nucleotide polymorphism (SNP) results for quality control/statistical analysis and might contribute to the lack of reproducibility between GWAS. CRLMM shows high genotyping accuracy and stability to batch effects. According to the genotypic and allelic tests (P < 5.0 × 10^-7), nine significant signals on chromosome 9 were found consistently in all batch sizes with combined design. Our findings are critical to optimize the reproducibility of GWAS and confirm the genetic role in the pathophysiology of CAD.