51 research outputs found

    Methods for evaluating gene expression from Affymetrix microarray datasets

    Background: Affymetrix high density oligonucleotide expression arrays are widely used across all fields of biological research for measuring genome-wide gene expression. An important step in processing oligonucleotide microarray data is to produce a single value for the gene expression level of an RNA transcript using one of a growing number of statistical methods. The challenge for the researcher is to decide on the most appropriate method to use to address a specific biological question with a given dataset. Although several research efforts have focused on assessing performance of a few methods in evaluating gene expression from RNA hybridization experiments with different datasets, the relative merits of the methods currently available in the literature for evaluating genome-wide gene expression from Affymetrix microarray data collected from real biological experiments remain actively debated. Results: The present study reports a comprehensive survey of the performance of all seven commonly used methods in evaluating genome-wide gene expression from a well-designed experiment using Affymetrix microarrays. The experiment profiled eight genetically divergent barley cultivars each with three biological replicates. The dataset so obtained confers a balanced and idealized structure for the present analysis. The methods were evaluated on their sensitivity for detecting differentially expressed genes, reproducibility of expression values across replicates, and consistency in calling differentially expressed genes. The number of genes detected as differentially expressed among methods differed by a factor of two or more at a given false discovery rate (FDR) level. Moreover, we propose the use of genes containing single feature polymorphisms (SFPs) as an empirical test for comparison among methods for the ability to detect true differential gene expression on the basis that SFPs largely correspond to cis-acting expression regulators. The PDNN method demonstrated superiority over all other methods in every comparison, whilst the default Affymetrix MAS5.0 method was clearly inferior. Conclusion: A comprehensive assessment of seven commonly used data extraction methods based on an extensive barley Affymetrix gene expression dataset has shown that the PDNN method has superior performance for the detection of differentially expressed genes.

    A highly robust and optimized sequence-based approach for genetic polymorphism discovery and genotyping in large plant populations

    KEY MESSAGE: This optimized approach provides both a computational tool and a library construction protocol, which can maximize the number of genomic sequence reads that uniformly cover a plant genome and minimize the number of sequence reads representing chloroplast DNA and rRNA genes. One can use the developed computational tool to design a RAD-seq experiment that achieves the expected coverage of sequence variant markers for large plant populations, using information on the genome sequence and ideally, though not necessarily, information on the distribution of sequence polymorphisms in the genome. ABSTRACT: The advent of next generation sequencing techniques has motivated recent interest in developing sequence-based identification and genotyping of genome-wide genetic variants in large populations, with RAD-seq being a typical example. Without taking proper account of the fact that chloroplast and rRNA genes may occupy up to 60 % of the resulting sequence reads, the current RAD-seq design could be very inefficient for plant and crop species. We present here a generic computational tool to optimize RAD-seq design in any plant species and experimentally test the optimized design by implementing it to screen for and genotype sequence variants in four plant populations of diploid and autotetraploid Arabidopsis and potato Solanum tuberosum. Sequence data from the optimized RAD-seq experiments show that the undesirable sequence reads contributed by chloroplast and rRNA genes can be controlled at 3–10 %. Additionally, the optimized RAD-seq method enables pre-design of the required uniformity and density in coverage of the high quality sequence polymorphic markers over the genome of interest and genotyping of large plant or crop populations at a competitive cost in comparison to other mainstream rivals in the literature.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00122-016-2736-9) contains supplementary material, which is available to authorized users

    HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids

    BACKGROUND: The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies. RESULTS: We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): ‘HSP base Assignment using NGS data through Diploid Similarity’ (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment. CONCLUSION: We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics

    Patterns of homoeologous gene expression shown by RNA sequencing in hexaploid bread wheat

    BACKGROUND: Bread wheat (Triticum aestivum) has a large, complex and hexaploid genome consisting of A, B and D homoeologous chromosome sets. Therefore each wheat gene potentially exists as a trio of A, B and D homoeoloci, each of which may contribute differentially to wheat phenotypes. We describe a novel approach combining wheat cytogenetic resources (chromosome substitution ‘nullisomic-tetrasomic’ lines) with next generation deep sequencing of gene transcripts (RNA-Seq), to directly and accurately identify homoeologue-specific single nucleotide variants and quantify the relative contribution of individual homoeoloci to gene expression. RESULTS: We discover, based on a sample comprising ~5-10% of the total wheat gene content, that at least 45% of wheat genes are expressed from all three distinct homoeoloci. Most of these genes show strikingly biased expression patterns in which expression is dominated by a single homoeolocus. The remaining ~55% of wheat genes are expressed from either one or two homoeoloci only, through a combination of extensive transcriptional silencing and homoeolocus loss. CONCLUSIONS: We conclude that wheat is tending towards functional diploidy, through a variety of mechanisms causing single homoeoloci to become the predominant source of gene transcripts. This discovery has profound consequences for wheat breeding and our understanding of wheat evolution

    Robust Detection and Genotyping of Single Feature Polymorphisms from Gene Expression Data

    It is well known that Affymetrix microarrays are widely used to predict genome-wide gene expression and genome-wide genetic polymorphisms from RNA and genomic DNA hybridization experiments, respectively. It has recently been proposed to integrate the two predictions by use of RNA microarray data only. Although the ability to detect single feature polymorphisms (SFPs) from RNA microarray data has many practical implications for genome study in both sequenced and unsequenced species, it raises enormous challenges for statistical modelling and analysis of microarray gene expression data for this objective. Several methods are proposed to predict SFPs from the gene expression profile. However, their performance is highly vulnerable to differential expression of genes. The SFPs thus predicted are eventually a reflection of differentially expressed genes rather than genuine sequence polymorphisms. To address the problem, we developed a novel statistical method to separate the binding affinity between a transcript and its targeting probe and the parameter measuring transcript abundance from perfect-match hybridization values of Affymetrix gene expression data. We implemented a Bayesian approach to detect SFPs and to genotype a segregating population at the detected SFPs. Based on analysis of three Affymetrix microarray datasets, we demonstrated that the present method confers a significantly improved robustness and accuracy in detecting the SFPs that carry genuine sequence polymorphisms when compared to its rivals in the literature. The method developed in this paper will provide experimental genomicists with advanced analytical tools for appropriate and efficient analysis of their microarray experiments and biostatisticians with insightful interpretation of Affymetrix microarray data

    Clinical, radiologic, pathologic, and molecular characteristics of long-term survivors of diffuse intrinsic pontine glioma (DIPG): a collaborative report from the International and European Society for Pediatric Oncology DIPG registries

    Purpose: Diffuse intrinsic pontine glioma (DIPG) is a brainstem malignancy with a median survival of < 1 year. The International and European Society for Pediatric Oncology DIPG Registries collaborated to compare clinical, radiologic, and histomolecular characteristics between short-term survivors (STSs) and long-term survivors (LTSs). Materials and Methods: Data abstracted from registry databases included patients from North America, Australia, Germany, Austria, Switzerland, the Netherlands, Italy, France, the United Kingdom, and Croatia. Results: Among 1,130 pediatric and young adults with radiographically confirmed DIPG, 122 (11%) were excluded. Of the 1,008 remaining patients, 101 (10%) were LTSs (survival ≥ 2 years). Median survival time was 11 months (interquartile range, 7.5 to 16 months), and 1-, 2-, 3-, 4-, and 5-year survival rates were 42.3% (95% CI, 38.1% to 44.1%), 9.6% (95% CI, 7.8% to 11.3%), 4.3% (95% CI, 3.2% to 5.8%), 3.2% (95% CI, 2.4% to 4.6%), and 2.2% (95% CI, 1.4% to 3.4%), respectively. LTSs, compared with STSs, more commonly presented at age < 3 or > 10 years (11% v 3% and 33% v 23%, respectively; P < .001) and with longer symptom duration (P < .001). STSs, compared with LTSs, more commonly presented with cranial nerve palsy (83% v 73%, respectively; P = .008), ring enhancement (38% v 23%, respectively; P = .007), necrosis (42% v 26%, respectively; P = .009), and extrapontine extension (92% v 86%, respectively; P = .04). LTSs more commonly received systemic therapy at diagnosis (88% v 75% for STSs; P = .005). Biopsies and autopsies were performed in 299 patients (30%) and 77 patients (10%), respectively; 181 tumors (48%) were molecularly characterized. LTSs were more likely to harbor a HIST1H3B mutation (odds ratio, 1.28; 95% CI, 1.1 to 1.5; P = .002). Conclusion: We report clinical, radiologic, and molecular factors that correlate with survival in children and young adults with DIPG, which are important for risk stratification in future clinical trials

    A Robust Statistical Method for Association-Based eQTL Analysis

    Background: It has been well established that the theoretical kernel of the recently surging genome-wide association study (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either through predicting the structure parameters or correcting inflation in the test statistic due to the stratification, these may not be feasible or may impose further statistical problems in practical implementation. Methodology: We propose here a novel statistical method to control spurious LD in GWAS from population structure by incorporating a control marker into testing for significance of genetic association of a polymorphic marker with phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and accounts properly for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations. Results/Conclusions: The analyses show that the new method confers an improved statistical power for detecting genuin

    The Beaker phenomenon and the genomic transformation of northwest Europe

    From around 2750 to 2500 BC, Bell Beaker pottery became widespread across western and central Europe, before it disappeared between 2200 and 1800 BC. The forces that propelled its expansion are a matter of long-standing debate, and there is support for both cultural diffusion and migration having a role in this process. Here we present genome-wide data from 400 Neolithic, Copper Age and Bronze Age Europeans, including 226 individuals associated with Beaker-complex artefacts. We detected limited genetic affinity between Beaker-complex-associated individuals from Iberia and central Europe, and thus exclude migration as an important mechanism of spread between these two regions. However, migration had a key role in the further dissemination of the Beaker complex. We document this phenomenon most clearly in Britain, where the spread of the Beaker complex introduced high levels of steppe-related ancestry and was associated with the replacement of approximately 90% of Britain’s gene pool within a few hundred years, continuing the east-to-west expansion that had brought steppe-related ancestry into central and northern Europe over the previous centuries

    The genetic architecture of the human cerebral cortex

    The cerebral cortex underlies our complex cognitive capabilities, yet little is known about the specific genetic loci that influence human cortical structure. To identify genetic variants that affect cortical structure, we conducted a genome-wide association meta-analysis of brain magnetic resonance imaging data from 51,665 individuals. We analyzed the surface area and average thickness of the whole cortex and 34 regions with known functional specializations. We identified 199 significant loci and found significant enrichment for loci influencing total surface area within regulatory elements that are active during prenatal cortical development, supporting the radial unit hypothesis. Loci that affect regional surface area cluster near genes in Wnt signaling pathways, which influence progenitor expansion and areal identity. Variation in cortical structure is genetically correlated with cognitive function, Parkinson's disease, insomnia, depression, neuroticism, and attention deficit hyperactivity disorder