508 research outputs found

    Statistical tests for differential expression in cDNA microarray experiments

    Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation

    Bayesian model selection reveals biological origins of zero inflation in single-cell transcriptomics.

    BACKGROUND: Single-cell RNA sequencing is a powerful tool for characterizing cellular heterogeneity in gene expression. However, high variability and a large number of zero counts present challenges for analysis and interpretation. There is substantial controversy over the origins and proper treatment of zeros and no consensus on whether zero-inflated count distributions are necessary or even useful. While some studies assume the existence of zero inflation due to technical artifacts and attempt to impute the missing information, other recent studies argue that there is no zero inflation in scRNA-seq data. RESULTS: We apply a Bayesian model selection approach to unambiguously demonstrate zero inflation in multiple biologically realistic scRNA-seq datasets. We show that the primary causes of zero inflation are not technical but rather biological in nature. We also demonstrate that parameter estimates from the zero-inflated negative binomial distribution are an unreliable indicator of zero inflation. CONCLUSIONS: Despite the existence of zero inflation in scRNA-seq counts, we recommend the generalized linear model with negative binomial count distribution, not zero-inflated, as a suitable reference model for scRNA-seq analysis

    High-Density Genotypes of Inbred Mouse Strains: Improved Power and Precision of Association Mapping.

    Human genome-wide association studies have identified thousands of loci associated with disease phenotypes. Genome-wide association studies also have become feasible using rodent models and these have some important advantages over human studies, including controlled environment, access to tissues for molecular profiling, reproducible genotypes, and a wide array of techniques for experimental validation. Association mapping with common mouse inbred strains generally requires 100 or more strains to achieve sufficient power and mapping resolution; in contrast, sample sizes for human studies typically are one or more orders of magnitude greater than this. To enable well-powered studies in mice, we have generated high-density genotypes for ~175 inbred strains of mice using the Mouse Diversity Array. These new data increase marker density by 1.9-fold, have reduced missing data rates, and provide more accurate identification of heterozygous regions compared with previous genotype data. We report the discovery of new loci from previously reported association mapping studies using the new genotype data. The data are freely available for download, and Web-based tools provide easy access for association mapping and viewing of the underlying intensity data for individual loci

    Quantitative trait mapping in Diversity Outbred mice identifies novel genomic regions associated with the hepatic glutathione redox system.

    The tripeptide glutathione (GSH) is instrumental to antioxidant protection and xenobiotic metabolism, and the ratio of its reduced and oxidized forms (GSH/GSSG) indicates the cellular redox environment and maintains key aspects of cellular signaling. Disruptions in GSH levels and GSH/GSSG have long been tied to various chronic diseases, and many studies have examined whether variant alleles in genes responsible for GSH synthesis and metabolism are associated with increased disease risk. However, past studies have been limited to established, canonical GSH genes, though emerging evidence suggests that novel loci and genes influence the GSH redox system in specific tissues. The present study marks the most comprehensive effort to date to directly identify genetic loci associated with the GSH redox system. We employed the Diversity Outbred (DO) mouse population, a model of human genetics, and measured GSH and the essential redox cofactor NADPH in liver, the organ with the highest levels of GSH in the body. Under normal physiological conditions, we observed substantial variation in hepatic GSH and NADPH levels and their redox balances, and discovered a novel, significant quantitative trait locus (QTL) on murine chromosome 16 underlying GSH/GSSG; bioinformatics analyses revealed Socs1 to be the most likely candidate gene. We also discovered novel QTL associated with hepatic NAD

    Importance of randomization in microarray experimental designs with Illumina platforms

    Measurements of gene expression from microarray experiments are highly dependent on experimental design. Systematic noise can be introduced into the data at numerous steps. On Illumina BeadChips, multiple samples are assayed in an ordered series of arrays. Two experiments were performed using the same samples but different hybridization designs. An experiment confounding genotype with BeadChip and treatment with array position was compared to another experiment in which these factors were randomized to BeadChip and array position. An ordinal effect of array position on intensity values was observed in both experiments. We demonstrate that there is increased rate of false-positive results in the confounded design and that attempts to correct for confounded effects by statistical modeling reduce power of detection for true differential expression. Simple analysis models without post hoc corrections provide the best results possible for a given experimental design. Normalization improved differential expression testing in both experiments but randomization was the most important factor for establishing accurate results. We conclude that lack of randomization cannot be corrected by normalization or by analytical methods. Proper randomization is essential for successful microarray experiments

    Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice.

    We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. Using a subset of top markers selected from a gradient boosting machine model helped for some of the traits to improve the accuracy of prediction when these were fitted into linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects

    Adding gene transcripts into genomic prediction improves accuracy and reveals sampling time dependence.

    Recent developments allowed generating multiple high-quality 'omics' data that could increase the predictive performance of genomic prediction for phenotypes and genetic merit in animals and plants. Here, we have assessed the performance of parametric and nonparametric models that leverage transcriptomics in genomic prediction for 13 complex traits recorded in 478 animals from an outbred mouse population. Parametric models were implemented using the best linear unbiased prediction, while nonparametric models were implemented using the gradient boosting machine algorithm. We also propose a new model named GTCBLUP that aims to remove between-omics-layer covariance from predictors, whereas its counterpart GTBLUP does not do that. While gradient boosting machine models captured more phenotypic variation, their predictive performance did not exceed the best linear unbiased prediction models for most traits. Models leveraging gene transcripts captured higher proportions of the phenotypic variance for almost all traits when these were measured closer to the moment of measuring gene transcripts in the liver. In most cases, the combination of layers was not able to outperform the best single-omics models to predict phenotypes. Using only gene transcripts, the gradient boosting machine model was able to outperform best linear unbiased prediction for most traits except body weight, but the same pattern was not observed when using both single nucleotide polymorphism genotypes and gene transcripts. Although the GTCBLUP model was not able to produce the most accurate phenotypic predictions, it showed the highest accuracies for breeding values for 9 out of 13 traits. We recommend using the GTBLUP model for prediction of phenotypes and using the GTCBLUP for prediction of breeding values

    Evidence of a Large-Scale Functional Organization of Mammalian Chromosomes

    Evidence from inbred strains of mice indicates that a quarter or more of the mammalian genome consists of chromosome regions containing clusters of functionally related genes. The intense selection pressures during inbreeding favor the coinheritance of optimal sets of alleles among these genetically linked, functionally related genes, resulting in extensive domains of linkage disequilibrium (LD) among a set of 60 genetically diverse inbred strains. Recombination that disrupts the preferred combinations of alleles reduces the ability of offspring to survive further inbreeding. LD is also seen between markers on separate chromosomes, forming networks with scale-free architecture. Combining LD data with pathway and genome annotation databases, we have been able to identify the biological functions underlying several domains and networks. Given the strong conservation of gene order among mammals, the domains and networks we find in mice probably characterize all mammals, including humans