272 research outputs found
A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes
Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing. Here, we describe new Bayesian multivariate, multiple regression methods that, by using flexible priors, are able to model and adapt to different patterns of effect sharing and specificity across phenotypes. Simulation results show that these new methods are fast and improve prediction accuracy compared with existing methods in a wide range of settings where effects are shared. Further, in settings where effects are not shared, our methods still perform competitively with state-of-the-art methods. In real data analyses of expression data in the Genotype-Tissue Expression (GTEx) project, our methods improve prediction performance on average for all tissues, with the greatest gains in tissues where effects are strongly shared, and in the tissues with smaller sample sizes. While we use gene expression prediction to illustrate our methods, the methods are generally applicable to any multi-phenotype applications, including prediction of polygenic scores and breeding values. Thus, our methods have the potential to provide improvements across fields and organisms.
Bayesian multi-trait analysis reveals a useful tool to increase oil concentration and to decrease toxicity in Jatropha curcas L.
DOI: 10.1371/journal.pone.01503
Genetic Architecture of Complex Traits and Accuracy of Genomic Selection in Dairy Cattle
Genomic selection has emerged as an effective approach in dairy cattle breeding, in which the key is prediction of genetic merit using dense SNP genotypes, i.e., genomic prediction. To improve the accuracy of genomic prediction, we need better understanding of the genetic architecture of complex traits and more sophisticated statistical modeling. In this dissertation, I developed several computing tools and performed a series of studies to investigate the genetic architecture of complex traits in dairy cattle and to improve genomic prediction models. First, we dissected additive, dominance, and imprinting effects for production, reproduction and health traits in dairy cattle. We found that non-additive effects contributed a non-negligible amount (more for reproduction traits) to the total genetic variance of complex traits in cattle. We also identified a dominant quantitative trait locus (QTL) for milk yield, revealing that detection of QTLs with non-additive effects is possible in genome-wide association studies (GWAS) using a large dataset. Second, we developed a powerful Bayesian method and a fast software tool (BFMAP) for SNP-set association and fine-mapping. We demonstrated that BFMAP achieves a power similar to or higher than existing software tools but is at least a few times faster for association tests. We also showed that BFMAP performs well for fine-mapping and can efficiently integrate fine-mapping with functional enrichment analysis. Third, we performed large-scale GWAS and fine-mapped 35 production, reproduction, and body conformation traits to single-gene resolution. We identified many novel association signals and many promising candidate genes. We also characterized causal effect enrichment patterns for a few functional annotations in the dairy cattle genome and showed that our fine-mapping results can be readily used for future functional studies. Fourth, we developed an efficient Bayesian method and a fast computing tool (SSGP) for using functional annotations in genomic prediction. We demonstrated that the method and software have great potential to increase accuracy in genomic prediction and the capability to handle very large data. Collectively, these studies advance our understanding of the genetic architecture of complex traits in dairy cattle and provide fast computing tools for analyzing complex traits and improving genomic prediction.
Accuracy of Genomic Prediction for Foliar Terpene Traits in Eucalyptus polybractea
Unlike agricultural crops, most forest species have not had millennia of improvement through phenotypic selection, but can contribute energy and material resources and possibly help alleviate climate change. Yield gains similar to those achieved in agricultural crops over millennia could be made in forestry species with the use of genomic methods in a much shorter time frame. Here we compare various methods of genomic prediction for eight traits related to foliar terpene yield in Eucalyptus polybractea, a tree grown predominantly for the production of Eucalyptus oil. The genomic markers used in this study are derived from shallow whole genome sequencing of a population of 480 trees. We compare the traditional pedigree-based additive best linear unbiased predictors (ABLUP), genomic BLUP (GBLUP), BayesB genomic prediction model, and a form of GBLUP based on weighting markers according to their influence on traits (BLUP|GA). Predictive ability is assessed under varying marker densities of 10,000, 100,000 and 500,000 SNPs. Our results show that BayesB and BLUP|GA perform best across the eight traits. Predictive ability was higher for individual terpene traits, such as foliar α-pinene and 1,8-cineole concentration (0.59 and 0.73, respectively), than aggregate traits such as total foliar oil concentration (0.38). This is likely a function of the trait architecture and markers used. BLUP|GA was the best model for the two biomass related traits, height and 1 year change in height (0.25 and 0.19, respectively). Predictive ability increased with marker density for most traits, but with diminishing returns. The results of this study are a solid foundation for yield improvement of essential oil producing eucalypts. New markets such as biopolymers and terpene-derived biofuels could benefit from rapid yield increases in undomesticated oil-producing species.
Funding for this project was provided by the Australian Research Council Linkage Program (LP110100184) to WJF, the Rural Industries Research and Development Corporation (RIRDC), Australia. Support was also provided by the Center for BioEnergy Innovation (CBI), a U.S. DOE Bioenergy Research Center supported by the DOE Office of Science.
Statistical perspectives on dependencies between genomic markers
To study the genetic impact on a quantitative trait, molecular markers are used as predictor variables in a statistical model. This habilitation thesis elucidated the challenges that accompany such investigations. First, the usefulness of including different kinds of genetic effects, which can be additive or non-additive, was verified. Second, dependencies between markers caused by their proximity on the genome were studied in populations with family stratification. The resulting covariance matrix deserved special attention because it serves multiple functions in several fields of genomic evaluation.
Genome-Wide Association Study for Maize Leaf Cuticular Conductance Identifies Candidate Genes Involved in the Regulation of Cuticle Development.
The cuticle, a hydrophobic layer of cutin and waxes synthesized by plant epidermal cells, is the major barrier to water loss when stomata are closed at night and under water-limited conditions. Elucidating the genetic architecture of natural variation for leaf cuticular conductance (g_c) is important for identifying genes relevant to improving crop productivity in drought-prone environments. To this end, we conducted a genome-wide association study of g_c of adult leaves in a maize inbred association panel that was evaluated in four environments (Maricopa, AZ, and San Diego, CA, in 2016 and 2017). Five genomic regions significantly associated with g_c were resolved to seven plausible candidate genes (ISTL1, two SEC14 homologs, cyclase-associated protein, a CER7 homolog, GDSL lipase, and β-D-XYLOSIDASE 4). These candidates are potentially involved in cuticle biosynthesis, trafficking and deposition of cuticle lipids, cutin polymerization, and cell wall modification. Laser microdissection RNA sequencing revealed that all these candidate genes, with the exception of the CER7 homolog, were expressed in the zone of the expanding adult maize leaf where cuticle maturation occurs. With direct application to genetic improvement, moderately high average predictive abilities were observed for whole-genome prediction of g_c in locations (0.46 and 0.45) and across all environments (0.52). The findings of this study provide novel insights into the genetic control of g_c and have the potential to help breeders more effectively develop drought-tolerant maize for target environments.
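Several abstracts in this listing report "predictive ability" for whole-genome prediction. As a rough illustration only, the sketch below computes it in the way the term is commonly used: the Pearson correlation between observed phenotypes and cross-validated predictions. Everything here is an assumption for illustration and is not taken from any of the listed studies: the data are simulated, the model is a plain ridge regression on centered marker codes (a simple stand-in, not the BayesB, GBLUP, or BLUP|GA implementations named above), and the function names, the penalty `lam`, and the fold count `k` are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge marker effects on centered data: (Z'Z + lam*I) b = Z'(y - mean(y))."""
    p = len(X[0])
    mu = mean(y)
    col_means = [mean(row[j] for row in X) for j in range(p)]
    Z = [[row[j] - col_means[j] for j in range(p)] for row in X]
    ZtZ = [[sum(z[j] * z[k] for z in Z) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    Zty = [sum(z[j] * (yi - mu) for z, yi in zip(Z, y)) for j in range(p)]
    return mu, col_means, solve(ZtZ, Zty)


def predictive_ability(X, y, lam=1.0, k=5, seed=0):
    """Correlation between observed values and k-fold cross-validated predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    pred = [0.0] * len(y)
    for fold in range(k):
        test = idx[fold::k]          # every k-th shuffled individual is held out
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        mu, col_means, b = ridge_fit([X[i] for i in train],
                                     [y[i] for i in train], lam)
        for i in test:
            pred[i] = mu + sum((xj - mj) * bj
                               for xj, mj, bj in zip(X[i], col_means, b))
    return pearson(pred, y)


# Simulated toy data: 120 individuals, 20 markers coded 0/1/2 (hypothetical).
rng = random.Random(1)
n, p = 120, 20
X = [[rng.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
true_effects = [rng.gauss(0, 1) for _ in range(p)]
y = [sum(x * b for x, b in zip(row, true_effects)) + rng.gauss(0, 2) for row in X]

r = predictive_ability(X, y)
print(f"cross-validated predictive ability: {r:.2f}")
```

Because each individual is predicted only from a model fitted without it, the correlation reflects out-of-sample performance rather than goodness of fit. Values from this toy simulation are not comparable to the figures quoted in the abstracts.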
Multi-trait multi-environment models in the genetic selection of segregating soybean progeny.
At present, single-trait best linear unbiased prediction (BLUP) is the standard method for genetic selection in soybean. However, when genetic selection is performed based on two or more genetically correlated traits and these are analyzed individually, selection bias may arise. Under these conditions, considering the correlation structure between the evaluated traits may provide more accurate genetic estimates for the evaluated parameters, even under environmental influences. The present study was thus developed to examine the efficiency and applicability of multi-trait multi-environment (MTME) models by the residual maximum likelihood (REML/BLUP) and Bayesian approaches in the genetic selection of segregating soybean progeny. The study involved data pertaining to 203 soybean F2:4 progeny assessed in two environments for the following traits: number of days to maturity (DM), 100-seed weight (SW), and average seed yield per plot (SY). Variance components and genetic and non-genetic parameters were estimated via the REML/BLUP and Bayesian methods. The variance components estimated and the breeding values and genetic gains predicted with selection through the Bayesian procedure were similar to those obtained by REML/BLUP. The frequentist and Bayesian MTME models provided higher estimates of broad-sense heritability per plot (or heritability of total effects of progeny; h²_prog) and mean accuracy of progeny than their respective single-trait versions. Bayesian analysis provided the credibility intervals for the estimates of h²_prog. Therefore, MTME led to greater predicted gains from selection. On this basis, this procedure can be efficiently applied in the genetic selection of segregating soybean progeny.