Search CORE

10,715 research outputs found

Bayesian robust analysis for genetic architecture of quantitative traits

Author: Andrews
Ball
Coppieters
Dempster
Diao
Elsen
Feenstra
Fernandez
Gelman
Hackett
Hongwen Deng
Jansen
Jian Li
Kass
Kruglyak
Lander
Lange
Lange
Little
Liu
Pinheiro
Plummer
Raftery
Rebaï
Ripley
Rogers
Rohr
Rosa
Rosa
Runqing Yang
Sillanpää
Sokal
Symons
Wang
Xin Wang
Yang
Yi
Yi
Zhang
Zou
Publication venue: Oxford University Press
Publication date
Field of study

Motivation: In most quantitative trait locus (QTL) mapping studies, phenotypes are assumed to follow normal distributions. Deviations from this assumption may affect the accuracy of QTL detection and lead to detection of spurious QTLs. To improve the robustness of QTL mapping methods, we replaced the normal distribution for residuals in multiple interacting QTL models with the normal/independent distributions that are a class of symmetric and long-tailed distributions and are able to accommodate residual outliers. Subsequently, we developed a Bayesian robust analysis strategy for dissecting genetic architecture of quantitative traits and for mapping genome-wide interacting QTLs in line crosses

Crossref

PubMed Central

Bayesian Sparse Factor Analysis of Genetic Covariance Matrices

Author: Mukherjee Sayan
Runcie Daniel E
Publication venue: 'Genetics Society of America'
Publication date: 15/03/2013
Field of study

Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression where the number of traits assayed per individual can reach many thousand. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed effects model. The key idea of our model is that we need only consider G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse -- affecting only a few observed traits. The advantages of this approach are two-fold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set.Comment: 35 pages, 7 figure

arXiv.org e-Print Archive

CiteSeerX

Dissecting high-dimensional phenotypes with bayesian sparse factor analysis of genetic covariance matrices.

Author: Mukherjee Sayan
Runcie Daniel E
Publication venue: eScholarship, University of California
Publication date: 01/07/2013
Field of study

Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression where the number of traits assayed per individual can reach many thousand. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed-effects model. The key idea of our model is that we need consider only G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse - affecting only a few observed traits. The advantages of this approach are twofold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set

PubMed Central

eScholarship - University of California

Detection of regulator genes and eQTLs in gene networks

Author: A Butte
A Chatr-Aryamontri
A Clauset
A Joshi
A Joshi
A Kundaje
AA Shabalin
AJ Enright
AJ Walhout
AS Dimas
B Schwanhausser
B Zhang
B Zhang
C Cenik
CO Daub
D Koller
DA Cusanovich
DM Greenawalt
E Bonnet
E Ravasz
E Segal
EC Neto
EC Neto
EC Neto
EE Schadt
EE Schadt
EE Schadt
EE Schadt
EE Schadt
EJ Foss
F Grubert
F Yue
FA Cubillos
FW Albert
G Hemani
G Nicholson
GD Smith
GH Golub
H Foroughi Asl
H Talukdar
HN Kadarmideen
J Millstein
J Qi
J Zhu
J Zhu
J Zhu
JE Aten
JF Ayroles
JJ Faith
JL Björkegren
JS Liu
K Basso
K Qu
KG Ardlie
L Wu
LA Hindorff
LH Hartwell
LS Chen
M Ashburner
M Civelek
M Georges
M Gerstein
M Medvedovic
M Schmidt
M Scutari
MA Schaub
MB Eisen
MD Ritchie
ME Goddard
MEJ Newman
MEJ Newman
MV Rockman
MV Rockman
N Friedman
N Friedman
N Friedman
N Laird
O Stegle
P Langfelder
P Langfelder
P Langfelder
P Lu
R Sharan
R Sharan
RB Brem
RW Williams
S Lee
S Roy
S Tavazoie
SI Lee
SM Waszak
SS Rao
T Lappalainen
T Michoel
TA Manolio
TF Mackay
The ENCODE
TS Furey
VG Cheung
W Cookson
W Zhang
Y Chen
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2016
Field of study

Genetic differences between individuals associated to quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated to differences in expression levels of nearby genes (they are "expression quantitative trait loci" or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and softwares to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.Comment: minor revision with typos corrected; review article; 24 pages, 2 figure

arXiv.org e-Print Archive

Crossref

Bayesian Approximate Kernel Regression with Variable Selection

Author: Crawford Lorin
Mukherjee Sayan
Wood Kris C.
Zhou Xiang
Publication venue
Publication date: 09/06/2017
Field of study

Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an effect size analog of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant --- for example, the Gaussian kernel. We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. The projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e. phenotypic prediction) and association mapping (i.e. inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings.Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations presented; references adde

arXiv.org e-Print Archive

FigShare

On Flexible finite polygenic models for multiple-trait evaluation

Author: Bink M.C.A.M.
Publication venue
Publication date: 01/01/2002
Field of study

Finite polygenic models (FPM) might be an alternative to the infinitesimal model (TIM) for the genetic evaluation of pedigreed multiple-generation populations for multiple quantitative traits. I present a general flexible Bayesian method that includes the number of genes in the FPM as an additional random variable. Markov-chain Monte Carlo techniques such as Gibbs sampling and the reversible jump sampler are used for implementation. Sampling of genotypes of all genes in the FPM is done via the use of segregation indicators. A broad range of FPM models, some combined with TIM, are empirically tested for the estimation of variance components and the number of genes in the FPM. Four simulation scenarios were studied, including genetic models with 5 or 50 additive independent diallelic genes affecting the traits, and random selection or selection on one of the traits was performed. The results in this study were based on ten replicates per simulation scenario. In the case of random selection, uniform priors on additive gene effects led to posterior mean estimates of genetic variance that were positively correlated with the number of genes fitted in the FPM. In the case of trait selection, assuming normal priors on gene effects also led to genetic variance estimates for the selected trait that were negatively correlated with the number of genes in the FPM. This negative correlation was not observed for the unselected trait. Treating the number of genes in the FPM as random revealed a positive correlation between prior and posterior mean estimates of this number, but the prior hardly affected the posterior estimates of genetic variance. Posterior inferences about the number of genes should be considered to be indicative where trait selection seems to improve the power of distinguishing between TIM and FPM. Based on the results of this study, I suggest not replacing TIM by the FPM, but combining TIM and FPM with the number of genes treated as random, to facilitate a highly flexible and thereby robust method for variance component estimation in pedigreed populations. Further study is required to explore the full potential of these models under different genetic model assumption

Wageningen University & Research Publications

Accuracy and responses of genomic selection on key traits in apple breeding

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/11/2016
Field of study

open13siThe application of genomic selection in fruit tree crops is expected to enhance breeding efficiency by increasing prediction accuracy, increasing selection intensity and decreasing generation interval. The objectives of this study were to assess the accuracy of prediction and selection response in commercial apple breeding programmes for key traits. The training population comprised 977 individuals derived from 20 pedigreed full-sib families. Historic phenotypic data were available on 10 traits related to productivity and fruit external appearance and genotypic data for 7829 SNPs obtained with an Illumina 20K SNP array. From these data, a genome-wide prediction model was built and subsequently used to calculate genomic breeding values of five application full-sib families. The application families had genotypes at 364 SNPs from a dedicated 512 SNP array, and these genotypic data were extended to the high-density level by imputation. These five families were phenotyped for 1 year and their phenotypes were compared to the predicted breeding values. Accuracy of genomic prediction across the 10 traits reached a maximum value of 0.5 and had a median value of 0.19. The accuracies were strongly affected by the phenotypic distribution and heritability of traits. In the largest family, significant selection response was observed for traits with high heritability and symmetric phenotypic distribution. Traits that showed non-significant response often had reduced and skewed phenotypic variation or low heritability. Among the five application families the accuracies were uncorrelated to the degree of relatedness to the training population. The results underline the potential of genomic prediction to accelerate breeding progress in outbred fruit tree crops that still need to overcome long generation intervals and extensive phenotyping costs.openMuranty, H.; Troggio, M.; Sadok, I.B.; Mehdi A.R.; Auwerkerken, A.; Banchi, E.; Velasco, R.; Stevanato, P.; Eric van de Weg, W.; Di Guardo, M.; Kumar, S.; Laurens, F.; Bink, M.C.A.M.Muranty, H.; Troggio, M.; Sadok, I. B.; Mehdi, A. R.; Auwerkerken, A.; Banchi, E.; Velasco, R.; Stevanato, Piergiorgio; Eric van de Weg, W.; Di Guardo, M.; Kumar, S.; Laurens, F.; Bink, M. C. A. M

Archivio istituzionale della ricerca - Università di Padova

Novel Bayesian Networks for Genomic Prediction of Developmental Traits in Biomass Sorghum.

Author: Brown Patrick J
Buckler Edward S
Dos Santos Jhonathan PR
Fernandes Samuel B
Garcia Antonio AF
Gore Michael A
Leakey Andrew DB
Lozano Roberto
McCoy Scott
Publication venue: eScholarship, University of California
Publication date: 01/02/2020
Field of study

The ability to connect genetic information between traits over time allow Bayesian networks to offer a powerful probabilistic framework to construct genomic prediction models. In this study, we phenotyped a diversity panel of 869 biomass sorghum (Sorghum bicolor (L.) Moench) lines, which had been genotyped with 100,435 SNP markers, for plant height (PH) with biweekly measurements from 30 to 120 days after planting (DAP) and for end-of-season dry biomass yield (DBY) in four environments. We evaluated five genomic prediction models: Bayesian network (BN), Pleiotropic Bayesian network (PBN), Dynamic Bayesian network (DBN), multi-trait GBLUP (MTr-GBLUP), and multi-time GBLUP (MTi-GBLUP) models. In fivefold cross-validation, prediction accuracies ranged from 0.46 (PBN) to 0.49 (MTr-GBLUP) for DBY and from 0.47 (DBN, DAP120) to 0.75 (MTi-GBLUP, DAP60) for PH. Forward-chaining cross-validation further improved prediction accuracies of the DBN, MTi-GBLUP and MTr-GBLUP models for PH (training slice: 30-45 DAP) by 36.4-52.4% relative to the BN and PBN models. Coincidence indices (target: biomass, secondary: PH) and a coincidence index based on lines (PH time series) showed that the ranking of lines by PH changed minimally after 45 DAP. These results suggest a two-level indirect selection method for PH at harvest (first-level target trait) and DBY (second-level target trait) could be conducted earlier in the season based on ranking of lines by PH at 45 DAP (secondary trait). With the advance of high-throughput phenotyping technologies, our proposed two-level indirect selection framework could be valuable for enhancing genetic gain per unit of time when selecting on developmental traits

Directory of Open Access Journals

eScholarship - University of California