
    A Hierarchical Bayesian Approach to Multi-Trait Clinical Quantitative Trait Locus Modeling

    Recent advances in high-throughput genotyping and transcript profiling technologies have enabled the inexpensive production of genome-wide dense marker maps together with large numbers of expression profiles. These large-scale data contain valuable information about the genetic architecture of important phenotypic traits. Comprehensive models that combine molecular markers and gene transcript levels are increasingly advocated as an effective approach to dissecting the genetic architecture of complex phenotypic traits. The simultaneous use of marker and gene expression data to explain the variation in a clinical quantitative trait, known as clinical quantitative trait locus (cQTL) mapping, poses both conceptual and computational challenges. Nonetheless, the hierarchical Bayesian (HB) modeling approach, combined with modern computational tools such as Markov chain Monte Carlo (MCMC) simulation, provides considerable versatility for cQTL analysis. Sillanpää and Noykova (2008) developed an HB model for single-trait cQTL analysis in inbred line cross data using molecular markers, gene expressions, and marker-gene expression pairs. However, clinical traits are generally related to one another through environmental correlations and/or pleiotropy. A multi-trait approach can improve both the power to detect genetic effects and the precision of their estimates. A multi-trait model also provides a framework for examining a number of biologically interesting hypotheses. In this paper we extend the HB cQTL model for inbred line crosses proposed by Sillanpää and Noykova to a multi-trait setting. We illustrate the implementation of our new model with simulated data and evaluate its performance relative to its single-trait counterpart.
The data simulation was based on the multi-trait cQTL model and assumed three traits with either uncorrelated or correlated cQTL residuals. The data simulated under uncorrelated cQTL residuals served as our test set for comparing the performance of the multi-trait and single-trait models, while the data simulated under correlated cQTL residuals were used mainly to assess how well our new model estimates the cQTL residual covariance structure. The model was fitted to the data by MCMC simulation in OpenBUGS. The multi-trait model outperformed its single-trait counterpart in identifying cQTLs, with a consistently lower false discovery rate. Moreover, the covariance matrix of cQTL residuals was typically estimated with appreciable precision under the multi-trait cQTL model, making our new model a promising approach to a wide range of issues in the analysis of correlated clinical traits.
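
To make the idea of correlated cQTL residuals concrete, the following is a minimal Python/NumPy sketch of a generic multi-trait linear model Y = XB + E in which each row of E is drawn from N(0, Σ). It is a frequentist least-squares stand-in, not the hierarchical Bayesian model fitted by MCMC in OpenBUGS; the design (an intercept, one marker coded -1/1, one expression covariate), the effect sizes, and the covariance matrix are all hypothetical.

```python
import numpy as np

def simulate_multitrait(X, B, Sigma, rng):
    """Simulate Y = X B + E, where each row of E ~ N(0, Sigma) (correlated residuals)."""
    n, t = X.shape[0], Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)            # Sigma = L L^T
    E = rng.standard_normal((n, t)) @ L.T    # rows of E have covariance L L^T = Sigma
    return X @ B + E

def fit_multitrait_ols(X, Y):
    """Least-squares effects for all traits jointly, plus the residual covariance matrix."""
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ B_hat
    Sigma_hat = R.T @ R / (X.shape[0] - X.shape[1])  # unbiased residual covariance
    return B_hat, Sigma_hat

# Hypothetical setup: intercept, one marker (coded -1/1), one expression level; three traits.
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n),
                     rng.choice([-1.0, 1.0], size=n),
                     rng.standard_normal(n)])
B = np.array([[1.0, 0.5, -0.2],    # intercepts
              [0.8, 0.0, 0.6],     # marker effect on each trait (pleiotropic on traits 1 and 3)
              [0.0, 0.4, 0.3]])    # expression effect on each trait
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
Y = simulate_multitrait(X, B, Sigma, rng)
B_hat, Sigma_hat = fit_multitrait_ols(X, Y)
```

Setting the off-diagonal entries of Sigma to zero gives the "uncorrelated residuals" scenario; comparing Sigma_hat with Sigma is the frequentist analogue of checking how well the residual covariance structure is recovered.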

    Genome-wide association study identified novel candidate loci affecting wood formation in Norway spruce

    Norway spruce is a boreal forest tree species of significant ecological and economic importance. Hence there is a strong imperative to dissect the genetics underlying important wood quality traits in the species. We performed a functional genome-wide association study (GWAS) of 17 wood traits in Norway spruce using 178 101 single nucleotide polymorphisms (SNPs) generated from exome genotyping of 517 mother trees. The wood traits were defined using functional modelling of wood properties across annual growth rings. We applied a Least Absolute Shrinkage and Selection Operator (LASSO)-based association mapping method, using a functional multilocus mapping approach that utilizes latent traits, with stability selection probability as the hypothesis testing approach for declaring a significant quantitative trait locus. The analysis identified 52 significant SNPs from 39 candidate genes, including genes previously implicated in wood formation and tree growth in spruce and other species. Our study represents a multilocus GWAS for complex wood traits in Norway spruce. The results advance our understanding of the genetics influencing wood traits and identify candidate genes for future functional studies. Peer reviewed.
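
The combination of LASSO and stability selection can be illustrated with a short Python/NumPy sketch: fit a plain LASSO by coordinate descent on many random half-samples and record how often each SNP receives a non-zero effect. This is not the functional, latent-trait multilocus method of the study; the penalty lam, the 0.9 selection-probability cutoff, and the simulated genotypes are all hypothetical choices made for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100, tol=1e-6):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1 (X, y centered)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()               # current residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:             # monomorphic column: leave at zero
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft-threshold
            if new != b[j]:
                r += X[:, j] * (b[j] - new)  # keep residual in sync with the update
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return b

def stability_selection(G, y, lam, n_subsamples=50, seed=0):
    """Fraction of random half-samples in which each SNP gets a non-zero LASSO effect."""
    rng = np.random.default_rng(seed)
    n, p = G.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xs = G[idx].astype(float)
        sd = Xs.std(axis=0)
        sd[sd == 0] = 1.0                    # avoid division by zero for monomorphic SNPs
        Xs = (Xs - Xs.mean(axis=0)) / sd     # standardize genotypes within the subsample
        ys = y[idx] - y[idx].mean()
        counts += lasso_cd(Xs, ys, lam) != 0
    return counts / n_subsamples

# Hypothetical data: 200 trees, 50 SNPs coded 0/1/2, two causal SNPs (indices 3 and 10).
rng = np.random.default_rng(42)
G = rng.integers(0, 3, size=(200, 50))
y = 1.0 * G[:, 3] - 0.8 * G[:, 10] + rng.standard_normal(200)
probs = stability_selection(G, y, lam=0.1)
selected = np.flatnonzero(probs >= 0.9)      # hypothetical selection-probability cutoff
```

The appeal of this design is that the cutoff is placed on a selection frequency rather than on a single fitted penalty, so a SNP is declared only if it keeps being selected across perturbed data sets.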

    Common Variant Burden Contributes to the Familial Aggregation of Migraine in 1,589 Families

    Complex traits, including migraine, often aggregate in families, but the genetic architecture underlying this aggregation is not well understood. The aggregation could be explained by rare, penetrant variants that segregate according to Mendelian inheritance, by a sufficient polygenic accumulation of common variants, each with an individually small effect, or by a combination of the two. In 8,319 individuals across 1,589 migraine families, we calculated migraine polygenic risk scores (PRS) and found a significantly higher common variant burden in familial cases (n = 5,317, OR = 1.76, 95% CI = 1.71-1.81, p = 1.7 × 10⁻¹⁰⁹) than in population cases from the FINRISK cohort (n = 1,101, OR = 1.32, 95% CI = 1.25-1.38, p = 7.2 × 10⁻¹⁷). The PRS explained 1.6% of the phenotypic variance in the population cases and 3.5% in the familial cases (including 2.9% for migraine without aura, 5.5% for migraine with typical aura, and 8.2% for hemiplegic migraine). The results demonstrate a significant contribution of common polygenic variation to the familial aggregation of migraine.
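
A polygenic risk score is, at its core, a weighted sum of effect-allele counts. The Python/NumPy sketch below computes that sum and standardizes it within the sample; the genotypes and weights are hypothetical, and the SNP selection, clumping, and weighting choices used in the study are not reproduced here.

```python
import numpy as np

def polygenic_risk_score(dosages, weights):
    """PRS_i = sum_j w_j * g_ij: dosages is (individuals x SNPs) with 0/1/2 counts of the
    effect allele; weights are per-SNP effect sizes (e.g. log odds ratios from a GWAS)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def standardize(scores):
    """Center and scale scores to mean 0 and standard deviation 1 within the sample."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# Hypothetical example: 3 individuals, 3 SNPs.
dosages = [[0, 1, 2],
           [2, 2, 0],
           [1, 0, 1]]
weights = [0.10, -0.20, 0.05]    # hypothetical log odds ratios
prs = polygenic_risk_score(dosages, weights)   # approximately [-0.1, -0.2, 0.15]
z = standardize(prs)
```

Standardizing puts scores from different cohorts on a common scale before they are entered into a downstream regression model.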

    MCPeSe: Monte Carlo penalty selection for graphical lasso

    Motivation: Graphical lasso (Glasso) is a widely used tool for identifying gene regulatory networks in systems biology. However, its computational efficiency depends on the choice of the regularization (tuning) parameter, and selecting this parameter can be highly time consuming. Although fully Bayesian implementations of Glasso alleviate this problem somewhat by specifying a prior distribution for the parameter, these approaches lack the scalability of their frequentist counterparts. Results: Here, we present a new Monte Carlo Penalty Selection method (MCPeSe), a computationally efficient approach to regularization parameter selection for Glasso. MCPeSe combines the scalability and low computational cost of the frequentist Glasso with the ability of Bayesian Glasso modeling to choose the regularization automatically. MCPeSe provides a state-of-the-art ‘tuning-free’ model selection criterion for Glasso and allows exploration of the posterior probability distribution of the tuning parameter. Availability and implementation: R source code of MCPeSe, a step-by-step example showing how to apply MCPeSe, and a collection of scripts used to prepare the material in this article are publicly available at GitHub under GPL (https://github.com/markkukuismin/MCPeSe/). Supplementary information: Supplementary data are available at Bioinformatics online.
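
For orientation, the tuning parameter at issue is the λ in the standard graphical lasso objective (first line below), where S is the sample covariance matrix and Θ the precision matrix; some implementations penalize only the off-diagonal entries, and a larger λ yields a sparser estimate. The second line shows a prior of the kind used in fully Bayesian Glasso formulations, included only to show where λ enters a Bayesian model (typically with a further hyperprior, such as a gamma distribution, on λ itself). Neither line is the MCPeSe algorithm, whose details are in the paper and the linked R code.

```latex
\begin{align*}
\hat{\Theta}_\lambda &= \arg\max_{\Theta \succ 0}
  \Big\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1 \Big\} \\
p(\Theta \mid \lambda) &\propto
  \prod_{i<j} \exp\big(-\lambda \lvert \theta_{ij} \rvert\big)
  \prod_{i} \exp\Big(-\tfrac{\lambda}{2}\,\theta_{ii}\Big)\,
  \mathbf{1}\{\Theta \succ 0\}
\end{align*}
```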

    Extended Bayesian LASSO for Multiple Quantitative Trait Loci Mapping and Unobserved Phenotype Prediction

    The Bayesian LASSO (BL) has been recognized as an effective approach to sparse model representation and has been successfully applied to quantitative trait loci (QTL) mapping and genomic breeding value (GBV) estimation using genome-wide dense sets of markers. However, the BL relies on a single parameter, known as the regularization parameter, to simultaneously control the overall model sparsity and the shrinkage of individual covariate effects. This may be idealistic when dealing with a large number of predictors whose effect sizes may differ by orders of magnitude. Here we propose the extended Bayesian LASSO (EBL) for QTL mapping and unobserved phenotype prediction, which introduces an additional level to the hierarchical specification of the BL to explicitly separate these two model features. Whereas the BL is adaptive, the EBL is “doubly adaptive” and thus more robust to tuning. In simulations, the EBL outperformed the BL with regard to the accuracy of both effect size estimates and phenotypic value predictions, with comparable computational time. Moreover, the EBL proved less sensitive to tuning than the related Bayesian adaptive LASSO (BAL), which also introduces locus-specific regularization parameters but has no mechanism for distinguishing between model sparsity and parameter shrinkage. Consequently, the EBL seems to point to a new direction for QTL mapping, phenotype prediction, and GBV estimation.
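
The role of the single regularization parameter is easiest to see in the standard BL hierarchy, where the Laplace prior on each marker effect is written as a normal scale mixture with exponential mixing (first four lines below). The BAL replaces λ by locus-specific λ_j. The last line is our hedged reading of how the EBL separates sparsity from shrinkage: λ_j factored into a global parameter δ and locus-specific parameters η_j. The exact priors placed on δ and η_j are not given in the abstract and are not reproduced here.

```latex
\begin{align*}
y_i \mid \mu, \boldsymbol{\beta}, \sigma^2 &\sim \mathcal{N}\Big(\mu + \textstyle\sum_{j=1}^{p} x_{ij}\beta_j,\ \sigma^2\Big) \\
\beta_j \mid \sigma^2, \tau_j^2 &\sim \mathcal{N}(0,\ \sigma^2 \tau_j^2) \\
\tau_j^2 \mid \lambda &\sim \mathrm{Exp}(\lambda^2/2) \quad \text{(rate } \lambda^2/2\text{)} \\
\Rightarrow\ p(\beta_j \mid \sigma^2, \lambda) &= \frac{\lambda}{2\sigma} \exp\Big(-\frac{\lambda \lvert \beta_j \rvert}{\sigma}\Big) \\
\text{BAL: } \lambda \to \lambda_j, \qquad \text{EBL-style split: } \lambda_j &= \delta\,\eta_j
\end{align*}
```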

    Estimation of covariance and precision matrix, network structure, and a view toward systems biology

    The covariance matrix and its inverse, known as the precision matrix, have many applications in multivariate analysis because their elements can reveal the variances, correlations, covariances, and conditional independence relations between variables. Estimating the precision matrix directly, without any matrix inversion, has received significant attention in the literature. We review the methods that have been implemented in R and their R packages, particularly for settings with more variables than samples, and discuss the ideas behind them. We describe how sparse precision matrix estimation methods can be used to infer network structure. Finally, we discuss methods that are suitable for gene coexpression network construction.
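
The link between a sparse precision matrix and a network can be shown in a few lines of Python/NumPy: the partial correlation between variables i and j is -θ_ij / sqrt(θ_ii θ_jj), and a zero θ_ij means i and j are conditionally independent given all other variables, so no edge is drawn. The 3-gene precision matrix below is a hypothetical example, and the tolerance for treating an entry as zero is an arbitrary choice.

```python
import numpy as np

def partial_correlations(Theta):
    """rho_ij = -theta_ij / sqrt(theta_ii * theta_jj); diagonal set to 1."""
    Theta = np.asarray(Theta, dtype=float)
    d = np.sqrt(np.diag(Theta))
    P = -Theta / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

def network_edges(Theta, tol=1e-8):
    """Edge (i, j) whenever theta_ij is non-zero, i.e. i and j are conditionally dependent."""
    Theta = np.asarray(Theta, dtype=float)
    p = Theta.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(Theta[i, j]) > tol]

# Hypothetical 3-gene precision matrix with a chain structure 0 - 1 - 2.
Theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
Sigma = np.linalg.inv(Theta)     # genes 0 and 2 are marginally correlated ...
P = partial_correlations(Theta)  # ... but conditionally independent given gene 1
edges = network_edges(Theta)     # [(0, 1), (1, 2)]
```

This is why direct sparse estimation of the precision matrix is attractive for coexpression networks: the covariance matrix alone would connect genes 0 and 2, whereas the precision matrix does not.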

    Impact of residual covariance structures on genomic prediction ability in multi-environment trials

    In plant breeding, one of the main purposes of multi-environment trials (MET) is to assess the intensity of genotype-by-environment (G×E) interactions in order to select high-performing lines for each environment. Most models for analyzing MET data consider only additive genetic effects, so part of the non-additive genetic effects is confounded with the residual terms; this may lead to non-negligible residual covariances between the same trait measured in multiple environments. In breeding programs it is also common for phenotype information to be available from some environments but missing in others. In this study we focused on two problems: (1) the impact of different residual covariance structures on genomic prediction ability, using different models to analyze MET data; and (2) the ability of different MET analysis models to predict the missing values in a single environment. Our results suggest that it is important to consider a heterogeneous residual covariance structure in MET analysis and that the multivariate mixed model seems especially suitable for predicting the missing values in a single environment. We also present the prediction abilities of Bayesian and frequentist approaches with different models using field data sets (maize and rice) with different levels of G×E interactions.
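
One reason a multivariate model can fill in a missing environment is that, under a multivariate normal model, the conditional mean of the unobserved value borrows strength from the correlated, observed environments. The Python/NumPy sketch below shows only that conditioning step with a heterogeneous covariance D R D; it is not the mixed-model analysis of the study, and the means, standard deviations, and correlations are hypothetical.

```python
import numpy as np

def heterogeneous_cov(sds, corr):
    """Covariance with environment-specific variances: D R D, where D = diag(sds)."""
    D = np.diag(np.asarray(sds, dtype=float))
    return D @ np.asarray(corr, dtype=float) @ D

def predict_missing(y, mu, Sigma):
    """Fill NaN entries of y with E[y_miss | y_obs] under y ~ N(mu, Sigma)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    miss = np.isnan(y)
    obs = ~miss
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(miss, obs)]
    pred = y.copy()
    pred[miss] = mu[miss] + S_mo @ np.linalg.solve(S_oo, y[obs] - mu[obs])
    return pred

# Hypothetical line observed in environments 1 and 2, missing in environment 3.
Sigma = heterogeneous_cov(sds=[1.0, 2.0, 0.5],
                          corr=[[1.0, 0.6, 0.5],
                                [0.6, 1.0, 0.4],
                                [0.5, 0.4, 1.0]])
pred = predict_missing([1.2, 3.0, np.nan], mu=[0.0, 0.0, 0.0], Sigma=Sigma)
```

With a diagonal covariance the prediction collapses to the mean, which illustrates why ignoring covariances between environments discards information about the missing value.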

    A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction

    Single nucleotide polymorphism (SNP)-heritability estimation is an important topic in several research fields, including animal, plant and human genetics, as well as ecology. Linear mixed model estimation of SNP-heritability uses the structure of genomic relationships between individuals, which is constructed from genome-wide sets of SNP markers whose contributions are generally weighted equally. Proposed methods to handle dependence between SNPs include “thinning” the marker set by linkage disequilibrium (LD) pruning, haplotype tagging of SNPs, and LD-weighting of the SNP contributions. For improved estimation, we propose a new conceptual framework for the genomic relationship matrix, in which Mahalanobis distance-based LD-correction is used in linear mixed model estimation of SNP-heritability. The superiority of the presented method is illustrated by comparison with mixed-model analyses using a VanRaden genomic relationship matrix, a matrix used by GCTA, and a matrix employing LD-weighting (as implemented in the LDAK software) in simulated (using real human, rice and cattle genotypes) and real (maize, rice and mice) datasets. Despite the computational difficulties, our results suggest that the proposed method can improve the accuracy of SNP-heritability estimates in datasets with high LD.
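
As a baseline for the comparison described above, a VanRaden-style genomic relationship matrix can be computed from a 0/1/2 genotype matrix in a few lines of Python/NumPy. The sketch makes no adjustment for LD between SNPs, which is exactly what LD-weighting and the proposed Mahalanobis-based correction aim to improve; the LD-corrected matrix itself is not reproduced here, and the small genotype matrix is hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """VanRaden-style G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p.

    M is an (individuals x SNPs) matrix of 0/1/2 allele counts. No SNP is
    down-weighted for LD, so correlated SNPs jointly inflate their shared signal."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0           # allele frequency estimated from the sample
    Z = M - 2.0 * p                    # center each SNP by its expected allele count
    denom = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / denom

# Hypothetical genotypes: 3 individuals, 3 SNPs (all allele frequencies equal 0.5 here).
M = [[0, 1, 2],
     [2, 1, 0],
     [1, 1, 1]]
G = vanraden_grm(M)   # G[0, 0] = 4/3, G[0, 1] = -4/3, G[2, 2] = 0
```

In a linear mixed model, a matrix like G serves as the covariance structure of the polygenic effect, so changing how SNPs contribute to G changes the SNP-heritability estimate.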

    Advances in statistical methods to handle large data sets for GWAS in crop breeding

    One of the most important statistical methods for handling large data sets in genome-wide association studies (GWAS) is quantitative trait loci (QTL) analysis. Two approaches to QTL analysis are linkage analysis (LA) and linkage disequilibrium (LD) mapping. Although association and linkage mapping are viewed as fundamentally different approaches, both exploit recombination events. This chapter discusses some of the main challenges for GWAS with large data sets. It describes both single-locus and multilocus association models, before going on to discuss high-dimensional data in GWAS, the significance threshold for association, and dimensionality reduction methods. Finally, the chapter looks ahead to future trends in this field.
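
A single-locus association model and a multiple-testing threshold can be sketched together in Python/NumPy: each SNP is tested separately by simple linear regression, and significance is declared against a Bonferroni threshold of α divided by the number of tests. This is one common choice among several threshold strategies; p-values use a normal approximation to the t statistic (adequate for large samples), multilocus models are not shown, and the simulated data are hypothetical.

```python
import math
import numpy as np

def single_locus_scan(G, y):
    """Test each SNP separately with simple linear regression y ~ g_j.
    Returns two-sided p-values from a normal approximation to the t statistic."""
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = G.shape
    yc = y - y.mean()
    pvals = np.ones(m)
    for j in range(m):
        g = G[:, j] - G[:, j].mean()
        sxx = g @ g
        if sxx == 0.0:                        # monomorphic SNP: nothing to test
            continue
        beta = (g @ yc) / sxx
        resid = yc - beta * g
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        t = beta / se
        pvals[j] = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return pvals

# Hypothetical data: 500 lines, 1000 SNPs, one causal SNP (index 0).
rng = np.random.default_rng(7)
G = rng.integers(0, 3, size=(500, 1000))
y = 0.5 * G[:, 0] + rng.standard_normal(500)
pvals = single_locus_scan(G, y)
threshold = 0.05 / G.shape[1]                 # Bonferroni: alpha / number of tests
hits = np.flatnonzero(pvals < threshold)
```

Because every SNP is tested in isolation, this scan ignores the other loci and any LD among them, which is one motivation for the multilocus models the chapter also covers.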