147 research outputs found

    A Hierarchical Bayesian Approach to Multi-Trait Clinical Quantitative Trait Locus Modeling

    Recent advances in high-throughput genotyping and transcript profiling technologies have enabled the inexpensive production of genome-wide dense marker maps in tandem with large numbers of expression profiles. These large-scale data encompass valuable information about the genetic architecture of important phenotypic traits. Comprehensive models that combine molecular markers and gene transcript levels are increasingly advocated as an effective approach to dissecting the genetic architecture of complex phenotypic traits. The simultaneous utilization of marker and gene expression data to explain the variation in a clinical quantitative trait, known as clinical quantitative trait locus (cQTL) mapping, poses challenges that are both conceptual and computational. Nonetheless, the hierarchical Bayesian (HB) modeling approach, in combination with modern computational tools such as Markov chain Monte Carlo (MCMC) simulation techniques, provides much versatility for cQTL analysis. Sillanpää and Noykova (2008) developed an HB model for single-trait cQTL analysis in inbred line cross data using molecular markers, gene expressions, and marker-gene expression pairs. However, clinical traits generally relate to one another through environmental correlations and/or pleiotropy. A multi-trait approach can improve the power to detect genetic effects and the precision of their estimates. A multi-trait model also provides a framework for examining a number of biologically interesting hypotheses. In this paper we extend the HB cQTL model for inbred line crosses proposed by Sillanpää and Noykova to a multi-trait setting. We illustrate the implementation of our new model with simulated data, and evaluate the multi-trait model's performance against its single-trait counterpart. 
The data were simulated from the multi-trait cQTL model, assuming three traits with either uncorrelated or correlated cQTL residuals. The data simulated under uncorrelated cQTL residuals served as our test set for comparing the performance of the multi-trait and single-trait models, while the data simulated under correlated cQTL residuals were used mainly to assess how well our new model can estimate the cQTL residual covariance structure. The models were fitted to the data by MCMC simulation in OpenBUGS. The multi-trait model outperformed its single-trait counterpart in identifying cQTLs, with a consistently lower false discovery rate. Moreover, the covariance matrix of cQTL residuals was typically estimated to an appreciable degree of precision under the multi-trait cQTL model, making our new model a promising approach to addressing a wide range of issues facing the analysis of correlated clinical traits.
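
The abstract above compares models by their false discovery rate in a simulation where the true cQTLs are known. The abstract does not say how that rate was computed, so the following is only a minimal sketch of one common empirical definition (false positives divided by all declared loci). The function name and the locus indices are hypothetical.

```python
def false_discovery_rate(selected, true_loci):
    """Empirical false discovery proportion for one simulated data set.

    selected  : loci the model declared as cQTLs (hypothetical indices)
    true_loci : loci that truly carry an effect in the simulation
    Assumption: FDR is taken as (false positives) / (all declared loci),
    and defined as 0.0 when nothing is declared.
    """
    selected = set(selected)
    if not selected:
        return 0.0
    false_positives = selected - set(true_loci)
    return len(false_positives) / len(selected)


# Four loci declared, two of them (17 and 58) are not true cQTLs.
print(false_discovery_rate([3, 17, 42, 58], [3, 42, 91]))  # 0.5
```

Averaging this quantity over replicated simulated data sets would give one number per model, which is the kind of summary a multi-trait versus single-trait comparison could rest on.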

    Device Detection and Channel Estimation in MTC with Correlated Activity Pattern

    This paper provides a solution for the activity detection and channel estimation problem in grant-free access with correlated device activity patterns. In particular, we consider a machine-type communications (MTC) network operating in event-triggered traffic mode, where the devices are distributed over clusters with an activity behaviour that exhibits both intra-cluster and inner-cluster sparsity patterns. To model the network's intra-cluster and inner-cluster sparsity, we propose a structured sparsity-inducing spike-and-slab prior, which provides a flexible approach to encoding the prior information about the correlated sparse activity pattern. Furthermore, we derive a Bayesian inference scheme based on the expectation propagation (EP) framework to solve the joint user identification and channel estimation (JUICE) problem. Numerical results highlight the significant gains obtained by the proposed structured sparsity-inducing spike-and-slab prior in terms of both user identification accuracy and channel estimation performance. Comment: This is the extended abstract for the paper accepted for presentation at Asilomar 202
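
To make the idea of a structured spike-and-slab prior concrete, here is a small sketch that draws one sparse channel vector from a two-level hierarchy. The hierarchy, the parameter names, and the real-valued Gaussian slab are all assumptions chosen for illustration; the abstract does not specify the exact prior, and practical channel models are usually complex-valued.

```python
import random


def sample_structured_spike_and_slab(cluster_sizes, p_cluster, p_device,
                                     slab_std, rng):
    """Draw one channel vector from a hypothetical two-level prior.

    Assumed hierarchy (illustrative only):
      - each cluster is active with probability p_cluster;
      - a device in an active cluster is active with probability p_device;
      - an active device gets a Gaussian "slab" draw N(0, slab_std^2),
        an inactive device gets the "spike" at exactly 0.
    """
    activity, channel = [], []
    for size in cluster_sizes:
        cluster_on = rng.random() < p_cluster
        for _ in range(size):
            device_on = cluster_on and (rng.random() < p_device)
            activity.append(device_on)
            channel.append(rng.gauss(0.0, slab_std) if device_on else 0.0)
    return activity, channel


rng = random.Random(0)  # fixed seed so the draw is reproducible
activity, channel = sample_structured_spike_and_slab(
    cluster_sizes=[4, 4, 4], p_cluster=0.3, p_device=0.8,
    slab_std=1.0, rng=rng)
print(activity)
print(channel)
```

Because device activity is conditioned on cluster activity, non-zero entries tend to appear in blocks, which is the correlated sparsity such a prior is meant to encode.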

    Model guided trait-specific co-expression network estimation as a new perspective for identifying molecular interactions and pathways

    Author summary: Here we built a mathematically justified bridge between 1) parametric approaches and 2) co-expression networks for identifying molecular interactions underlying complex traits. We first shared our concern that methodological improvements around these schemes, adjusting only their power and scalability, are bounded by more fundamental scheme-specific limitations. Subsequently, our theoretical results were exploited to overcome these limitations and to find gene-by-gene interactions that neither scheme can capture alone. We also aimed to illustrate how this framework enables the interpretation of co-expression networks in a more parametric sense to achieve systematic insights into complex biological processes more reliably. The main procedure was fit for various types of biological applications and high-dimensional data to cover the area of systems biology as broadly as possible. In particular, we chose to illustrate the method's applicability for gene-profile based risk-stratification in cancer research using public acute myeloid leukemia datasets. A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion, which cross-exploits the strengths of the individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. 
To account also for the substantial amount of intrinsic noise and collinearities often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiates the method's high efficiency. As proof-of-concept, this synergistic approach is successfully applied in survival analysis, with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance. Peer reviewed

    Geenikartoitusmenetelmien kehitystyötä (Development of gene mapping methods)


    Estimating genealogies from linked marker data: a Bayesian approach

    Background: Answers to several fundamental questions in statistical genetics would ideally require knowledge of the ancestral pedigree and of the gene flow therein. A few examples of such questions are haplotype estimation, relatedness and relationship estimation, gene mapping by combining pedigree and linkage disequilibrium information, and estimation of population structure. Results: We present a probabilistic method for genealogy reconstruction. Starting with a group of genotyped individuals from some population isolate, we explore the state space of their possible ancestral histories under our Bayesian model by using Markov chain Monte Carlo (MCMC) sampling techniques. The main contribution of our work is the development of sampling algorithms in the resulting vast state space with highly dependent variables. The main drawback is the computational complexity that limits the time horizon within which explicit reconstructions can be carried out in practice. Conclusion: The estimates for IBD (identity-by-descent) and haplotype distributions are tested in several settings using simulated data. The results appear to be promising for a further development of the method.

    Modelling old-age retirement : An adaptive multi-outcome LAD-lasso regression approach

    Using unique administrative register data, we investigate old-age retirement under the statutory pension scheme in Finland. The analysis is based on multi-outcome modelling of pensions and working lives together with a range of explanatory variables. An adaptive multi-outcome LAD-lasso regression method is applied to obtain estimates of earnings and socioeconomic factors affecting old-age retirement and to decide which of these variables should be included in our model. The proposed statistical technique produces robust and less biased regression coefficient estimates in the context of skewed outcome distributions and an excess number of zeros in some of the explanatory variables. The results underline the importance of late life course earnings and employment to the final amount of pension and reveal differences in pension outcomes across socioeconomic groups. We conclude that adaptive LAD-lasso regression is a promising statistical technique that could be usefully employed in studying various topics in the pension industry. Peer reviewed
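
As a rough illustration of what an adaptive multi-outcome LAD-lasso criterion can look like, the sketch below evaluates one plausible objective: the mean Euclidean norm of each observation's residual vector plus a weighted penalty on the Euclidean norm of each predictor's coefficient row. This particular form, the function name, and the toy numbers are assumptions made for illustration; the abstract does not give the criterion, and the sketch only evaluates the objective rather than minimizing it.

```python
import math


def lad_lasso_objective(X, Y, B, lam, weights):
    """Evaluate an assumed adaptive multi-outcome LAD-lasso objective.

    X       : n rows of p predictor values
    Y       : n rows of q outcome values
    B       : p rows of q coefficients (row j belongs to predictor j)
    lam     : overall penalty strength
    weights : p adaptive weights, one per predictor (hypothetical inputs)
    """
    n = len(X)
    loss = 0.0
    for x_i, y_i in zip(X, Y):
        # Residual vector of observation i across all q outcomes.
        resid = [y_ik - sum(x_ij * B[j][k] for j, x_ij in enumerate(x_i))
                 for k, y_ik in enumerate(y_i)]
        loss += math.sqrt(sum(r * r for r in resid))
    # Row-wise penalty: a predictor is dropped for all outcomes at once
    # when its whole coefficient row is shrunk to zero.
    penalty = sum(w * math.sqrt(sum(b * b for b in row))
                  for w, row in zip(weights, B))
    return loss / n + lam * penalty


X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[3.0, 4.0], [0.0, 0.0]]
B_zero = [[0.0, 0.0], [0.0, 0.0]]
print(lad_lasso_objective(X, Y, B_zero, lam=0.1, weights=[1.0, 1.0]))  # 2.5
```

Using absolute-type (norm) losses instead of squared errors is what gives this family of methods its robustness to skewed outcomes, while the adaptive weights let strongly supported predictors be penalized less.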

    Age-dependent genetic architecture across ontogeny of body size in sticklebacks

    Heritable variation in traits under natural selection is a prerequisite for evolutionary response. While it is recognized that trait heritability may vary spatially and temporally depending on the environmental conditions under which traits are expressed, less is known about the possibility that genetic variance contributing to the expected selection response in a given trait may vary at different stages of ontogeny. Specifically, it is unclear whether different loci underlie the expression of a trait throughout development, thus providing an additional source of variation for selection to act on in the wild. Here we show that body size, an important life-history trait, is heritable throughout ontogeny in the nine-spined stickleback (Pungitius pungitius). Nevertheless, both analyses of quantitative trait loci and genetic correlations across ages show that different chromosomes/loci contribute to this heritability at different ontogenetic time points. This suggests that body size can respond to selection at different stages of ontogeny but that this response is determined by different loci at different points of development. Hence, our study provides important results regarding our understanding of the genetics of ontogeny and opens an interesting avenue of research for studying age-specific genetic architecture as a source of non-parallel evolution. Peer reviewed

    A Gaussian process model and Bayesian variable selection for mapping function-valued quantitative traits with incomplete phenotypic data

    Motivation: Recent advances in high-dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametric approaches based on splines and wavelets. However, very few software tools are currently available for practical implementation of functional QTL mapping and variable selection. Results: We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients which describe how the effects of molecular markers on the quantitative trait change over time. We use an efficient gradient-based algorithm to estimate the tuning parameters of the GPs. Notably, the GP approach is directly applicable to incomplete datasets, even those with a missing data rate higher than 50% (among phenotypes). We further develop a stepwise algorithm to search through the model space in terms of genetic variants, and use a minimal increase of Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GPs and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories with low computational cost. The proposed model selection approach finds the most likely QTL reliably in the tested datasets. Peer reviewed