    Evolutionary Inference for Function-valued Traits: Gaussian Process Regression on Phylogenies

    Biological data objects often have both of the following features: (i) they are functions rather than single numbers or vectors, and (ii) they are correlated due to phylogenetic relationships. In this paper we give a flexible statistical model for such data, by combining assumptions from phylogenetics with Gaussian processes. We describe its use as a nonparametric Bayesian prior distribution, both for prediction (placing posterior distributions on ancestral functions) and model selection (comparing rates of evolution across a phylogeny, or identifying the most likely phylogenies consistent with the observed data). Our work is integrative, extending the popular phylogenetic Brownian Motion and Ornstein-Uhlenbeck models to functional data and Bayesian inference, and extending Gaussian Process regression to phylogenies. We provide a brief illustration of the application of our method. Comment: 7 pages, 1 figure
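
As a rough illustration of how phylogenetic assumptions and Gaussian processes can be combined, the sketch below builds a covariance between function values observed at two tips of a tree. It assumes a separable form: a Brownian-motion tree term (the branch length shared by the two root-to-tip paths) multiplied by a squared-exponential kernel over the function's argument. The separable form, the toy tree `TREE`, and every function name here are illustrative assumptions, not the specification used in the paper.

```python
import math

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
TREE = {
    "n1": ("root", 1.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("root", 2.0),
}

def path_to_root(node):
    """Return the (child, branch_length) edges walked from node up to the root."""
    edges = []
    while node in TREE:
        parent, length = TREE[node]
        edges.append((node, length))
        node = parent
    return edges

def shared_branch_length(a, b):
    """Total length of the branches common to the root-to-a and root-to-b paths.

    Under Brownian motion this is proportional to the covariance of the two tips.
    """
    edges_b = dict(path_to_root(b))
    return sum(length for child, length in path_to_root(a) if child in edges_b)

def sq_exp(x1, x2, length_scale=1.0):
    """Squared-exponential kernel over the argument of the function-valued trait."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def phylo_gp_cov(tip_a, x1, tip_b, x2, sigma2=1.0):
    """Assumed separable covariance: tree term times kernel over the argument."""
    return sigma2 * shared_branch_length(tip_a, tip_b) * sq_exp(x1, x2)
```

In this toy tree, tips `A` and `B` share the branch leading to `n1`, so their function values are correlated, while `A` and `C` diverge at the root and are uncorrelated under the assumed model.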

    A Bayesian generalized random regression model for estimating heritability using overdispersed count data

    Background: Faecal egg count is a common indicator of nematode infection and, since it is a heritable trait, it provides a marker for selective breeding. However, since resistance to disease changes as the adaptive immune system develops, quantifying temporal changes in heritability could help improve selective breeding programs. Faecal egg counts can be extremely skewed and difficult to handle statistically. Therefore, previous heritability analyses have log transformed faecal egg counts to estimate heritability on a latent scale. However, such transformations may not always be appropriate. In addition, analyses of faecal egg counts have typically used univariate rather than multivariate analyses such as random regression that are appropriate when traits are correlated. We present a method for estimating the heritability of untransformed faecal egg counts over the grazing season using random regression. Results: Replicating standard univariate analyses, we showed the dependence of heritability estimates on choice of transformation. Then, using a multitrait model, we exposed temporal correlations, highlighting the need for a random regression approach. Since random regression can sometimes involve the estimation of more parameters than observations or result in computationally intractable problems, we chose to investigate reduced rank random regression. Using standard software (WOMBAT), we discuss the estimation of variance components for log transformed data using both full and reduced rank analyses. Then, we modelled the untransformed data, assuming them to be negative binomially distributed, and used Metropolis-Hastings to fit a generalized reduced rank random regression model with additive genetic, permanent environmental and maternal effects. These three variance components explained more than 80 % of the total phenotypic variation, whereas the variance components for the log transformed data accounted for considerably less. The heritability, on a link scale, increased from around 0.25 at the beginning of the grazing season to around 0.4 at the end. Conclusions: Random regressions are a useful tool for quantifying sources of variation across time. Our MCMC (Markov chain Monte Carlo) algorithm provides a flexible approach to fitting random regression models to non-normal data. Here we applied the algorithm to negative binomially distributed faecal egg count data, but this method is readily applicable to other types of overdispersed data.
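
To make the idea of a time-varying heritability concrete, the following sketch shows how a random regression model turns coefficient covariance matrices into variances at a given time point. It assumes a linear basis (the first two unnormalised Legendre polynomials, 1 and x, with time rescaled to [-1, 1]) and a constant residual variance. The basis order, the function names, and the treatment of the residual are assumptions for illustration only; the study's negative binomial model defines heritability on a link scale and is not reproduced here.

```python
def basis(t, t_min, t_max):
    """First two unnormalised Legendre polynomials at time t rescaled to [-1, 1]."""
    x = 2.0 * (t - t_min) / (t_max - t_min) - 1.0
    return [1.0, x]

def quad_form(phi, K):
    """Variance implied at one time point: phi' K phi."""
    n = len(phi)
    return sum(phi[i] * K[i][j] * phi[j] for i in range(n) for j in range(n))

def heritability(t, K_a, other_Ks, resid_var, t_min, t_max):
    """Additive genetic variance at time t divided by total variance at time t.

    K_a is the covariance matrix of the additive genetic regression coefficients;
    other_Ks holds the matrices for other random effects (for example permanent
    environmental and maternal); resid_var is an assumed constant residual variance.
    """
    phi = basis(t, t_min, t_max)
    var_a = quad_form(phi, K_a)
    total = var_a + sum(quad_form(phi, K) for K in other_Ks) + resid_var
    return var_a / total
```

A call such as `heritability(t, K_a, [K_pe, K_m], resid_var, t_min, t_max)` evaluated over a grid of `t` values traces out a heritability curve; the matrices would come from a fitted model, not from this sketch.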

    Warped Functional Analysis of Variance

    This article presents an Analysis of Variance model for functional data that explicitly incorporates phase variability through a time-warping component, allowing for a unified approach to estimation and inference in the presence of amplitude and time variability. The focus is on single-random-factor models but the approach can be easily generalized to more complex ANOVA models. The behavior of the estimators is studied by simulation, and an application to the analysis of growth curves of flour beetles is presented. Although the model assumes a smooth latent process behind the observed trajectories, smoothness of the observed data is not required; the method can be applied to the sparsely observed data that are often encountered in longitudinal studies.

    Implicit prices of indigenous cattle traits in central Ethiopia: Application of revealed and stated preference approaches

    The diversity of animal genetic resources has a quasi-public good nature that makes market prices an inadequate indicator of its economic worth. Applying the characteristics theory of value, this research estimated the relative economic worth of the attributes of cattle genetic resources in central Ethiopia. Transaction-level data were collected over four seasons in a year, and a choice experiment survey was conducted in five markets to generate data on both revealed and stated preferences of cattle buyers. Heteroscedasticity-efficient estimation and random parameters logit were employed to analyse the data. The results show that attributes related to the subsistence functions of cattle are valued more highly than attributes that directly influence the marketable products of the animals. The findings imply a strong need to invest in improving those attributes of cattle in the study area that enhance their subsistence functions, to which owners give higher priority in supporting their livelihoods than they give to tradable products.

    Detection and modelling of time-dependent QTL in animal populations

    A longitudinal approach is proposed to map QTL affecting function-valued traits and to estimate their effect over time. The method is based on fitting mixed random regression models. The QTL allelic effects are modelled with random coefficient parametric curves and using a gametic relationship matrix. A simulation study was conducted in order to assess the ability of the approach to fit different patterns of QTL over time. It was found that this longitudinal approach was able to adequately fit the simulated variance functions and considerably improved the power of detection of time-varying QTL effects compared to the traditional univariate model. This was confirmed by an analysis of protein yield data in dairy cattle, where the model was able to detect QTL with high effect at either the beginning or the end of the lactation, which were not detected with a simple 305-day model.

    Statistical models for the genetic analysis of longitudinal data


    Estimation of dynamic SNP-heritability with Bayesian Gaussian process models

    Motivation: Improved DNA technology has made it practical to estimate single nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth and development related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty. Results: We introduce a completely tuning-free Bayesian Gaussian process (GP) based approach for estimating dynamic variance components and heritability as their function. For parameter estimation, we use a modern Markov Chain Monte Carlo (MCMC) method which allows full uncertainty quantification. Several data sets are analysed and our results clearly illustrate that the 95 % credible intervals of the proposed joint estimation method (which "borrows strength" from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using the MTG2 and BLUPF90 software, and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate scalability of the proposed method for simulated data with tens of thousands of individuals. Availability: The C++ implementation dynBGP and simulated data are available on GitHub (https://github.com/aarjas/dynBGP). The programs can be run in R. Real datasets are available in the QTL archive (https://phenome.jax.org/centers/QTLA). Supplementary information: Supplementary data are available at Bioinformatics online.

    Bayesian Sparse Factor Analysis of Genetic Covariance Matrices

    Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression where the number of traits assayed per individual can reach many thousand. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed effects model. The key idea of our model is that we need only consider G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse -- affecting only a few observed traits. The advantages of this approach are two-fold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set. Comment: 35 pages, 7 figures
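
The structural idea in this abstract, a few sparse factors generating a large covariance matrix, can be sketched with a generic factor-analytic form: G = ΛΛᵀ + diag(ψ), where each column of the loading matrix Λ is mostly zeros. This form, the function name `g_matrix`, and the trait-specific variance term ψ are assumptions chosen for illustration; the abstract does not give the model's exact parameterisation or priors, and nothing here performs Bayesian estimation.

```python
def g_matrix(loadings, specific_var):
    """Build G = L L' + diag(specific_var) from a traits-by-factors loading matrix.

    loadings[i][f] is the (assumed) effect of factor f on trait i; a sparse factor
    has zero loadings for most traits, so most off-diagonal entries of G vanish.
    """
    n_traits = len(loadings)
    n_factors = len(loadings[0])
    G = []
    for i in range(n_traits):
        row = []
        for j in range(n_traits):
            cov = sum(loadings[i][f] * loadings[j][f] for f in range(n_factors))
            if i == j:
                cov += specific_var[i]  # trait-specific variance on the diagonal
            row.append(cov)
        G.append(row)
    return G
```

With p traits and k factors, this structure needs at most p·k loadings plus p specific variances rather than the p(p+1)/2 free entries of an unstructured G-matrix, which is why it scales to thousands of traits.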