642 research outputs found

    Bayesian co-estimation of selfing rate and locus-specific mutation rates for a partially selfing population

    We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about pure hermaphroditism, androdioecy, and gynodioecy. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens Sampling Formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet Process Prior (DPP) model. Among the parameters jointly inferred are the population-wide rate of self-fertilization, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual
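As a rough illustration of the Ewens Sampling Formula mentioned in the abstract above, the sketch below evaluates the textbook ESF probability of an allele-count configuration under the infinite-alleles model. It is not the authors' likelihood, which extends the ESF to partially selfing mating systems. The function name `ewens_probability` and its arguments are hypothetical, chosen only for this example.

```python
from collections import Counter
from math import exp, lgamma, log


def ewens_probability(allele_counts, theta):
    """Textbook ESF probability of an unlabeled allele configuration.

    allele_counts: copy number of each distinct allele in the sample,
                   e.g. [3, 1, 1] for a sample of 5 genes with 3 alleles.
    theta:         scaled mutation rate (must be positive).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    n = sum(allele_counts)
    # a[j] = number of distinct alleles that appear exactly j times
    a = Counter(allele_counts)
    # log of n! / (theta * (theta + 1) * ... * (theta + n - 1))
    log_p = lgamma(n + 1) - (lgamma(theta + n) - lgamma(theta))
    # log of the product over j of theta^a_j / (j^a_j * a_j!)
    for j, a_j in a.items():
        log_p += a_j * log(theta) - a_j * log(j) - lgamma(a_j + 1)
    return exp(log_p)


if __name__ == "__main__":
    # Two genes are identical with probability 1 / (1 + theta)
    print(ewens_probability([2], theta=1.0))
```

Working on the log scale keeps the calculation stable for larger samples. Summing the probabilities over all configurations of a fixed sample size should give 1, which is a convenient sanity check.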

    A Bayesian Approach to Inferring Rates of Selfing and Locus-Specific Mutation.

    We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about reproduction under pure hermaphroditism, gynodioecy, and a model developed to describe the self-fertilizing killifish Kryptolebias marmoratus. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens sampling formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet process prior model. Our sampler is designed to accommodate additional information, including observations pertaining to the sex ratio, the intensity of inbreeding depression, and other aspects of reproduction. It can provide joint posterior distributions for the population-wide proportion of uniparental individuals, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual. Further, estimation of all basic parameters of a given model permits estimation of functions of those parameters, including the proportion of the gene pool contributed by each sex and relative effective numbers

    Estimating genealogies from linked marker data: a Bayesian approach

    Background: Answers to several fundamental questions in statistical genetics would ideally require knowledge of the ancestral pedigree and of the gene flow therein. A few examples of such questions are haplotype estimation, relatedness and relationship estimation, gene mapping by combining pedigree and linkage disequilibrium information, and estimation of population structure. Results: We present a probabilistic method for genealogy reconstruction. Starting with a group of genotyped individuals from some population isolate, we explore the state space of their possible ancestral histories under our Bayesian model by using Markov chain Monte Carlo (MCMC) sampling techniques. The main contribution of our work is the development of sampling algorithms in the resulting vast state space with highly dependent variables. The main drawback is the computational complexity that limits the time horizon within which explicit reconstructions can be carried out in practice. Conclusion: The estimates for IBD (identity-by-descent) and haplotype distributions are tested in several settings using simulated data. The results appear to be promising for a further development of the method.

    Population Structure


    On inferring and interpreting genetic population structure - applications to conservation, and the estimation of pairwise genetic relatedness

    Population structure is present in most wild populations of species. Detecting genetic population structure and understanding its consequences for the evolutionary trajectories of species has shaped much of our understanding of the process of evolution. This delineation of subdivision within a population plays an important role in several allied fields, including conservation genetics, association studies, phylogeography, and quantitative genetics. This dissertation addresses methods to infer and interpret subpopulation structure. In this regard, I discuss the standing motivation for developing new analytic tools; a classic population genetics study of the imperiled freshwater turtle, Emys blandingii; the development of a fast, likelihood-based estimator of subpopulation structure, MULTICLUST; and a likelihood-based method to infer pairwise genetic relatedness in the presence of subpopulation structure. Our analyses of population structure in midwestern populations of Emys blandingii detected considerable genetic structure within and among the sampled localities, and revealed ancestral gene flow of E. blandingii in this region north and east from an ancient refugium in the central Great Plains, concordant with post-glacial recolonization timescales. The data further implied unexpected links between geographically disparate populations in Nebraska and Illinois. Our study encourages conservation decisions to be mindful of the genetic uniqueness of populations of E. blandingii across its primary range. Analyses of both simulated and empirical data suggest that MULTICLUST infers structure consistently (reproducible results) and is time efficient compared to the popular Bayesian MCMC tool, STRUCTURE (Pritchard et al., 2000b). The new likelihood estimator of pairwise genetic relatedness also has the lowest bias and mean squared error in estimating relatedness in full-sibling, half-sibling, parent-offspring, and a variety of other related dyads, compared to the methods of Anderson and Weir (2007), Queller and Goodnight (1989), and Lynch and Ritland (1999). Overall, this dissertation lays the groundwork for several interesting biological and statistical questions that can be addressed with a robust framework for identification of subpopulation structure.

    Determination of genetic structure of germplasm collections: are traditional hierarchical clustering methods appropriate for molecular marker data?

    Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using real and simulated molecular marker data. Our study also compared the performance of traditional hierarchical clustering with model-based clustering (STRUCTURE). We showed that the cophenetic correlation coefficient is directly related to subgroup differentiation and can thus be used as an indicator of the presence of genetically distinct subgroups in germplasm collections. Whereas UPGMA performed well in preserving distances between accessions, Ward excelled in recovering groups. Our results also showed a close similarity between clusters obtained by Ward and by STRUCTURE. Traditional cluster analysis can provide an easy and effective way of determining structure in germplasm collections using molecular marker data, and the output can be used for sampling core collections or for association studies.
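The cophenetic correlation coefficient discussed in the abstract above can be sketched in a few lines: build a UPGMA (average-linkage) tree from a distance matrix, record the height at which each pair of accessions first joins, and correlate those heights with the original distances. This is a minimal pure-Python illustration, not the code used in the study. The function names and the toy distance matrix are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist):
    """Return {(i, j): merge height} for all leaf pairs i < j under UPGMA."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member leaves
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coph = {}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of current clusters
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        size_a, size_b = len(clusters[a]), len(clusters[b])
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Average linkage: size-weighted mean of the two old distances
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def cophenetic_correlation(dist):
    """Pearson correlation between original and cophenetic distances."""
    coph = upgma_cophenetic(dist)
    pairs = sorted(coph)
    x = [dist[i][j] for i, j in pairs]
    y = [coph[p] for p in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return cov / sqrt(sum((u - mx) ** 2 for u in x) * sum((v - my) ** 2 for v in y))


if __name__ == "__main__":
    # Toy matrix with two well-separated subgroups: {0, 1} and {2, 3}
    toy = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 2], [6, 6, 2, 0]]
    print(cophenetic_correlation(toy))
```

A matrix whose distances already fit a tree exactly gives a coefficient of 1, while data without clear subgroup structure gives lower values. The sketch does not handle the degenerate case where all cophenetic distances are equal.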

    Methods and Algorithms for Inference Problems in Population Genetics

    Inference of population history is a central problem of population genetics. The advent of large genetic data sets brings not only opportunities to develop more accurate inference methods, but also computational challenges. We therefore aim to develop accurate methods and fast algorithms for problems in population genetics. Inference of admixture proportions is a classical statistical problem. We focus in particular on the problem of ancestry inference for ancestors. Standard methods implicitly assume that both parents of an individual have the same admixture fraction. However, this is rarely the case in real data. We develop a Hidden Markov Model (HMM) framework for estimating the admixture proportions of the immediate ancestors of an individual, i.e. a type of appropriation of an individual's admixture proportions into further subsets of ancestral proportions in the ancestors. Based on a genealogical model for admixture tracts, we develop an efficient algorithm for computing the sampling probability of the genome from a single individual, as a function of the admixture proportions of the ancestors of this individual. We show that the distribution and lengths of admixture tracts in a genome contain information about the admixture proportions of the ancestors of an individual. This allows us to perform probabilistic inference of admixture proportions of ancestors using only the genome of an extant individual. To better understand populations, we further study the species delimitation problem, that is, the problem of determining the boundary between populations and species. We propose a classification-based method to assign a set of populations to a number of species. Our new method uses summary statistics generated from genetic data to classify pairs of populations as either 'same species' or 'different species'. We show that machine learning can be used for species delimitation and scaled for large genomic data. It can also outperform Bayesian approaches, especially when gene flow is involved in the evolutionary process.

    Multiallelic models of genetic effects and variance decomposition in non-equilibrium populations

    Quantitative genetics stems from the theoretical models of genetic effects, which are re-parameterizations of the genotypic values into parameters of biological (genetic) relevance. Different formulations of genetic effects are adequate to address different subjects. We thus need to generalize and unify them under a common framework so that researchers can easily transform genetic effects between different biological meanings. The Natural and Orthogonal Interactions (NOIA) model of genetic effects has been developed to achieve this aim. Here, we further implement the statistical formulation of NOIA with multiple alleles under Hardy–Weinberg departures (HWD). We show that our developments are straightforwardly connected to the decomposition of the genetic variance and we point out several emergent properties of multiallelic quantitative genetic models, as compared to the biallelic ones. Further, NOIA entails a natural extension of one-locus developments to multiple epistatic loci under linkage equilibrium. Therefore, we present an extension of the orthogonal decomposition of the genetic variance to multiple epistatic, multiallelic loci under HWD. We illustrate this theory with a graphical interpretation and an analysis of published data on the human acid phosphatase (ACP1) polymorphism. JAC acknowledges funding by an “Isidro Parga Pondal” contract from the autonomous administration Xunta de Galicia. This research has been partially supported by projects BFU2009-11988 and BFU2010-20003 from the Spanish Ministry of Science (JAC) and the Natural Sciences and Engineering Research Council of Canada, Grant OGP0183983 (RCY).