
    DNA barcode analysis: a comparison of phylogenetic and statistical classification methods

    Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignment methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species, using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species.
    Results: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was the molecular diversity of the data. Addition of genetically independent loci - nuclear genes - improved the predictive performance of most methods.
    Conclusion: The study implies that taxonomists can influence the quality of their analyses either by choosing a method best adapted to the configuration of their sample or, for a given method, by increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
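
    The "one nearest neighbour" rule singled out above is simple enough to sketch directly. The snippet below is an illustrative Python sketch, not the authors' implementation: it assigns a query CO1 sequence to the species of its closest reference under Hamming distance, assuming pre-aligned sequences of equal length; the function names and toy barcodes are hypothetical.

```python
# Minimal sketch of the "one nearest neighbour" assignment rule described above.
# Assumes pre-aligned CO1 sequences of equal length; all names and data are illustrative.

def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_species(query: str, references: list[tuple[str, str]]) -> str:
    """Assign a query sequence to the species of its closest reference.

    references: list of (species_label, aligned_sequence) pairs.
    """
    best_label, _ = min(references, key=lambda ref: hamming(query, ref[1]))
    return best_label

# Toy usage with made-up 12-bp "barcodes":
refs = [
    ("species_A", "ACGTACGTACGT"),
    ("species_A", "ACGTACGTACGA"),
    ("species_B", "TCGTTCGTTCGT"),
]
print(assign_species("ACGTACGAACGT", refs))  # -> "species_A"
```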

    Reconstructing past history from whole-genomes: an ABC approach handling recombining data.

    In population genetics, a key interest is to reconstruct the demographic history of a population from its genetic data. This history can be characterized by multiple events such as migration of individuals, admixture with another population, or changes in population size. With the availability of large-scale genomic data, numerous methods have arisen for untangling complicated histories or retrieving a detailed picture of a population at different time periods. Although genomes are known to be extremely informative about demography, there are many ways to extract this information. We present an approach designed for inferring past population sizes from an intermediate number of fully sequenced genomes. It relies on Approximate Bayesian Computation (ABC), a simulation-based statistical framework for generic model comparison and parameter inference. We demonstrate how the specificities of DNA sequencing data (namely haplotypic information, long-range genetic correlation and genotyping errors) can be handled using ABC and fast genetic simulators, and we further infer histories of successive bottlenecks and expansions in human populations.
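
    The core of the ABC machinery described above can be summarised in a few lines. The following Python sketch is a generic rejection-ABC loop with a deliberately crude stand-in simulator (Poisson-distributed segregating sites around Watterson's expectation) and a single summary statistic; the authors' actual approach relies on fast coalescent simulators handling recombination, haplotype information and genotyping errors. All parameter values and names here are illustrative assumptions.

```python
# Generic rejection-ABC sketch: draw Ne from a prior, simulate a summary statistic,
# keep the draws whose simulated statistic lands close to the observed one.
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES = 20                                    # number of sampled sequences (assumed)
MU_L = 1e-4                                       # per-sequence mutation rate (rate x length, made up)
A_N = sum(1.0 / i for i in range(1, N_SAMPLES))   # Watterson's harmonic correction for n = 20

def simulate_segregating_sites(pop_size: float) -> int:
    """Crude stand-in simulator: draw S around Watterson's expectation 4*Ne*mu*L*a_n."""
    theta = 4.0 * pop_size * MU_L
    return int(rng.poisson(theta * A_N))

def abc_rejection(observed_s: float, n_sims: int = 100_000, tolerance: float = 5.0) -> np.ndarray:
    """Keep prior draws of Ne whose simulated statistic is within `tolerance` of the data."""
    accepted = []
    for _ in range(n_sims):
        pop_size = rng.uniform(1_000, 100_000)    # flat prior on Ne
        if abs(simulate_segregating_sites(pop_size) - observed_s) <= tolerance:
            accepted.append(pop_size)
    return np.array(accepted)

posterior = abc_rejection(observed_s=50.0)
print(f"accepted {len(posterior)} draws, posterior mean Ne ~ {posterior.mean():.0f}")
```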

    Inference of past historical events using ABC and MCMC methods on population genomics data sets. Applications to human populations

    New computer-intensive estimation techniques such as Approximate Bayesian Computation (ABC) and Markov chain Monte Carlo (MCMC) allow inferring unknown parts of the history of species from contemporary population genetics and genomics data. I will illustrate these possibilities with several examples. In this context, we performed a study on worldwide human populations in which, by applying MCMC methods to a large set of populations with different lifestyles (farmers, herders and hunter-gatherers), we were able to show that these lifestyles strongly impacted the expansion patterns of these populations: farmers show strong expansion signals, herders weak expansion and hunter-gatherers no expansion at all. Moreover, we showed that rapidly mutating markers like microsatellites allowed us to infer the recent Neolithic expansion, while slowly mutating markers like sequences allowed us to infer a more ancient Paleolithic expansion. The validity of this approach was verified through a simulation study. More recently, we developed a parametric ABC method for whole-genome sequencing data. We studied which combinations of summary statistics best allow inference of the demographic processes occurring in the populations under study. We applied this method to human populations from the 1000 Genomes Project. Our first results show contrasting inferences when comparing Eurasian and African populations.
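
    To make the MCMC side of the abstract concrete, here is a minimal Metropolis-Hastings sketch in Python. The target is a toy Gaussian posterior for a hypothetical "expansion rate" parameter, not the demographic likelihood actually used on microsatellite or genomic data; it only illustrates the propose/accept machinery these analyses rely on.

```python
# Minimal random-walk Metropolis-Hastings sampler over a single parameter.
# The log-posterior below is a toy stand-in, not the study's demographic model.
import numpy as np

rng = np.random.default_rng(1)

def log_posterior(growth_rate: float) -> float:
    """Toy target: pretend the data support a growth rate near 0.8 (Gaussian, sd 0.2)."""
    return -0.5 * ((growth_rate - 0.8) / 0.2) ** 2

def metropolis_hastings(n_iter: int = 20_000, proposal_sd: float = 0.1) -> np.ndarray:
    chain = np.empty(n_iter)
    current = 0.0
    current_lp = log_posterior(current)
    for i in range(n_iter):
        proposal = current + rng.normal(0.0, proposal_sd)     # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal)
        if np.log(rng.uniform()) < proposal_lp - current_lp:  # accept/reject step
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

samples = metropolis_hastings()
burn_in = 2_000
print(f"posterior mean growth rate ~ {samples[burn_in:].mean():.2f}")
```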

    Data from: An evaluation of the methods to estimate effective population size from measures of linkage disequilibrium

    In 1971, John Sved derived an approximate relationship between linkage disequilibrium and effective population size for an ideal finite population. This seminal work was extended by Sved and Feldman (1973) and Weir and Hill (1980), who derived additional equations with the same purpose. These equations yield useful estimates of effective population size, as they require only a single sample in time. As these estimates of effective population size are now commonly applied to a variety of genomic data, from arrays of single nucleotide polymorphisms to whole-genome data, some authors have investigated their bias through simulation studies and proposed corrections for different mating systems. However, the cause of the bias remains elusive. Here we show the problems of using linkage disequilibrium as a statistical measure and, analogously, the problems of estimating effective population size from such a measure. For that purpose, we compare three commonly used approaches with a transition-probability-based method that we develop here, which provides an exact computation of linkage disequilibrium. We show that the biases in the estimates of linkage disequilibrium and effective population size are partly due to low-frequency markers, tightly linked markers, or a small total number of crossovers per generation. These biases, however, do not decrease when increasing the sample size or using unlinked markers. Our results highlight the issues of such estimates of effective population size based on linkage disequilibrium, and suggest which of the methods studied here should be used in empirical studies, as well as the optimal distance between markers for such estimates.
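
    Sved's 1971 result mentioned above relates the expected squared allelic correlation between two loci to effective population size, E[r^2] ~ 1/(1 + 4*Ne*c), where c is the recombination fraction between the markers. The sketch below simply inverts that relationship to turn a mean r^2 into a point estimate of Ne; the sample-size corrections and weighting schemes used by the estimators evaluated in the paper are deliberately omitted, and the numbers are made up.

```python
# Invert Sved's (1971) approximation E[r^2] = 1 / (1 + 4*Ne*c) to obtain a point
# estimate of effective population size from the mean r^2 between marker pairs at
# recombination fraction c. Illustrative only: no sample-size correction is applied.

def ne_from_r2(mean_r2: float, c: float) -> float:
    """Estimate Ne from mean r^2 under Sved's relationship."""
    return (1.0 / mean_r2 - 1.0) / (4.0 * c)

# Toy usage: mean r^2 of 0.01 between effectively unlinked markers (c = 0.5).
print(ne_from_r2(0.01, 0.5))   # -> 49.5
```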

    POLDISP: a software package for indirect estimation of contemporary pollen dispersal

    POLDISP 1.0 is a free software package to estimate the distribution of pollen dispersal distances from mother-offspring diploid genotypic data. It requires the spatial coordinates and genotypes of a sample of seed plants and their respective maternal progenies, and provides estimates of the average, variance and kurtosis of the pollen dispersal curve. POLDISP also estimates the effective reproductive density of pollen donors and the correlation of paternity within and among maternal sibships. POLDISP is useful for characterizing the spatial scale of pollen dispersal, for assessing variation in male fertility and for investigating biological factors affecting correlated paternity in plants.