88 research outputs found

    MSMC and MSMC2: the multiple sequentially markovian coalescent

    Get PDF
    The Multiple Sequentially Markovian Coalescent (MSMC) is a population genetic method and software for inferring demographic history and population structure through time from genome sequences. Here we describe the main program MSMC and its successor MSMC2. We go through all the necessary steps of processing genomic data from BAM files all the way to generating plots of inferred population size and separation histories. Some background on the methodology itself is provided, as well as bash scripts and python source code to run the necessary programs. The reader is also referred to community resources such as a mailing list and github repositories for further advice

    A minimal descriptor of an ancestral recombinations graph

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, as well as genetic exchange events, such as recombinations: this captures the (genetic) dynamics of a population evolving over generations.</p> <p>Results</p> <p>In this paper, we identify structure-preserving and samples-preserving core of an ARG <it>G</it> and call it the minimal descriptor ARG of <it>G</it>. Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to that of <it>G</it> and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as that of <it>G</it>. We also prove that even an unbounded <it>G</it> has a finite minimal descriptor, that continues to preserve certain (graph-theoretic) properties of <it>G</it> and for an appropriate class of ARGs, our estimate (Eqn 8) as well as empirical observation is that the expected reduction in the number of vertices is exponential.</p> <p>Conclusions</p> <p>Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend itself very naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).</p

    Genome-wide fine-scale recombination rate variation in Drosophila melanogaster

    Get PDF
    Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine-scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features—including recombination rates, diversity, divergence, GC content, gene content, and sequence quality—is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity

    Forward-time simulation of realistic samples for genome-wide association studies

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Forward-time simulations have unique advantages in power and flexibility for the simulation of genetic samples of complex human diseases because they can closely mimic the evolution of human populations carrying these diseases. However, a number of methodological and computational constraints have prevented the power of this simulation method from being fully explored in existing forward-time simulation methods.</p> <p>Results</p> <p>Using a general-purpose forward-time population genetics simulation environment, we developed a forward-time simulation method that can be used to simulate realistic samples for genome-wide association studies. We examined the properties of this simulation method by comparing simulated samples with real data and demonstrated its wide applicability using four examples, including a simulation of case-control samples with a disease caused by multiple interacting genetic and environmental factors, a simulation of trio families affected by a disease-predisposing allele that had been subjected to either slow or rapid selective sweep, and a simulation of a structured population resulting from recent population admixture.</p> <p>Conclusions</p> <p>Our algorithm simulates populations that closely resemble the complex structure of the human genome, while allows the introduction of signals of natural selection. Because of its flexibility to generate different types of samples with arbitrary disease or quantitative trait models, this simulation method can simulate realistic samples to evaluate the performance of a wide variety of statistical gene mapping methods for genome-wide association studies.</p

    A New Method to Reconstruct Recombination Events at a Genomic Scale

    Get PDF
    Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to the subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the IRiS algorithm, which detects past recombination events from extant sequences and specifies the place of each recombination and which are the recombinants sequences. We have validated and calibrated IRiS for the human genome using coalescent simulations replicating standard human demographic history and a variable recombination rate model, and we have fine-tuned IRiS parameters to simultaneously optimize for false discovery rate, sensitivity, and accuracy in placing the recombination events in the sequence. Newer recombinations overwrite traces of past ones and our results indicate more recent recombinations are detected by IRiS with greater sensitivity. IRiS analysis of the MS32 region, previously studied using sperm typing, showed good concordance with estimated recombination rates. We also applied IRiS to haplotypes for 18 X-chromosome regions in HapMap Phase 3 populations. Recombination events detected for each individual were recoded as binary allelic states and combined into recotypes. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across the genomes and, for the first time, it will allow the use of recombination as genetic marker to study human genetic variation

    GENOMEPOP: A program to simulate genomes in populations

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient. The reason is that they are based on the history of lineages with surviving offspring in the current population. On the contrary, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific <it>dN/dS </it>estimation.</p> <p>Results</p> <p>I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter and intra codon levels, fitness-based selection and complex demographic scenarios.</p> <p>Conclusion</p> <p>GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools. Namely, the possibility of forward simulation under General Time Reversible (GTR) mutation or GTR×MG94 codon models with intra-codon recombination, arbitrary, user-defined, migration patterns, diploid or haploid models, constant or variable population sizes, etc. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p

    Genotype, haplotype and copy-number variation in worldwide human populations

    Full text link
    Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups(1-3). Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms ( SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected-including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas-the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/62552/1/nature06742.pd

    Genetic variation and linkage disequilibrium in Bacillus anthracis

    Get PDF
    We performed whole-genome amplification followed by hybridization of custom-designed resequencing arrays to resequence 303 kb of genomic sequence from a worldwide panel of 39 Bacillus anthracis strains. We used an efficient algorithm contained within a custom software program, UniqueMER, to identify and mask repetitive sequences on the resequencing array to reduce false-positive identification of genetic variation, which can arise from cross-hybridization. We discovered a total of 240 single nucleotide variants (SNVs) and showed that B. anthracis strains have an average of 2.25 differences per 10,000 bases in the region we resequenced. Common SNVs in this region are found to be in complete linkage disequilibrium. These patterns of variation suggest there has been little if any historical recombination among B. anthracis strains since the origin of the pathogen. This pattern of common genetic variation suggests a framework for recognizing new or genetically engineered strains

    Multiple Chromosomal Rearrangements Structured the Ancestral Vertebrate Hox-Bearing Protochromosomes

    Get PDF
    While the proposal that large-scale genome expansions occurred early in vertebrate evolution is widely accepted, the exact mechanisms of the expansion—such as a single or multiple rounds of whole genome duplication, bloc chromosome duplications, large-scale individual gene duplications, or some combination of these—is unclear. Gene families with a single invertebrate member but four vertebrate members, such as the Hox clusters, provided early support for Ohno's hypothesis that two rounds of genome duplication (the 2R-model) occurred in the stem lineage of extant vertebrates. However, despite extensive study, the duplication history of the Hox clusters has remained unclear, calling into question its usefulness in resolving the role of large-scale gene or genome duplications in early vertebrates. Here, we present a phylogenetic analysis of the vertebrate Hox clusters and several linked genes (the Hox “paralogon”) and show that different phylogenies are obtained for Dlx and Col genes than for Hox and ErbB genes. We show that these results are robust to errors in phylogenetic inference and suggest that these competing phylogenies can be resolved if two chromosomal crossover events occurred in the ancestral vertebrate. These results resolve conflicting data on the order of Hox gene duplications and the role of genome duplication in vertebrate evolution and suggest that a period of genome reorganization occurred after genome duplications in early vertebrates
    corecore