Search CORE

88 research outputs found

MSMC and MSMC2: the multiple sequentially markovian coalescent

Author: 1001 Genomes Consortium
AS Malaspinas
CM Hung
GAT McVean
L Pagani
LAF Frantz
M Malinsky
M. Raghavan
P Marjoram
S Mallick
S Schiffels
TM Beissinger
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

The Multiple Sequentially Markovian Coalescent (MSMC) is a population genetic method and software for inferring demographic history and population structure through time from genome sequences. Here we describe the main program MSMC and its successor MSMC2. We go through all the necessary steps of processing genomic data from BAM files all the way to generating plots of inferred population size and separation histories. Some background on the methodology itself is provided, as well as bash scripts and Python source code to run the necessary programs. The reader is also referred to community resources such as a mailing list and GitHub repositories for further advice.

Crossref

MPG.PuRe

A minimal descriptor of an ancestral recombinations graph

Author: Asif Javed
B Padhukasahasram
C Wiuf
GAT McVean
GK Chen
J Hein
L L Liang
L Parida
Laxmi Parida
M Arenas
M Jobling
P Marjoram
Pier Francesco Palamara
R Bürger
RC Griffiths
RR Hudson
S Schaffner
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Ancestral Recombinations Graph (ARG) is a phylogenetic structure that encodes both duplication events, such as mutations, and genetic exchange events, such as recombinations; it captures the (genetic) dynamics of a population evolving over generations. Results: In this paper, we identify the structure-preserving and samples-preserving core of an ARG G and call it the minimal descriptor ARG of G. Its structure-preserving characteristic ensures that all the branch lengths of the marginal trees of the minimal descriptor ARG are identical to those of G, and the samples-preserving property asserts that the patterns of genetic variation in the samples of the minimal descriptor ARG are exactly the same as those of G. We also prove that even an unbounded G has a finite minimal descriptor that continues to preserve certain (graph-theoretic) properties of G. For an appropriate class of ARGs, our estimate (Eqn 8) as well as empirical observation is that the expected reduction in the number of vertices is exponential. Conclusions: Based on the definition of this lossless and bounded structure, we derive local properties of the vertices of a minimal descriptor ARG, which lend themselves very naturally to the design of efficient sampling algorithms. We further show that a class of minimal descriptors, that of binary ARGs, models the standard coalescent exactly (Thm 6).

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genome-wide fine-scale recombination rate variation in Drosophila melanogaster

Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a widely used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots, each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than in the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Warwick Research Archives Portal Repository

FigShare

Genomics of Divergence along a Continuum of Parapatric Population Differentiation

MM received funding from the Max Planck innovation funds for this project. PGDF was supported by a Marie Curie European Reintegration Grant (proposal no. 270891). CE was supported by German Science Foundation grants (DFG, EI 841/4-1 and EI 841/6-1).

OceanRep

Crossref

Directory of Open Access Journals

PubMed Central

Queen Mary Research Online

Bern Open Repository and Information System (BORIS)

MPG.PuRe

Forward-time simulation of realistic samples for genome-wide association studies

Author: A Carvajal-Rodriguez
AL Price
B Devlin
B Peng
B Weir
BF Voight
Bo Peng
BW Lambert
C Li
C Pfaff
CC Spencer
CC Wu
Christopher I Amos
CI Amos
CJ Hoggart
D Altshuler
D Li
D Reich
E Lander
FA Wright
G Ayodo
G McVean
GA McVean
GAT McVean
GK Chen
H Tang
HS Chai
HY Tan
J Marchini
J Wise
JC Barrett
JC Long
JD Wall
JK Pritchard
L Liang
M Chadeau-Hyam
M Kimura
M Li
M Slatkin
MI McCarthy
MW Smith
P Marjoram
PC Sham
RR Hudson
S Myers
S Wiltshire
S Zollner
T Mailund
T Mehta
TH Consortia
W Knowler
WJ Ewens
X Zhu
Y Wang
Z Bochdanovits
Publication venue: BioMed Central
Publication date: 01/09/2010

Background: Forward-time simulations have unique advantages in power and flexibility for the simulation of genetic samples of complex human diseases because they can closely mimic the evolution of human populations carrying these diseases. However, a number of methodological and computational constraints have prevented the power of this simulation method from being fully explored in existing forward-time simulation methods. Results: Using a general-purpose forward-time population genetics simulation environment, we developed a forward-time simulation method that can be used to simulate realistic samples for genome-wide association studies. We examined the properties of this simulation method by comparing simulated samples with real data and demonstrated its wide applicability using four examples, including a simulation of case-control samples with a disease caused by multiple interacting genetic and environmental factors, a simulation of trio families affected by a disease-predisposing allele that had been subjected to either slow or rapid selective sweep, and a simulation of a structured population resulting from recent population admixture. Conclusions: Our algorithm simulates populations that closely resemble the complex structure of the human genome, while allowing the introduction of signals of natural selection. Because of its flexibility to generate different types of samples with arbitrary disease or quantitative trait models, this simulation method can simulate realistic samples to evaluate the performance of a wide variety of statistical gene mapping methods for genome-wide association studies.

Crossref

Directory of Open Access Journals

PubMed Central

A New Method to Reconstruct Recombination Events at a Genomic Scale

Author: A Auton
AJ Jeffreys
Asif Javed
Chris P. Ponting
D Posada
DC Crawford
ED Parvanov
EO Wilson
F Baudat
Francesc Calafell
GAT McVean
J Felsenstein
J Rozas
Jaume Bertranpetit
JZ Li
K Paigen
K Sturrock
KK Kidd
L Excoffier
L Parida
Laxmi Parida
M Jakobsson
M Stephens
Marc Pybus
Marta Melé
N Li
NA Rosenberg
P Scheet
RA Fisher
RR Hudson
S Myers
SF Schaffner
SJE Baird
Publication venue: Public Library of Science
Publication date: 01/01/2010

Recombination is one of the main forces shaping genome diversity, but the information it generates is often overlooked. A recombination event creates a junction between two parental sequences that may be transmitted to the subsequent generations. Just like mutations, these junctions carry evidence of the shared past of the sequences. We present the IRiS algorithm, which detects past recombination events from extant sequences and specifies the place of each recombination and which sequences are the recombinants. We have validated and calibrated IRiS for the human genome using coalescent simulations replicating standard human demographic history and a variable recombination rate model, and we have fine-tuned IRiS parameters to simultaneously optimize for false discovery rate, sensitivity, and accuracy in placing the recombination events in the sequence. Newer recombinations overwrite traces of past ones, and our results indicate that more recent recombinations are detected by IRiS with greater sensitivity. IRiS analysis of the MS32 region, previously studied using sperm typing, showed good concordance with estimated recombination rates. We also applied IRiS to haplotypes for 18 X-chromosome regions in HapMap Phase 3 populations. Recombination events detected for each individual were recoded as binary allelic states and combined into recotypes. Principal component analysis and multidimensional scaling based on recotypes reproduced the relationships between the eleven HapMap Phase III populations that can be expected from known human population history, thus further validating IRiS. We believe that our new method will contribute to the study of the distribution of recombination events across genomes and, for the first time, will allow the use of recombination as a genetic marker to study human genetic variation.

Crossref

Directory of Open Access Journals

PubMed Central

UPF Digital Repository

ScholarlyCommons@Penn

Digital.CSIC

GENOMEPOP: A program to simulate genomes in populations

Background: There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient. The reason is that they are based on the history of lineages with surviving offspring in the current population. In contrast, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is an increasing interest in forward population genetic simulation, and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I will introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation. Results: I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination, at inter- and intra-codon levels, fitness-based selection and complex demographic scenarios. Conclusion: GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features make it unique with respect to other simulation tools: namely, it offers forward simulation under General Time Reversible (GTR) mutation or GTR×MG94 codon models with intra-codon recombination, arbitrary user-defined migration patterns, diploid or haploid models, constant or variable population sizes, etc. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed in its web page [1].
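
The forward strategy contrasted with coalescent simulation in this abstract can be illustrated with a minimal sketch. This is not GenomePop's algorithm: it assumes the simplest textbook case (a neutral, haploid Wright-Fisher population at a single biallelic locus), and the function name and parameters are hypothetical. It shows why forward simulation costs more, since every individual in every generation is drawn explicitly.

```python
import random


def wright_fisher_forward(pop_size, generations, init_freq, seed=0):
    """Track one allele's frequency forward in time (neutral, haploid, one locus).

    Illustrative assumption only; not taken from GenomePop.
    """
    rng = random.Random(seed)
    count = round(pop_size * init_freq)   # copies of the allele at generation 0
    trajectory = [count / pop_size]
    for _ in range(generations):
        freq = count / pop_size
        # Every offspring in the next generation samples its parent at random,
        # so the whole population is simulated, not just surviving lineages.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        trajectory.append(count / pop_size)
    return trajectory


traj = wright_fisher_forward(pop_size=100, generations=50, init_freq=0.5)
print(f"start={traj[0]:.2f} end={traj[-1]:.2f}")
```

The work per run grows with `pop_size * generations`, whereas a coalescent simulation only follows the ancestors of the sample.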

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genotype, haplotype and copy-number variation in worldwide human populations

Author: AC Need
AJ Sharp
Andrew B. Singleton
Angela Britton
B Servin
Bryan J. Traynor
C Sabatti
D Falush
DA Hinds
DE Reich
Dena G. Hernandez
DF Conrad
DP Locke
GAT McVean
HM Cann
Hon-Chung Fung
Howard M. Cann
Ian Rafferty
J Zhang
J. Raphael Gibbs
James H. Degnan
Javier Simon-Sanchez
Jenna M. VanLiere
Jennifer C. Schymick
John A. Hardy
Jose M. Bras
Joyce van de Leemput
K Wang
Kai Wang
KK Wong
L Bastos-Rodrigues
LJ Lawson Handley
M Jakobsson
MA Eberle
Maja Bucan
Mar Matarin
Mattias Jakobsson
NA Rosenberg
Noah A. Rosenberg
P Scheet
Paul Scheet
R Redon
Rita Guerreiro
S Ramachandran
SA Tishkoff
SB Gabriel
Sonja W. Scholz
ST Kalinowski
SW Scherer
T Bersaglieri
Zachary A. Szpiech
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/02/2008

Genome-wide patterns of variation across individuals provide a powerful source of data for uncovering the history of migration, range expansion, and adaptation of the human species. However, high-resolution surveys of variation in genotype, haplotype and copy number have generally focused on a small number of population groups (1-3). Here we report the analysis of high-quality genotypes at 525,910 single-nucleotide polymorphisms (SNPs) and 396 copy-number-variable loci in a worldwide sample of 29 populations. Analysis of SNP genotypes yields strongly supported fine-scale inferences about population structure. Increasing linkage disequilibrium is observed with increasing geographic distance from Africa, as expected under a serial founder effect for the out-of-Africa spread of human populations. New approaches for haplotype analysis produce inferences about population structure that complement results based on unphased SNPs. Despite a difference from SNPs in the frequency spectrum of the copy-number variants (CNVs) detected, including a comparatively large number of CNVs in previously unexamined populations from Oceania and the Americas, the global distribution of CNVs largely accords with population structure analyses for SNP data sets of similar size. Our results produce new inferences about inter-population variation, support the utility of CNVs in human population-genetic research, and serve as a genomic resource for human-genetic studies in diverse worldwide populations.

Crossref

Deep Blue Documents at the University of Michigan

Genetic variation and linkage disequilibrium in Bacillus anthracis

Author: A Fasanella
A Fouet
AM Mironczuk
AT Kovacs
B Janes
DJ Cutler
E Helgason
EP Rocha
F Tajima
FG Priest
GA Watterson
GAT McVean
H Akashi
J Maynard Smith
JC Barrett
JD Thompson
JP Gomes
KA Jolley
KE Holt
KL Smith
KS Ko
L Radnedge
LB Price
M Achtman
M Touchon
MCJ Maiden
ME Zwick
MN Van Ert
P Keim
PE Chen
PJ Jackson
R Hershberg
R Sachidanandam
RR Hudson
S Kryazhimskiy
S Myers
S Suerbaum
SA Sawyer
T Wirth
TD Read
X Didelot
Y Tanabe
Publication venue: Nature Publishing Group

We performed whole-genome amplification followed by hybridization of custom-designed resequencing arrays to resequence 303 kb of genomic sequence from a worldwide panel of 39 Bacillus anthracis strains. We used an efficient algorithm contained within a custom software program, UniqueMER, to identify and mask repetitive sequences on the resequencing array to reduce false-positive identification of genetic variation, which can arise from cross-hybridization. We discovered a total of 240 single nucleotide variants (SNVs) and showed that B. anthracis strains have an average of 2.25 differences per 10,000 bases in the region we resequenced. Common SNVs in this region are found to be in complete linkage disequilibrium. These patterns of variation suggest there has been little if any historical recombination among B. anthracis strains since the origin of the pathogen. This pattern of common genetic variation suggests a framework for recognizing new or genetically engineered strains.
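
A figure such as "differences per 10,000 bases" is commonly obtained as the mean number of pairwise differences between aligned sequences, scaled by sequence length. The abstract does not state that this is how the reported value was computed, so the sketch below is an assumption about the general calculation only; the function name and the toy sequences are hypothetical and are not the study's data.

```python
from itertools import combinations


def mean_pairwise_diff_per_10kb(sequences):
    """Mean pairwise differences between aligned sequences, per 10,000 bases.

    Hypothetical helper for illustration; assumes equal-length, gap-free input.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for every unordered pair of sequences.
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs) / length * 10_000


# Toy alignment of three 10-base sequences (made-up data).
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"]
rate = mean_pairwise_diff_per_10kb(toy)
print(round(rate, 2))
```

With 39 strains this averages over 741 pairs, which is cheap to compute directly.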

Crossref

PubMed Central

Multiple Chromosomal Rearrangements Structured the Ancestral Vertebrate Hox-Bearing Protochromosomes

Author: A Martin
A McLysaght
A Meyer
AH Neidert
AL Evans
AL Hufton
AL Hughes
BP Chowdhary
BR Holland
C Kappen
C Popovici
D Larhammar
DE Ferrier
DW Stock
F van der Hoeven
GAT McVean
GP Wagner
Günter P. Wagner
H Kishino
H Shimodaira
J Bergsten
J Kim
J Spring
J Zhang
JP Huelsenbeck
KD Crow
LG Lundin
M Anisimova
M Holder
M Kohn
M Sémon
N Goldman
O Pontes
P Dehal
R Friedman
R Furlong
R Guigo
R Phillips
RC Edgar
S Guindon
S Ohno
SG Gregory
T Keane
T Marques-Bonet
Takashi Gojobori
V Lynch
Vincent J. Lynch
WJ Bailey
WJ Murphy
X Gu
Y Nakatani
Y Wang
Publication venue: Public Library of Science
Publication date: 01/01/2009

While the proposal that large-scale genome expansions occurred early in vertebrate evolution is widely accepted, the exact mechanism of the expansion (such as a single or multiple rounds of whole genome duplication, block chromosome duplications, large-scale individual gene duplications, or some combination of these) is unclear. Gene families with a single invertebrate member but four vertebrate members, such as the Hox clusters, provided early support for Ohno's hypothesis that two rounds of genome duplication (the 2R-model) occurred in the stem lineage of extant vertebrates. However, despite extensive study, the duplication history of the Hox clusters has remained unclear, calling into question its usefulness in resolving the role of large-scale gene or genome duplications in early vertebrates. Here, we present a phylogenetic analysis of the vertebrate Hox clusters and several linked genes (the Hox “paralogon”) and show that different phylogenies are obtained for Dlx and Col genes than for Hox and ErbB genes. We show that these results are robust to errors in phylogenetic inference and suggest that these competing phylogenies can be resolved if two chromosomal crossover events occurred in the ancestral vertebrate. These results resolve conflicting data on the order of Hox gene duplications and the role of genome duplication in vertebrate evolution and suggest that a period of genome reorganization occurred after genome duplications in early vertebrates.

Crossref

Directory of Open Access Journals

PubMed Central