30 research outputs found
Subdividing Y-chromosome haplogroup R1a1 reveals Norse Viking dispersal lineages in Britain
The influence of Viking-Age migrants to the British Isles is obvious in archaeological and place-names evidence, but their demographic impact has been unclear. Autosomal genetic analyses support Norse Viking contributions to parts of Britain, but show no signal corresponding to the Danelaw, the region under Scandinavian administrative control from the ninth to eleventh centuries. Y-chromosome haplogroup R1a1 has been considered as a possible marker for Viking migrations because of its high frequency in peninsular Scandinavia (Norway and Sweden). Here we select ten Y-SNPs to discriminate informatively among hg R1a1 sub-haplogroups in Europe, analyse these in 619 hg R1a1 Y chromosomes including 163 from the British Isles, and also type 23 short-tandem repeats (Y-STRs) to assess internal diversity. We find three specifically Western-European sub-haplogroups, two of which predominate in Norway and Sweden, and are also found in Britain; starlike features in the STR networks of these lineages indicate histories of expansion. We ask whether geographical distributions of hg R1a1 overall, and of the two sub-lineages in particular, correlate with regions of Scandinavian influence within Britain. Neither shows any frequency difference between regions that have higher (≥10%) or lower autosomal contributions from Norway and Sweden, but both are significantly overrepresented in the region corresponding to the Danelaw. These differences between autosomal and Y-chromosomal histories suggest either male-specific contribution, or the influence of patrilocality. Comparison of modern DNA with recently available ancient DNA data supports the interpretation that two sub-lineages of hg R1a1 spread with the Vikings from peninsular Scandinavia.
The Y-Chromosome Tree Bursts into Leaf: 13,000 High-Confidence SNPs Covering the Majority of Known Clades
Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51x, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.
In the blood: the myth and reality of genetic markers of identity
The differences between copies of the human genome are very small, but tend to cluster in different populations. So, despite the fact that low inter-population differentiation does not support a biological definition of races, statistical methods are nonetheless claimed to be able to predict successfully the population of origin of a DNA sample. Such methods are employed in commercial genetic ancestry tests, and particular genetic signatures, often in the male-specific Y-chromosome or maternally-inherited mitochondrial DNA, have become widely identified with particular ancestral or existing groups, such as Vikings, Jews, or Zulus. Here, we provide a primer on genetics, and describe how genetic markers have become associated with particular groups. We describe the conflict between population genetics and individual-based genetics and the pitfalls of over-simplistic genetic interpretations, arguing that although the tests themselves are reliable, the interpretations are unreliable and strongly influenced by cultural and other social forces.
A global analysis of Y-chromosomal haplotype diversity for 23 STR loci
In a worldwide collaborative effort, 19,630 Y-chromosomes were sampled from 129 different populations in 51 countries. These chromosomes were typed for 23 short-tandem repeat (STR) loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385ab, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, GATAH4, DYS481, DYS533, DYS549, DYS570, DYS576, and DYS643) using the PowerPlex Y23 System (PPY23, Promega Corporation, Madison, WI). Locus-specific allelic spectra of these markers were determined and a consistently high level of allelic diversity was observed. A considerable number of null, duplicate and off-ladder alleles were revealed. Standard single-locus and haplotype-based parameters were calculated and compared between subsets of Y-STR markers established for forensic casework. The PPY23 marker set provides substantially stronger discriminatory power than other available kits but at the same time reveals the same general patterns of population structure as other marker sets. A strong correlation was observed between the number of Y-STRs included in a marker set and some of the forensic parameters under study. Interestingly, a weak but consistent trend toward smaller genetic distances resulting from larger numbers of markers became apparent.
A phylogenetic framework facilitates Y-STR variant discovery and classification via massively parallel sequencing
Short-tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single-nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega’s prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates.
In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context.
Extensive geographical and social structure in the paternal lineages of Saudi Arabia revealed by analysis of 27 Y-STRs
Saudi Arabia's indigenous population is organized into patrilineal descent groups, but to date, little has been done to characterize its population structure, in particular with respect to the male-specific region of the Y chromosome. We have used the 27-STR Yfiler® Plus kit to generate haplotypes in 597 unrelated Saudi males, classified into five geographical regions (North, South, Central, East and West). Overall, Yfiler® Plus provides a good discrimination capacity of 95.3%, but this is greatly reduced (74.7%) when considering the reduced Yfiler® set of 17 Y-STRs, justifying the use of the expanded set of markers in this population. Comparison of the five geographical divisions reveals striking differences, with low diversity and similar haplotype spectra in the Central and Northern regions, and high diversity and similar haplotype spectra in the East and West. These patterns likely reflect the geographical isolation of the desert heartland of the peninsula, and the proximity to the sea of the Eastern and Western areas, and consequent historical immigration. We predicted haplogroups from Y-STR haplotypes, testing the performance of prediction by using a large independent set of Saudi Arabian Y-STR+Y-SNP data. Prediction indicated predominance (71%) of haplogroup J1, which was significantly more common in Central, Northern and Southern groups than in East and West, and formed a star-like expansion cluster in a median-joining network with an estimated age of ∼2800 years. Most of our 597 participants were sampled within Saudi Arabia itself, but ∼16% were sampled in the UK. Despite matching these two groups by home sub-region, we observed significant differences in haplotype and predicted haplogroup constitutions overall, and for most sub-regions individually. This suggests social structure influencing the probability of leaving Saudi Arabia, correlated with different Y-chromosome compositions. 
The UK-recruited sample is an inappropriate proxy for Saudi Arabia generally, and caution is needed when considering expatriate groups as representative of country of origin. Our study shows the importance of geographical and social structuring that may affect the utility of forensic databases and the interpretation of Y-STR profiles.
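The discrimination capacity figures quoted in this abstract (95.3% for 27 Y-STRs, 74.7% for the reduced 17-locus set) can be illustrated with a minimal sketch. It assumes the common forensic definition, the number of distinct haplotypes divided by the number of individuals sampled; the abstract itself does not state the formula. The function names and the toy three-locus haplotypes are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple

Haplotype = Tuple[str, ...]  # one allele call per Y-STR locus, in a fixed locus order


def discrimination_capacity(haplotypes: Sequence[Haplotype]) -> float:
    """Distinct haplotypes divided by sampled individuals (assumed definition)."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    return len(set(haplotypes)) / len(haplotypes)


def restrict_loci(haplotypes: Sequence[Haplotype], keep: Sequence[int]) -> List[Haplotype]:
    """Project each haplotype onto a subset of locus indices (a smaller marker set)."""
    return [tuple(h[i] for i in keep) for h in haplotypes]


# Hypothetical toy data: four males typed at three loci.
sample = [
    ("14", "13", "30"),
    ("14", "13", "31"),
    ("15", "13", "30"),
    ("14", "13", "30"),
]

full_dc = discrimination_capacity(sample)                                  # 3 distinct / 4
reduced_dc = discrimination_capacity(restrict_loci(sample, keep=[0, 1]))   # 2 distinct / 4
```

Dropping the third locus merges two haplotypes that were previously distinguishable, which is the same effect, in miniature, as moving from the expanded marker set to the reduced one.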
Mitigating the effects of reference sequence bias in single-multiplex massively parallel sequencing of the mitochondrial DNA control region
Sequence analysis of the mitochondrial DNA (mtDNA) control region can provide forensically useful information, particularly in challenging samples where autosomal DNA profiling fails. Sub-division of the 1122-bp region into shorter PCR fragments improves data recovery, and such fragments can be analysed together via massively parallel sequencing (MPS). Here, we generate mtDNA data using the prototype PowerSeq™ Auto/Mito/Y System (Promega) MPS assay, in which a single PCR reaction amplifies ten overlapping amplicons of the control region, in a set of 101 highly diverse samples representing most major clades of the mtDNA phylogeny. The overlapping multiplex design leads to non-uniform coverage in the regions of overlap, where it is further increased by short amplicons generated alongside the intended products. Primer sequences in targeted amplification libraries are a potential source of reference sequence bias and thus should be removed, but the proprietary nature of the primers in commercial kits necessitates an alternative approach that minimises data loss: here, we introduce the bioinformatic selection of sequencing reads spanning putative primer sites (Overarching Read Enrichment Option, OREO). While OREO performs well in mitigating the effects of primer sequences at the ends of sequence reads, we still find evidence of the internalisation of primer-derived sequences by overlap extension, which may compromise the ability to call variants or to measure heteroplasmy in primer-binding regions. The commercially available PowerSeq™ CRM Nested System design prevents primer internalisation, as shown in a reanalysis of a subset of 57 samples that contain possible heteroplasmies. In combination with OREO, the CRM Nested kit mitigates reference sequence bias, allowing heteroplasmic variants to be estimated down to a 5% threshold. 
Provided appropriate steps are taken in data processing, single-reaction multiplex assays represent robust tools to analyse mtDNA control region variation. The OREO approach will allow users to bypass the effects of unknown primer sequences in any single-reaction tiled multiplex and eliminate primer-derived bias in overlapping amplicon sequencing studies, in both forensic and non-forensic settings.
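The abstract describes OREO only as the selection of sequencing reads that span putative primer sites. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation: reads and primer sites are reduced to half-open coordinate intervals on the reference, and a read is kept for a site only if it covers the whole site plus a margin on both sides, so that the site lies in the interior of the read and not at its end. The `Interval` type, the `margin` parameter and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Interval:
    start: int  # 0-based, inclusive
    end: int    # exclusive


def spans(read: Interval, site: Interval, margin: int = 0) -> bool:
    """True if the read covers the whole site plus `margin` bases on each side."""
    return read.start <= site.start - margin and read.end >= site.end + margin


def select_spanning_reads(reads: Iterable[Interval], site: Interval, margin: int = 5) -> List[Interval]:
    """Keep only reads in which the putative primer site is internal to the read."""
    return [r for r in reads if spans(r, site, margin)]


# Hypothetical putative primer site and aligned reads.
site = Interval(100, 120)
reads = [
    Interval(40, 118),   # ends inside the site: may carry primer sequence at its end
    Interval(100, 250),  # starts at the site: may begin with primer sequence
    Interval(60, 200),   # spans the site with room to spare
    Interval(95, 125),   # spans the site with exactly the required margin
]

kept = select_spanning_reads(reads, site, margin=5)
```

In this toy case only the last two reads are retained. As the abstract notes, a filter of this kind addresses primer sequence at read ends but cannot by itself detect primer-derived sequence internalised by overlap extension.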