Lessons from the Whole Exome Sequencing Effort in Populations of Russia and Tajikistan
© 2016, Springer Science+Business Media New York. In contrast with the traditional methods applied to the assessment of population diversity, high-throughput sequencing technologies have wider application in clinical practice, with greater potential to find novel disease-causing variants for multifactorial disorders. Widely used test panels may fail to diagnose a patient's condition with full reliability, since this method often does not take into account the population frequencies of the analyzed genetic markers. Here, we analyzed 57 male individuals from five ethnic groups in Russia and Tajikistan using whole exome sequencing (Ion AmpliSeq Exome), which detected more than 299,000 single nucleotide polymorphisms. Samples formed clusters on the PCA plot according to the geographical location of the corresponding populations. Thus, whole-exome sequencing in general, and the Ion AmpliSeq Exome panel in particular, could be successfully applied for the purposes of population genetics and for the detection of novel clinically relevant variants.
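The clustering on the PCA plot can be illustrated with a minimal sketch. It assumes genotypes coded as alternate-allele dosages (0, 1, 2) in a samples-by-SNPs matrix and extracts only the first principal component by power iteration on the sample-by-sample Gram matrix of the mean-centered genotypes. The function names and the toy matrix are hypothetical, and the abstract does not say which PCA software the authors used.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each SNP's mean dosage so every column has mean zero."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200, seed=0):
    """Unit-length scores of each sample on the first principal component.

    Power iteration on the Gram matrix X X^T (samples x samples) of the
    centered genotypes converges to its leading eigenvector, which is
    proportional to the PC1 scores.
    """
    x = center_columns(genotypes)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variation in the data: every score is zero
            return [0.0] * n
        v = [c / norm for c in w]
    return v
```

Only the grouping is meaningful: the sign of an eigenvector is arbitrary. A real exome analysis would also prune linked SNPs and handle missing calls, which this sketch omits.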
From cheek swabs to consensus sequences : an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes
Background: Next-generation DNA sequencing (NGS) technologies have had a major impact on many fields of biological research, especially evolutionary biology. One area where NGS has shown potential is high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, significant obstacles remain to the successful implementation of NGS-based projects, especially for new users.
Results: Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes, from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small organellar genomes from other species, as well as nuclear loci. The protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing on the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots, and SNP calling).
Conclusions: All steps in this protocol are designed to be straightforward to implement, especially for researchers undertaking next-generation sequencing for the first time. The molecular steps scale to large numbers (hundreds) of individuals, and all steps after DNA extraction can be carried out in 96-well plate format. The protocol has also been assembled so that individual ‘modules’ can be swapped out to suit available resources.
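The primer-removal step of the pipeline can be sketched as follows. The sketch assumes amplicon reads that begin with the forward primer and end with the reverse complement of the reverse primer, and it tolerates a small number of substitutions but no insertions or deletions. The function names and the mismatch threshold are hypothetical; the protocol's own tools and parameters are not described here, and 454 reads also carry homopolymer indel errors that this simple position-by-position comparison ignores.

```python
# Hypothetical helper names; the mismatch tolerance is an illustrative assumption.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def trim_primers(read, forward, reverse, max_mismatches=1):
    """Strip the forward primer from the 5' end and the reverse complement
    of the reverse primer from the 3' end, tolerating substitutions only."""
    if forward and len(read) >= len(forward):
        if count_mismatches(read[:len(forward)], forward) <= max_mismatches:
            read = read[len(forward):]
    tail = reverse_complement(reverse)
    if tail and len(read) >= len(tail):
        if count_mismatches(read[-len(tail):], tail) <= max_mismatches:
            read = read[:-len(tail)]
    return read
```

Reads in which neither primer matches within the threshold are returned unchanged, so a downstream filter could flag them rather than silently discarding them.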
The Genographic Project Public Participation Mitochondrial DNA Database
The Genographic Project is studying the genetic signatures of ancient human migrations and creating an open-source research database. It allows members of the public to participate in a real-time anthropological genetics study by submitting personal samples for analysis and donating the genetic results to the database. We report our experience from the first 18 months of public participation in the Genographic Project, during which we have created the largest standardized human mitochondrial DNA (mtDNA) database ever collected, comprising 78,590 genotypes. Here, we detail our genotyping and quality assurance protocols, including direct sequencing of the mtDNA HVS-I, genotyping of 22 coding-region SNPs, and a series of computational quality checks based on phylogenetic principles. This database is very informative with respect to mtDNA phylogeny and mutational dynamics, and its size allows us to develop a nearest neighbor–based methodology for mtDNA haplogroup prediction based on HVS-I motifs that is superior to classic rule-based approaches. We make available to the scientific community and general public two new resources: a periodically updated database comprising all data donated by participants, and the nearest neighbor haplogroup prediction tool.
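The nearest-neighbor idea can be sketched as a k-nearest-neighbor vote. The sketch assumes each HVS-I profile is a set of mutations relative to a reference sequence and that the distance between two profiles is the number of mutations not shared between them. This distance, the value of k, and the toy labels are hypothetical illustrations, not the Genographic Project's actual tool or database.

```python
from collections import Counter


def motif_distance(a, b):
    """Number of HVS-I mutations present in one profile but not the other."""
    return len(a ^ b)


def predict_haplogroup(query, reference, k=3):
    """Majority vote over the k reference profiles closest to the query.

    `reference` is a list of (mutation_set, haplogroup_label) pairs.
    sorted() is stable, so distance ties keep their reference order.
    """
    ranked = sorted(reference, key=lambda item: motif_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A larger reference set lets the vote draw on many closely related profiles, which is one reason database size matters for this kind of predictor.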
Reconstructing the genetic structure of the Kazakh from clan distribution data
Applying quasigenetic markers (non-biological traits that are nevertheless inherited across generations) is one of the research fields within human population genetics. For West European, East European, and Caucasus populations, surnames are typical quasigenetic markers. For Central Asian populations, particularly the Kazakh, clan affiliation serves as a good marker: a set of papers demonstrated that many clans consist mainly of individuals biologically descended from a recent common ancestor. In this study, we analyzed a large dataset (~4.2 million persons) of quasigenetic markers, namely the geographic distribution of 50 Kazakh clans at the beginning of the 20th century, and compared it with direct data on Y-chromosomal diversity in modern Kazakh populations. The analysis included three steps: the isonymy method, which is standard for quasigenetic markers; comparison of quasigenetic marker frequencies; and comparison of the quasigenetic and genetic datasets. We constructed 50 frequency-distribution maps, one per clan, and found that these maps correlate with the maps of genetic distances. The Mantel test also demonstrated a significant correlation between geographic and quasigenetic distances (r = 0.60; p < 0.05). The analysis of inter-population variability revealed the largest differences between geographic territories corresponding to the social-territorial groups of the Kazakh Khanate (zhuzes), rather than to other historical groups that existed on the territory of Kazakhstan in earlier or modern periods. The principal component and multidimensional scaling plots point the same way, grouping geographic populations into three clusters corresponding to the three zhuzes. This indicates that the final structuring of the Kazakh gene pool might have occurred during the Kazakh Khanate period.
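Two of the analysis steps can be sketched under stated assumptions. Clan frequencies are treated like surname frequencies: random isonymy between two populations is taken as the sum, over clans, of the product of the clan's frequencies in each population, and is converted to Lasker's distance, -ln(I), which is one common isonymy-based distance (the abstract does not say which distance the authors used). The Mantel test is shown as a standard permutation test that correlates the upper triangles of two distance matrices and permutes the rows and columns of one matrix together. Function names and inputs are hypothetical.

```python
import math
import random


def isonymy(p, q):
    """Random isonymy: probability that two people, one drawn from each
    population, share a clan (sum over clans of p_k * q_k)."""
    return sum(freq * q.get(clan, 0.0) for clan, freq in p.items())


def lasker_distance(p, q):
    """Lasker's distance, -ln(I); infinite when no clan is shared."""
    i = isonymy(p, q)
    return math.inf if i == 0 else -math.log(i)


def _upper_triangle(matrix, order):
    """Entries above the diagonal after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    n = len(d1)
    identity = list(range(n))
    x = _upper_triangle(d1, identity)
    r_obs = _pearson(x, _upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute populations, i.e. rows and columns together
        if _pearson(x, _upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole populations rather than individual matrix cells preserves the dependence among distances that share a population, which is why the Mantel test is preferred over an ordinary correlation test here.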
Y-Chromosomal Diversity in Lebanon Is Structured by Recent Historical Events
Lebanon is an eastern Mediterranean country inhabited by approximately four million people with a wide variety of ethnicities and religions, including Muslim, Christian, and Druze. In the present study, 926 Lebanese men were typed with Y-chromosomal SNP and STR markers, and, unusually, male genetic variation within Lebanon was found to be more strongly structured by religious affiliation than by geography. We therefore tested the hypothesis that migrations within historical times could have contributed to this situation. Y-haplogroup J*(xJ2) was more frequent in the putative Muslim source region (the Arabian Peninsula) than in Lebanon, and it was also more frequent in Lebanese Muslims than in Lebanese non-Muslims. Conversely, haplogroup R1b was more frequent in the putative Christian source region (western Europe) than in Lebanon and was also more frequent in Lebanese Christians than in Lebanese non-Christians. The most common R1b STR haplotype in Lebanese Christians was otherwise highly specific to western Europe and was unlikely to have reached its current frequency in Lebanese Christians without admixture. We therefore suggest that the Islamic expansion from the Arabian Peninsula beginning in the seventh century CE introduced lineages typical of this area into those who subsequently became Lebanese Muslims, whereas the Crusader activity in the 11th–13th centuries CE introduced western European lineages into Lebanese Christians.
The genetic history of admixture across inner Eurasia
This is the author accepted manuscript. The final version is available from Nature Research via the DOI in this record.
Data Availability. Genome-wide sequence data of two Botai individuals (BAM format) are available at the European Nucleotide Archive under the accession number PRJEB31152 (ERP113669). Eigenstrat format array genotype data of 763 present-day individuals and 1240K pulldown genotype data of two ancient Botai individuals are available at the Edmond data repository of the Max Planck Society (https://edmond.mpdl.mpg.de/imeji/collection/Aoh9c69DscnxSNjm?q=).
The indigenous populations of inner Eurasia, a huge geographic region covering the central Eurasian steppe and the northern Eurasian taiga and tundra, harbor tremendous diversity in their genes, cultures and languages. In this study, we report novel genome-wide data for 763 individuals from Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine, and Uzbekistan. We also report additional damage-reduced genome-wide data for two previously published individuals from the Eneolithic Botai culture in Kazakhstan (~5,400 BP). We find that present-day inner Eurasian populations are structured into three distinct admixture clines stretching between various western and eastern Eurasian ancestries, mirroring geography. The Botai and more recent ancient genomes from Siberia show a decrease in contribution from so-called “ancient North Eurasian” ancestry over time, detectable only in the northernmost “forest-tundra” cline. The intermediate “steppe-forest” cline descends from the Late Bronze Age steppe ancestries, while the “southern steppe” cline farther to the south shows a strong West/South Asian influence. Ancient genomes suggest a northward spread of the southern steppe cline in Central Asia during the first millennium BC. Finally, the genetic structure of Caucasus populations highlights the role of the Caucasus Mountains as a barrier to gene flow and suggests post-Neolithic gene flow into North Caucasus populations from the steppe.
Funders: Max Planck Society; European Research Council (ERC); Russian Foundation for Basic Research (RFBR); Russian Scientific Fund; National Science Foundation; U.S. National Institutes of Health; Allen Discovery Center; University of Ostrava; Czech Ministry of Education; Xiamen University; Fundamental Research Funds for the Central Universities; MES R
Neolithic Mitochondrial Haplogroup H Genomes and the Genetic Origins of Europeans
Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.
The GenoChip: A New Tool for Genetic Anthropology
The Genographic Project is an international effort aimed at charting human migratory history. The project is nonprofit and nonmedical and, through its Legacy Fund, supports locally led efforts to preserve indigenous and traditional cultures. Although the first phase of the project focused on uniparentally inherited markers on the Y-chromosome and mitochondrial DNA (mtDNA), the current phase focuses on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genome-wide single-nucleotide polymorphism (SNP) genotyping, they were designed for medical genetic studies and contain medically related markers that are inappropriate for global population genetic studies. GenoChip, the Genographic Project’s new genotyping array, was designed to resolve these issues and enable higher-resolution research into outstanding questions in genetic anthropology. The GenoChip includes ancestry-informative markers obtained for over 450 human populations, an ancient human (Saqqaq), and two archaic hominins (Neanderthal and Denisovan), and was designed to identify all known Y-chromosome and mtDNA haplogroups. The chip was carefully vetted to avoid inclusion of medically relevant markers. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays. Although all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The chip’s performance is illustrated in a principal component analysis of 14 worldwide populations. In summary, the GenoChip is a dedicated genotyping platform for genetic anthropology. With an unprecedented number of approximately 12,000 Y-chromosomal and approximately 3,300 mtDNA SNPs, and over 130,000 autosomal and X-chromosomal SNPs without any known health, medical, or phenotypic relevance, the GenoChip is a useful tool for genetic anthropology and population genetics.
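The FST comparison can be illustrated with Wright's per-SNP definition, (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity at the pooled allele frequency. This minimal sketch assumes biallelic SNPs and equally weighted subpopulations; the abstract does not state which FST estimator was used, and sample-size-corrected estimators such as Weir and Cockerham's would give somewhat different values. Function names are hypothetical.

```python
def wright_fst(freqs):
    """Wright's F_ST for one biallelic SNP, given the allele frequency in each
    (equally weighted) subpopulation: (H_T - H_S) / H_T."""
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-subpopulation heterozygosity
    p_bar = sum(freqs) / k                          # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                   # total expected heterozygosity
    if h_t == 0:
        return 0.0  # monomorphic across all subpopulations: set to 0 by convention here
    return (h_t - h_s) / h_t


def mean_fst(snp_freqs):
    """Simple mean of per-SNP F_ST values over a panel of SNPs."""
    values = [wright_fst(freqs) for freqs in snp_freqs]
    return sum(values) / len(values)
```

Plotting a histogram of `wright_fst` values across a panel would show the distribution shape the abstract describes; a panel enriched for ancestry-informative markers shifts mass toward higher values and raises the mean.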
Geographic population structure analysis of worldwide human populations infers their biogeographical origins
The search for a method that utilizes biological information to predict humans’ place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal, but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000–130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS’s accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village of origin underscore the promise of admixture-based methods for biogeography and have ramifications for genetic ancestry testing.
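Accuracy statements such as “within 50 km of their villages” require a great-circle distance between predicted and true coordinates. A minimal sketch using the haversine formula with a mean Earth radius of 6,371 km follows. It measures prediction error only and is not the GPS algorithm itself; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp guards rounding


def fraction_within(predicted, actual, threshold_km):
    """Share of individuals whose predicted location lies within threshold_km
    of their true location; both arguments are lists of (lat, lon) pairs."""
    hits = sum(1 for p, a in zip(predicted, actual) if haversine_km(*p, *a) <= threshold_km)
    return hits / len(actual)
```

For example, `fraction_within(pred, true, 50)` would give the proportion of individuals placed within 50 km of their reported location.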
Population distribution and ancestry of the cancer protective MDM2 SNP285 (rs117039649)
The MDM2 promoter SNP285C is located on the SNP309G allele. While SNP309G enhances Sp1 transcription factor binding and MDM2 transcription, SNP285C antagonizes Sp1 binding and reduces the risk of breast, ovarian, and endometrial cancer. Assessing SNP285 and SNP309 genotypes across 25 different ethnic populations (>10,000 individuals), we found that the incidence of SNP285C was 6–8% across European populations, except for Finns (1.2%) and Saami (0.3%). The incidence decreased towards the Middle East and Eastern Russia, and SNP285C was absent among Han Chinese, Mongolians, and African Americans. Interhaplotype variation analyses estimated SNP285C to have originated about 14,700 years ago (95% CI: 8,300–33,300). Both this estimate and the geographical distribution suggest that SNP285C arose after the separation between Caucasians and modern-day East Asians (17,000–40,000 years ago). We observed a strong inverse correlation (r = -0.805; p < 0.001) across populations between the percentage of SNP309G alleles harboring SNP285C and the minor allele frequency (MAF) of SNP309G itself, suggesting selection and environmental adaptation with respect to MDM2 expression in recent human evolution. In conclusion, we found SNP285C to be a pan-Caucasian variant. Ethnic variation in the distribution of SNP285C needs to be taken into account when assessing the impact of MDM2 SNPs on cancer risk.
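Because SNP285C sits on the SNP309G allele, the share of SNP309G alleles carrying SNP285C in a population equals the SNP285C allele frequency divided by the SNP309G allele frequency. The sketch below computes that share and a Pearson correlation coefficient across populations. It assumes allele (not carrier) frequencies as input, the function names are hypothetical, and it omits the significance test reported in the abstract.

```python
import math


def share_on_309g(f_285c, f_309g):
    """Fraction of SNP309G alleles that also carry SNP285C.

    Valid only because SNP285C occurs on the SNP309G background, so every
    SNP285C allele is counted within the SNP309G alleles.
    """
    if f_309g <= 0:
        raise ValueError("SNP309G frequency must be positive")
    return f_285c / f_309g


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Given per-population lists of SNP285C and SNP309G frequencies, `pearson_r([share_on_309g(a, b) for a, b in zip(f285, f309)], f309)` would give the correlation of the kind the abstract reports, with `f285` and `f309` standing in for hypothetical input lists.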
- …