72 research outputs found

    PCAdmix: Principal Components-Based Assignment of Ancestry along Each Chromosome in Individuals with Admixed Ancestry from Two or More Populations

    Get PDF
    Identifying ancestry along each chromosome in admixed individuals provides a wealth of information for understanding the population genetic history of admixture events and is valuable for admixture mapping and identifying recent targets of selection. We present PCAdmix (available at https://sites.google.com/site/pcadmix/home), a Principal Componentsbased algorithm for determining ancestry along each chromosome from a high-density, genome-wide set of phased single-nucleotide polymorphism (SNP) genotypes of admixed individuals. We compare our method to HAPMIX on simulated data from two ancestral populations, and we find high concordance between the methods. Our method also has better accuracy than LAMP when applied to three-population admixture, a situation as yet unaddressed by HAPMIX. Finally, we apply our method to a data set of four Latino populations with European, African, and Native American ancestry. We find evidence of assortative mating in each of the four populations, and we identify regions of shared ancestry that may be recent targets of selection and could serve as candidate regions for admixture-based association mapping

    Nucleosome DNA sequence structure of isochores

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Significant differences in G+C content between different isochore types suggest that the nucleosome positioning patterns in DNA of the isochores should be different as well.</p> <p>Results</p> <p>Extraction of the patterns from the isochore DNA sequences by Shannon N-gram extension reveals that while the general motif YRRRRRYYYYYR is characteristic for all isochore types, the dominant positioning patterns of the isochores vary between TAAAAATTTTTA and CGGGGGCCCCCG due to the large differences in G+C composition. This is observed in human, mouse and chicken isochores, demonstrating that the variations of the positioning patterns are largely G+C dependent rather than species-specific. The species-specificity of nucleosome positioning patterns is revealed by dinucleotide periodicity analyses in isochore sequences. While human sequences are showing CG periodicity, chicken isochores display AG (CT) periodicity. Mouse isochores show very weak CG periodicity only.</p> <p>Conclusions</p> <p>Nucleosome positioning pattern as revealed by Shannon N-gram extension is strongly dependent on G+C content and different in different isochores. Species-specificity of the pattern is subtle. It is reflected in the choice of preferentially periodical dinucleotides.</p

    Reconstructing native American migrations from whole-genome and whole-exome data.

    Get PDF
    There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL Ancestors split 12.2kya, with a subsequent split of the ancestors to CLM and PUR 11.7kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations

    Reconstructing Native American Migrations from Whole-Genome and Whole-Exome Data

    Get PDF
    There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern America ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL Ancestors split 12.2kya, with a subsequent split of the ancestors to CLM and PUR 11.7kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.Facultad de Ciencias Naturales y MuseoInstituto Multidisciplinario de Biología Celula

    Genomic Ancestry of North Africans Supports Back-to-Africa Migrations

    Get PDF
    North African populations are distinct from sub-Saharan Africans based on cultural, linguistic, and phenotypic attributes; however, the time and the extent of genetic divergence between populations north and south of the Sahara remain poorly understood. Here, we interrogate the multilayered history of North Africa by characterizing the effect of hypothesized migrations from the Near East, Europe, and sub-Saharan Africa on current genetic diversity. We present dense, genome-wide SNP genotyping array data (730,000 sites) from seven North African populations, spanning from Egypt to Morocco, and one Spanish population. We identify a gradient of likely autochthonous Maghrebi ancestry that increases from east to west across northern Africa; this ancestry is likely derived from “back-to-Africa” gene flow more than 12,000 years ago (ya), prior to the Holocene. The indigenous North African ancestry is more frequent in populations with historical Berber ethnicity. In most North African populations we also see substantial shared ancestry with the Near East, and to a lesser extent sub-Saharan Africa and Europe. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the distribution of migrant tracts. In order to first identify migrant tracts, we assign local ancestry to haplotypes using a novel, principal component-based analysis of three ancestral populations. We estimate that a migration of western African origin into Morocco began about 40 generations ago (approximately 1,200 ya); a migration of individuals with Nilotic ancestry into Egypt occurred about 25 generations ago (approximately 750 ya). Our genomic data reveal an extraordinarily complex history of migrations, involving at least five ancestral populations, into North Africa

    Differential endothelial cell gene expression by African Americans versus Caucasian Americans: a possible contribution to health disparity in vascular disease and cancer

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Health disparities and the high prevalence of cardiovascular disease continue to be perplexing worldwide health challenges. This study addresses the possibility that genetic differences affecting the biology of the vascular endothelium could be a factor contributing to the increased burden of cardiovascular disease and cancer among African Americans (AA) compared to Caucasian Americans (CA).</p> <p>Methods</p> <p>From self-identified, healthy, 20 to 29-year-old AA (n = 21) and CA (n = 17), we established cultures of blood outgrowth endothelial cells (BOEC) and applied microarray profiling. BOEC have never been exposed to <it>in vivo </it>influences, and their gene expression reflects culture conditions (meticulously controlled) and donor genetics. Significance Analysis of Microarray identified differential expression of single genes. Gene Set Enrichment Analysis examined expression of pre-determined gene sets that survey nine biological systems relevant to endothelial biology.</p> <p>Results</p> <p>At the highly stringent threshold of False Discovery Rate (FDR) = 0, 31 single genes were differentially expressed in AA. <it>PSPH </it>exhibited the greatest fold-change (AA > CA), but this was entirely accounted for by a homolog (<it>PSPHL</it>) hidden within the <it>PSPH </it>probe set. Among other significantly different genes were: for AA > CA, <it>SOS1, AMFR, FGFR3; and for AA < CA, ARVCF, BIN3, EIF4B. </it>Many more (221 transcripts for 204 genes) were differentially expressed at the less stringent threshold of FDR <.05. Using the biological systems approach, we identified shear response biology as being significantly different for AA versus CA, showing an apparent tonic increase of expression (AA > CA) for 46/157 genes within that system.</p> <p>Conclusions</p> <p>Many of the genes implicated here have substantial roles in endothelial biology. Shear stress response, a critical regulator of endothelial function and vascular homeostasis, may be different between AA and CA. These results potentially have direct implications for the role of endothelial cells in vascular disease (hypertension, stroke) and cancer (via angiogenesis). Also, they are consistent with our over-arching hypothesis that genetic influences stemming from ancestral continent-of-origin could impact upon endothelial cell biology and thereby contribute to disparity of vascular-related disease burden among AA. The method used here could be productively employed to bridge the gap between information from structural genomics (for example, disease association) and cell function and pathophysiology.</p

    Reconstructing Native American Migrations from Whole-Genome and Whole-Exome Data

    Get PDF
    There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern America ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL Ancestors split 12.2kya, with a subsequent split of the ancestors to CLM and PUR 11.7kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.Facultad de Ciencias Naturales y MuseoInstituto Multidisciplinario de Biología Celula

    Dissecting the Within-Africa Ancestry of Populations of African Descent in the Americas

    Get PDF
    The ancestry of African-descended Americans is known to be drawn from three distinct populations: African, European, and Native American. While many studies consider this continental admixture, few account for the genetically distinct sources of ancestry within Africa--the continent with the highest genetic variation. Here, we dissect the within-Africa genetic ancestry of various populations of the Americas self-identified as having primarily African ancestry using uniparentally inherited mitochondrial DNA.We first confirmed that our results obtained using uniparentally-derived group admixture estimates are correlated with the average autosomal-derived individual admixture estimates (hence are relevant to genomic ancestry) by assessing continental admixture using both types of markers (mtDNA and Y-chromosome vs. ancestry informative markers). We then focused on the within-Africa maternal ancestry, mining our comprehensive database of published mtDNA variation (∼5800 individuals from 143 African populations) that helped us thoroughly dissect the African mtDNA pool. Using this well-defined African mtDNA variation, we quantified the relative contributions of maternal genetic ancestry from multiple W/WC/SW/SE (West to South East) African populations to the different pools of today's African-descended Americans of North and South America and the Caribbean.Our analysis revealed that both continental admixture and within-Africa admixture may be critical to achieving an adequate understanding of the ancestry of African-descended Americans. While continental ancestry reflects gender-specific admixture processes influenced by different socio-historical practices in the Americas, the within-Africa maternal ancestry reflects the diverse colonial histories of the slave trade. We have confirmed that there is a genetic thread connecting Africa and the Americas, where each colonial system supplied their colonies in the Americas with slaves from African colonies they controlled or that were available for them at the time. This historical connection is reflected in different relative contributions from populations of W/WC/SW/SE Africa to geographically distinct Africa-derived populations of the Americas, adding to the complexity of genomic ancestry in groups ostensibly united by the same demographic label

    Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

    Get PDF
    A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved
    corecore