101 research outputs found

    Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data

    Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionarily conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionarily conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and the types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses.

    Proportion of significantly different bins in natural selection analysis by region of identification: A) AFR continental group, B) EAS continental group, and C) EUR continental group.

    The abbreviations for each population are on the x- and y-axes. The numbers in each block and the color intensity [0,1] indicate the proportion of significant bins (after Bonferroni correction) for the 1000 Genomes populations on each axis, where the darker the color, the higher the proportion of significant bins. In general, the x-axis is organized with African descent populations on the far right and increasing differentiation with regard to low frequency burden towards the left (i.e. populations of Asian descent have the highest proportion of significant bins compared to African descent groups). Regions of natural selection, particularly negative selection, are often accompanied by excess low frequency variants. As world populations evolved, selective forces were often unique and location specific. Therefore, low frequency variants compared across world populations can be markers of past selective events. Populations within a continental group are very similar, and we see high proportions of statistically significant bins between populations of different continental groups.
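The proportion reported in each heatmap block can be illustrated with a minimal sketch. It assumes one p-value per bin for a single population pair and a family-wise alpha of 0.05; the function name, the alpha value, and the example p-values are illustrative assumptions, not details taken from the paper.

```python
def proportion_significant(p_values, alpha=0.05):
    """Proportion of bins whose p-value passes a Bonferroni-corrected threshold.

    alpha=0.05 is an assumed family-wise error rate, not a value from the source.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    # Bonferroni: divide alpha by the number of bins tested.
    threshold = alpha / len(p_values)
    significant = sum(1 for p in p_values if p < threshold)
    return significant / len(p_values)


# Hypothetical p-values for four bins in one population comparison.
example = [0.001, 0.02, 0.5, 0.0001]
print(proportion_significant(example))
```

With four bins the corrected threshold is 0.0125, so only bins with p-values below that count towards the proportion.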

    Proportion of significantly different bins in A) ORegAnno regulatory and B) pathway feature analysis.

    The abbreviations for each population are on the x- and y-axes. The numbers in each block and the color intensity [0,1] indicate the proportion of significant bins (after Bonferroni correction) for the 1000 Genomes populations on each axis, where the darker the color, the higher the proportion of significant bins. In general, the x-axis is organized with African descent populations on the far right and increasing differentiation with regard to low frequency burden towards the left (i.e. populations of Asian descent have the highest proportion of significant bins compared to African descent groups). From more conserved regulatory regions to relatively large binned pathways, Figure 5A shows conservation in comparison to genic regions (Figure 3), and Figure 5B shows occasionally very high proportions of significant bins in parent pathway bins in comparison to genic regions (Figure 3).

    Proportion of significantly different bins for gene-exon filters: A) nonsynonymous and B) predicted deleterious variants.

    The abbreviations for each population are on the x- and y-axes. The numbers in each block and the color intensity [0,1] indicate the proportion of significant bins (after Bonferroni correction) for the 1000 Genomes populations on each axis, where the darker the color, the higher the proportion of significant bins. In general, the x-axis is organized with African descent populations on the far right and increasing differentiation with regard to low frequency burden towards the left (i.e. populations of Asian descent have the highest proportion of significant bins compared to African descent groups). Filtering gene exon regions by mutation type and predicted functional significance leads to smaller bins and greatly reduced proportions of significant bins overall.

    Proportion of significantly different bins in A) gene exon, B) gene intron, and C) intergenic regions.

    The abbreviations for each population are on the x- and y-axes. The numbers in each block and the color intensity [0,1] indicate the proportion of significant bins (after Bonferroni correction) for the 1000 Genomes populations on each axis, where the darker the color, the higher the proportion of significant bins. In general, the x-axis is organized with African descent populations on the far right and increasing differentiation with regard to low frequency burden towards the left (i.e. populations of Asian descent have the highest proportion of significant bins compared to African descent groups). The proportion of significant bins across all population comparisons increases from coding (A) to noncoding (B) and finally intergenic (C) regions.

    Alternate binning strategies using biological knowledge and functional or role annotations.

    Three example binning strategies: gene burden analysis, pathway burden analysis, and functional pathway burden analysis using four genes, two pathways, and variant functional prediction information.
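The three strategies named in this caption can be sketched as follows. This is a toy illustration only: the variant IDs, gene names, pathway memberships, and the `deleterious` flag are all hypothetical, and the code does not reproduce the authors' collapsing tool.

```python
from collections import defaultdict

# Hypothetical variants: (variant id, gene, predicted deleterious?)
variants = [
    ("v1", "GENE_A", True),
    ("v2", "GENE_A", False),
    ("v3", "GENE_B", True),
    ("v4", "GENE_C", False),
    ("v5", "GENE_D", True),
]

# Hypothetical pathway membership: four genes spread over two pathways.
pathways = {
    "PATHWAY_1": {"GENE_A", "GENE_B"},
    "PATHWAY_2": {"GENE_B", "GENE_C", "GENE_D"},
}


def gene_bins(variants):
    """Gene burden: one bin per gene, holding every variant in that gene."""
    bins = defaultdict(list)
    for variant_id, gene, _ in variants:
        bins[gene].append(variant_id)
    return dict(bins)


def pathway_bins(variants, pathways, deleterious_only=False):
    """Pathway burden: one bin per pathway.

    With deleterious_only=True this becomes the functional pathway strategy,
    keeping only variants flagged as predicted deleterious.
    """
    bins = {name: [] for name in pathways}
    for variant_id, gene, deleterious in variants:
        if deleterious_only and not deleterious:
            continue
        for name, genes in pathways.items():
            if gene in genes:
                bins[name].append(variant_id)
    return bins


print(gene_bins(variants))
print(pathway_bins(variants, pathways))
print(pathway_bins(variants, pathways, deleterious_only=True))
```

A gene that belongs to several pathways (here `GENE_B`) contributes its variants to each of those pathway bins, which is one reason pathway bins tend to be larger than gene bins.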

    Binning analysis overview.

    Analyses performed for each population comparison, including features tested, contributing sources, and total number of bins generated for each binning analysis.

    Proportion of loci in top bins in high LD with other variants in the same bin in A) CEU-CHB, B) CHB-YRI, and C) CEU-YRI population gene feature comparisons.

    Each bar represents a gene or intergenic bin. For a particular population comparison (A–C), the total height of the bar corresponds to the number of loci in that bin. The shades of blue and purple correspond to loci with r² LD values greater than 0.3 for a specific population shown in the legend. The variant can be in LD in one population, the other population, or both (described in each legend). Almost all of the low frequency loci in LD had r² values of approximately 0.5 or 1, corresponding to almost perfect LD. The white space corresponds to loci in the bin with LD values less than 0.3. The top bins are therefore mostly composed of independent loci.
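The r² threshold in this caption can be made concrete with a small sketch. It assumes phased haplotypes coded as 0/1 alleles and uses the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The function names and the toy haplotypes are illustrative assumptions, and the paper's own LD computation may differ.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes (alleles coded 0/1)."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic locus has undefined r^2; this sketch treats it as 0.
        return 0.0
    d = p_ab - p_a * p_b
    return d * d / denom


def loci_in_ld(haplotypes, threshold=0.3):
    """Names of loci whose r^2 with any other locus in the bin exceeds threshold."""
    names = list(haplotypes)
    flagged = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if ld_r2(haplotypes[first], haplotypes[second]) > threshold:
                flagged.update((first, second))
    return flagged


# Hypothetical bin of three low frequency loci across eight haplotypes.
toy_bin = {
    "s1": [1, 0, 0, 0, 0, 0, 0, 0],
    "s2": [1, 0, 0, 0, 0, 0, 0, 0],
    "s3": [0, 0, 0, 1, 0, 0, 0, 0],
}
print(sorted(loci_in_ld(toy_bin)))
```

In the toy bin, `s1` and `s2` sit on the same haplotype and are flagged, while `s3` falls well below the 0.3 threshold and would count as an independent locus.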

    Phase I 1000 Genomes Project data characteristics.

    Fourteen populations released in the Phase I 1000 Genomes Project data release, including the continental group, population abbreviation (POP), short description of each population (POPULATION), number of individuals (N), number of cryptically related individuals dropped in final analyses (REL), total number of loci, variants, low frequency variants (MAF ≤ 0.03), and private variants. Only autosomal variants were considered. The total loci column refers to the number of variant lines in the VCF file, but not all of these lines contain binnable variants, due to filtering and missing data.

    Cytoscape Network Showing the Connections between Phenotypes, the Genes with SNPs, and Pathways.

    In this network, green squares represent phenotypes, red triangles represent genes, and blue circles are KEGG pathways. The colored lines highlight the link between phenotype and pathway. The gene HLA-DRA, with SNPs associated with “714: rheumatoid arthritis” and “250: type 1 diabetes”, is present in the KEGG pathways “rheumatoid arthritis” (red line) and “type 1 diabetes” (green line), respectively. Also, the blue edge shows the connection between “714: rheumatoid arthritis”, “716: other specified arthropathies” and the KEGG “JAK-STAT signaling pathway” through two interleukin genes, IL23R and IL6.