
    Prevalent and rare variants show distinctive conservation preference.

    <p>(<b>A</b>) Distribution of A/T and G/C nucleotides in UCEs and at SNV positions. (<b>B–D</b>) Cumulative distribution plots of phyloP scores of SNVs with different MAFs. Data from three sources are shown: (<b>B</b>) SG-CHN, (<b>C</b>) ITA and (<b>D</b>) 1 KG. The shaded grey area represents the 95% confidence interval (obtained by bootstrapping) of random, G/C-content-corrected UCE positions (blue line). Numbers in parentheses indicate the number of analyzed positions or SNVs.</p>
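
A minimal sketch of how a shaded 95% band around a cumulative phyloP curve could be computed, assuming a percentile bootstrap over the phyloP scores of the random G/C-corrected UCE positions. The function names, `n_boot`, the seed and the percentile indexing are illustrative assumptions, not the authors' documented procedure.

```python
import bisect
import random

def ecdf_at(sorted_scores, x):
    """Empirical CDF: fraction of scores that are <= x."""
    return bisect.bisect_right(sorted_scores, x) / len(sorted_scores)

def bootstrap_ecdf_band(scores, grid, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap band for the ECDF of `scores` at each grid value.

    Returns one (lower, upper) tuple per grid value.
    """
    rng = random.Random(seed)
    n = len(scores)
    curves = [[] for _ in grid]  # curves[i] holds ECDF(grid[i]) for each replicate
    for _ in range(n_boot):
        resample = sorted(rng.choices(scores, k=n))  # draw n scores with replacement
        for i, x in enumerate(grid):
            curves[i].append(ecdf_at(resample, x))
    lo_idx = round((alpha / 2) * (n_boot - 1))
    hi_idx = round((1 - alpha / 2) * (n_boot - 1))
    band = []
    for values in curves:
        values.sort()
        band.append((values[lo_idx], values[hi_idx]))
    return band
```

For example, `bootstrap_ecdf_band(random_uce_phylop, grid=[-2, 0, 2, 4])` would return one (lower, upper) pair per phyloP value on the x-axis, where `random_uce_phylop` is a hypothetical list of scores.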

    General characterization of SNVs in the UCEs.

    <p>(<b>A</b>) Number of SNVs per megabase (Mb) of UCE sequence per sample. SNVs from three data sources were used: the Singaporean Chinese cohort (SG-CHN), the Italian cohort (ITA) and the 1000 Genomes Project (1 KG). SNVs are grouped by minor allele frequency (MAF). Numbers in parentheses give the sample size used in this study (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0110692#s2" target="_blank">Materials and Methods</a>). The random set consists of random genomic regions with the same total length as the UCE set. The y-axis shows SNVs per Mb divided by the number of samples in the analyzed population. (<b>B–D</b>) Shared and distinct SNVs among the SG-CHN, ITA and 1 KG populations. Venn diagrams of (<b>B</b>) all, (<b>C</b>) prevalent (MAF>0.5%) and (<b>D</b>) rare (MAF<0.5%) SNVs from the three analyzed populations. Numbers in parentheses indicate the number of analyzed SNVs in the corresponding population.</p>
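
The y-axis normalization in panel A and the Venn partition in panels B–D reduce to simple arithmetic and set operations. The sketch below assumes SNVs are keyed by hypothetical (chromosome, position, ref, alt) tuples; the function names and the toy SNVs are made up for illustration.

```python
def snvs_per_mb_per_sample(n_snvs, region_length_bp, n_samples):
    """SNVs per megabase of analyzed sequence, divided by the sample count."""
    snvs_per_mb = n_snvs / (region_length_bp / 1_000_000)
    return snvs_per_mb / n_samples

def venn_regions(a, b, c):
    """Split three SNV sets into the seven disjoint regions of a 3-way Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "ab_only": (a & b) - c,
        "ac_only": (a & c) - b,
        "bc_only": (b & c) - a,
        "abc": a & b & c,
    }

# Hypothetical SNV keys: (chromosome, position, ref, alt) tuples.
sg_chn = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")}
ita = {("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")}
kg = {("chr1", 250, "C", "T")}
regions = venn_regions(sg_chn, ita, kg)
print(len(regions["abc"]))  # 1
```

With these toy inputs, 50 SNVs in 500,000 bp of sequence from 10 samples gives `snvs_per_mb_per_sample(50, 500_000, 10) == 10.0`; the seven Venn regions are disjoint, so their sizes sum to the size of the union.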

    UCEs are enriched for TFBSs.

    <p>(<b>A</b>) Box plots represent results of one hundred sets (each set contains one thousand randomly chosen positions). The y-axis indicates the actual ENCODE TFBS overlap per one thousand tested positions. Boxes show the IQR, notches indicate 95% confidence intervals of the median, whiskers extend to 1.5 times the IQR and open circles show outliers. *** P<2.2×10<sup>−16</sup>, two-tailed Mann–Whitney test. (<b>B</b>) Prevalent SNV positions are depleted for TFBSs. All rare and prevalent SNV positions from the three populations were analyzed for overlap with ENCODE TFBSs. The random UCE set consists of randomly chosen UCE positions (G/C content matched), with the same number of positions as the rare and prevalent SNV sets. TFBS overlap of prevalent and rare SNVs is shown relative to random UCE positions. For the statistical analysis, each set was tested individually (Pearson's chi-squared test). * P<0.01.</p>
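
Panel B compares, for each SNV set, how many positions overlap a TFBS against a G/C-matched random UCE set. A minimal sketch of a Pearson chi-squared test on such a 2×2 table follows, assuming rows are (SNV set, random UCE set) and columns are (overlaps TFBS, does not). It applies no continuity correction; the caption does not say whether one was used (R's `chisq.test` applies Yates' correction to 2×2 tables by default). The function name and counts are hypothetical.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson's chi-squared test, no continuity correction, on [[a, b], [c, d]].

    Returns (statistic, p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row = [a + b, c + d]
    col = [a + c, b + d]
    expected = [row[0] * col[0] / n, row[0] * col[1] / n,
                row[1] * col[0] / n, row[1] * col[1] / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-squared with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical counts: 30 of 100 SNV positions vs 50 of 100 random UCE positions overlap a TFBS.
stat, p = pearson_chi2_2x2(30, 70, 50, 50)
relative_overlap = (30 / 100) / (50 / 100)  # plotted value "relative to random UCE positions"
```

Here the statistic is 25/3 (about 8.33) and p is roughly 0.004, so this made-up set would pass the * P<0.01 threshold; the relative overlap of 0.6 corresponds to a depletion.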

    Comparison of UCEs to the less constrained SE.

    <p>(<b>A</b>) SE are less constrained than UCEs. Cumulative distribution plots of phyloP scores of all SE positions (purple line), all UCE positions (red line), random genomic positions (orange line), and SE rare (MAF<0.5%, green line) and prevalent (MAF>5%, black line) SNVs. Prevalent and rare SNVs were extracted from the 1 KG project using global MAFs. The shaded grey area represents the 95% confidence interval (obtained by bootstrapping) of random UCE positions (blue line). The numbers of analyzed SNVs are given in parentheses. (<b>B</b>) SE overlap more TFBSs than UCEs. Box plots represent results of one hundred sets (each set contains one thousand randomly chosen positions). The y-axis indicates the actual ENCODE TFBS overlap per one thousand tested positions. Boxes show the IQR, notches indicate 95% confidence intervals of the median, whiskers extend to 1.5 times the IQR and open circles show outliers. *** P<2.2×10<sup>−16</sup>, two-tailed Mann–Whitney test. (<b>C</b>) Venn diagram showing the overlap between ENCODE TFs and the UCE-bound TFs described by Viturawong et al. <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0110692#pone.0110692-Viturawong1" target="_blank">[12]</a>. (<b>D</b>) Comparison of the protein domains of ENCODE TFs and UCE-bound TFs <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0110692#pone.0110692-Viturawong1" target="_blank">[12]</a> identifies the RNA recognition domain RRM1 (marked with a dashed circle) as the most prevalent domain among UCE-bound proteins. Protein domain (Pfam) annotations were performed using the Perseus module in the MaxQuant software suite.</p>
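
The box-plot comparisons (one hundred per-set TFBS overlap counts per group) are tested with a two-tailed Mann–Whitney test. A minimal stdlib sketch follows, assuming the large-sample normal approximation with average ranks for ties and a tie-corrected variance, and no continuity correction (R's `wilcox.test` applies one by default, so its p-values can differ slightly). The function name and inputs are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-tailed Mann–Whitney U test via the normal approximation.

    Ties get average ranks and the variance is tie-corrected; no continuity
    correction. Returns (U statistic for x, two-tailed p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return u1, p
```

In use, `x` and `y` would be hypothetical lists of TFBS overlaps per one thousand positions for the one hundred UCE sets and the one hundred SE (or random) sets.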