10 research outputs found

    Microsatellite Tandem Repeats Are Abundant in Human Promoters and Are Associated with Regulatory Elements

    <div><p>Tandem repeats are genomic elements that are prone to changes in repeat number and are thus often polymorphic. These sequences are found at a high density at the start of human genes, in the gene’s promoter. Increasing empirical evidence suggests that length variation in these tandem repeats can affect gene regulation. One class of tandem repeats, known as microsatellites, changes rapidly in repeat number. Some of the genetic variation induced by microsatellites is known to result in phenotypic variation. Recently, our group developed a novel method for measuring the evolutionary conservation of microsatellites, and with it we discovered that human microsatellites near transcription start sites are often highly conserved. In this study, we examined the properties of microsatellites found in promoters. We found a high density of microsatellites at the start of genes. Using a wavelet analysis, which allowed us to test for associations on multiple scales and to control for other promoter-related elements, we showed that microsatellites are statistically associated with promoters. Because promoter microsatellites tend to be G/C-rich, we hypothesized that G/C-rich regulatory elements may drive the association between microsatellites and promoters. Our results indicate that CpG islands, G-quadruplexes (G4) and untranslated regulatory regions have highly significant associations with microsatellites, but controlling for these elements in the analysis does not remove the association between microsatellites and promoters. Given the intrinsic lability of promoter microsatellites and their overlap with predicted functional elements, these results suggest that many of them have the potential to affect human phenotypes by generating mutations in regulatory elements, which may ultimately result in disease. We discuss the potential functions of human promoter microsatellites in this context.</p></div>

    Most significant motifs associated with distance to the TSS from the linear analysis.

    <p>The top 10 most significant motifs associated with distance to the TSS (in base pairs), for the upstream and downstream regions, analyzed separately. These factors are sorted by their false discovery rate q-value (Sorted q-values). The size of the regression coefficient (Reg. coef.) indicates the strength of the association, with large positive coefficients belonging to motifs frequently found near the TSS. The full list of significant factors can be found in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0054710#pone.0054710.s001" target="_blank">Tables S1</a> and <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0054710#pone.0054710.s002" target="_blank">S2</a>.</p>

    Strand-specific densities for the motifs A/T and AC/GT around promoters.

    <p>These figures show the cubic spline of the densities of each strand-specific motif for bins of size 1 kb (solid) and 100 base pairs (dashed) for the entire 5 kb promoter region.</p>

    Distribution of microsatellites around promoters.

    <p>The total number of microsatellites present in each 100 base-pair bin is provided for all microsatellites within 10 kb of the TSS. Also shown are the total numbers of only coding microsatellites (blue) or only 5′ UTR microsatellites (red).</p>
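    The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `bin_counts`, the signed-offset input, and the half-open bin convention are all assumptions made for the example.

```python
from collections import Counter

def bin_counts(offsets, bin_size=100, window=10_000):
    """Count features per bin.

    offsets: signed distances (bp) from the TSS, negative = upstream.
    Returns {bin_start: count}, where each bin is [bin_start, bin_start + bin_size).
    """
    counts = Counter()
    for off in offsets:
        # Keep only features within the window around the TSS (assumed half-open).
        if -window <= off < window:
            # Floor division keeps negative offsets in the correct bin,
            # e.g. -50 falls in the bin starting at -100.
            counts[(off // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))

# Hypothetical offsets, for illustration only.
example = bin_counts([-150, -50, 0, 99, 100, 20_000])
```

    Running the same function on subsets of offsets (for example, only coding or only 5′ UTR microsatellites) would give per-category series comparable to the coloured curves the caption mentions.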

    GO Results for genes with microsatellites that overlap with G4 elements.

    <p>Gene ontology (GO) results for genes that contain microsatellites that overlap with G4 elements in their promoter. Hyper FDR Q-value is the false discovery rate q-value; Hyper fold enrichment is the enrichment of the test set relative to the overall (control) set for each category. 2,666 genes contain a G4 that overlaps with a microsatellite. For a control set we used genes that contain G4 elements in their promoters, for a total of 14,977 genes. The promoter region here was again 5 kb upstream and downstream of the TSS.</p>
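    As a rough sketch of how a fold enrichment and an enrichment p-value of this kind can be computed, the example below assumes that "Hyper" refers to a hypergeometric test, which the caption does not state explicitly. The function names and the small numbers in the usage lines are hypothetical; only the set sizes 2,666 and 14,977 come from the caption.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """Fraction of annotated genes in the test set divided by the
    fraction of annotated genes in the control set.

    k: annotated genes in the test set     n: test set size
    K: annotated genes in the control set  N: control set size
    """
    return (k / n) / (K / N)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum exact integer counts first, then divide once to limit rounding error.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / total

# Toy numbers, for illustration only.
fe = fold_enrichment(3, 5, 4, 10)
p = hypergeom_upper_tail(3, 5, 4, 10)
```

    With the set sizes from the caption, a category would be evaluated as `fold_enrichment(k, 2666, K, 14977)`, where `k` and `K` are that category's counts in the test and control sets.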

    Linear model of wavelet results, displaying p-values.

    <p>The top figure shows the results of the smooth coefficients, the bottom shows the results of the detail coefficients. Positive relationships are shown in red, negative in blue. The value is shown at the bottom of the figure. The largest scales were not included in this figure for simplicity.</p>

    Most common motifs found within 5 kb of the TSS and their strand-specific motif results.

    <p>The most common motifs and their strand-specific counts are displayed. The binomial test (Binom.) p-value tests whether these strand-specific frequencies deviate from an expected value of 50%. The Kolmogorov-Smirnov (KS) test values provide a measurement of the difference between the distributions of the two strand-specific motifs, for each motif pair. The p-values shown are not corrected for multiple tests.</p>
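    The strand-bias test mentioned in this caption can be illustrated with an exact two-sided binomial test against an expected proportion of 50%. This is a sketch under that assumption only; the function name `binom_test_half` and the counts in the usage line are hypothetical, and the source does not say which implementation was used.

```python
from math import comb

def binom_test_half(k, n):
    """Exact two-sided binomial p-value for k successes in n trials
    under the null hypothesis p = 0.5.

    Because the null distribution is symmetric, the two-sided p-value is
    twice the probability of the smaller tail, capped at 1.
    """
    if n == 0:
        return 1.0
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1))
    return min(1.0, 2 * tail / 2 ** n)

# Hypothetical counts: 2 motifs on one strand, 8 on the other.
p_value = binom_test_half(2, 10)
```

    Since the caption notes that the reported p-values are not corrected for multiple tests, a correction such as a false discovery rate adjustment would need to be applied separately when many motifs are compared.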

    Kendall rank correlations between wavelet coefficients.

    <p>The pairwise correlations between smooth coefficients are in the top right, and detail coefficients are the bottom left. The diagonal displays the normalized power spectrum for the wavelet coefficients, which can be interpreted as a measure of the variation of each signal at each scale. Note that the majority of factors examined here have most of their variation at the finest scales, while GC content and G4 elements contain a large amount of variation at the largest scales. Abbreviations for each element are “msat” for microsatellite, “G4” for predicted G4 regions, “CpG” for CpG islands, and “GC” for G/C content. Associations with a p-value above 0.001 are shown in red if positive, blue if negative. The smallest scale examined was 1 kb in size, and each successive scale increases by a factor of two.</p>

    Frequencies of motifs for all simple microsatellites in the human genome.

    <p>The most common motifs in the human genome are shown, along with their counts and frequencies relative to all other microsatellites. A few motifs commonly found in promoters are also shown. The total number of microsatellites examined here is 538,964.</p>

    Motifs of microsatellites that overlap with G4.

    <p>Of the 13,838 microsatellites that overlap with a G4 element, the most common motifs are shown. For each microsatellite motif, the average base-pair overlap with G4 is shown (Avg. overlap (bp)). The average fraction of each microsatellite that overlaps with the G4 element is also shown (Avg. Overlap fraction). Note that motifs that are dissimilar to the canonical G4 definition, such as AC, usually share only a portion of the microsatellite in the G4 element.</p>
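    The two overlap measures in this caption can be sketched for a single microsatellite and G4 pair as below. The half-open `(start, end)` coordinate convention and the function name `overlap_stats` are assumptions for illustration, and the coordinates in the usage line are made up.

```python
def overlap_stats(msat, g4):
    """Return (overlap_bp, overlap_fraction) for two genomic intervals.

    msat, g4: (start, end) tuples in half-open coordinates on the same
    chromosome. The fraction is relative to the microsatellite length.
    """
    m_start, m_end = msat
    g_start, g_end = g4
    # Overlap is the distance between the later start and the earlier end,
    # floored at zero for disjoint intervals.
    overlap_bp = max(0, min(m_end, g_end) - max(m_start, g_start))
    return overlap_bp, overlap_bp / (m_end - m_start)

# Hypothetical coordinates: a 20 bp microsatellite half covered by a G4.
bp, frac = overlap_stats((100, 120), (110, 150))
```

    Averaging `bp` and `frac` over all microsatellites that share a motif would give per-motif summaries of the kind the table reports.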