24 research outputs found

    PhyloScan: identification of transcription factor binding sites using cross-species evidence

    BACKGROUND: When transcription factor binding sites are known for a particular transcription factor, it is possible to construct a motif model that can be used to scan sequences for additional sites. However, few statistically significant sites are revealed when a transcription factor binding site motif model is used to scan a genome-scale database. METHODS: We have developed a scanning algorithm, PhyloScan, which combines evidence from matching sites found in orthologous data from several related species with evidence from multiple sites within an intergenic region, to better detect regulons. The orthologous sequence data may be multiply aligned, unaligned, or a combination of aligned and unaligned. In aligned data, PhyloScan statistically accounts for the phylogenetic dependence of the species contributing data to the alignment and, in unaligned data, the evidence for sites is combined assuming phylogenetic independence of the species. The statistical significance of the gene predictions is calculated directly, without employing training sets. RESULTS: In a test of our methodology on synthetic data modeled on seven Enterobacteriales, four Vibrionales, and three Pasteurellales species, PhyloScan produces better sensitivity and specificity than MONKEY, an advanced scanning approach that also searches a genome for transcription factor binding sites using phylogenetic information. The application of the algorithm to real sequence data from seven Enterobacteriales species identifies novel Crp and PurR transcription factor binding sites, thus providing several new potential sites for these transcription factors. These sites enable targeted experimental validation and thus further delineation of the Crp and PurR regulons in E. coli. CONCLUSION: Better sensitivity and specificity can be achieved through a combination of (1) using mixed alignable and non-alignable sequence data and (2) combining evidence from multiple sites within an intergenic region.
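    The baseline step the abstract starts from, building a motif model from known sites and scanning a sequence with it, can be sketched as follows. This is a minimal single-sequence log-odds scan, not PhyloScan itself: it does not combine evidence across species or across multiple sites, and it computes no statistical significance. The function names, the uniform background, the pseudocount, and the toy sites in the usage note are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"


def build_log_odds(sites, background=None, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length known sites.

    Assumption: a uniform background and a pseudocount of 1 per base;
    neither value comes from the abstract.
    """
    background = background or {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        # Count each base in this column, starting from the pseudocount.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((counts[base] / total) / background[base])
             for base in ALPHABET}
        )
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(matrix)
    return [
        (start, sum(matrix[j][sequence[start + j]] for j in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, matrix):
    """Return the (start, score) pair with the highest log-odds score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])
```

    For example, with the made-up sites `["TGTGA", "TGTGA", "TGTGC", "TGAGA"]`, calling `best_hit("AACCTGTGATTT", build_log_odds(sites))` picks the window that starts at index 4, where the consensus `TGTGA` occurs.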

    A structural interpretation of the effect of GC-content on efficiency of RNA interference

    BACKGROUND: RNA interference (RNAi) mediated by small interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs) has become a powerful technique for eukaryotic gene knockdown. siRNA GC-content negatively correlates with RNAi efficiency, and it is of interest to have a convincing mechanistic interpretation of this observation. We here examine this issue by considering the secondary structures for both the target messenger RNA (mRNA) and the siRNA guide strand. RESULTS: By analyzing a unique homogeneous data set of 101 shRNAs targeted to 100 endogenous human genes, we find that: 1) target site accessibility is more important than GC-content for efficient RNAi; 2) there is an appreciable negative correlation between GC-content and RNAi activity; 3) for the predicted structure of the siRNA guide strand, there is a lack of correlation between RNAi activity and either the stability or the number of free dangling nucleotides at an end of the structure; 4) there is a high correlation between target site accessibility and GC-content. For a set of representative structural RNAs, the GC content of 62.6% for paired bases is significantly higher than the GC content of 38.7% for unpaired bases. Thus, for a structured RNA, a region with higher GC content is likely to have more stable secondary structure. Furthermore, by partial correlation analysis, the correlation between GC-content and RNAi activity almost completely disappears when the effect of target accessibility is controlled. CONCLUSION: These findings provide a target-structure-based interpretation and mechanistic insight for the effect of GC-content on RNAi efficiency.
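    The partial correlation analysis mentioned in the abstract can be illustrated with the standard first-order formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The sketch below assumes Pearson correlations and a single controlled variable; the abstract does not say which estimator the authors used, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

    Here `x` would be GC-content, `y` RNAi activity, and `z` target site accessibility. When `x` and `y` are related only through `z`, `pearson(x, y)` can be far from zero while `partial_correlation(x, y, z)` is close to zero, which is the pattern the abstract reports.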

    mirWIP: microRNA target prediction based on microRNA-containing ribonucleoprotein-enriched transcripts

    Target prediction for animal microRNAs (miRNAs) has been hindered by the small number of verified targets available to evaluate the accuracy of predicted miRNA-target interactions. Recently, a dataset of 3,404 miRNA-associated mRNA transcripts was identified by immunoprecipitation of the RNA-induced silencing complex components AIN-1 and AIN-2. Our analysis of this AIN-IP dataset revealed enrichment for defining characteristics of functional miRNA-target interactions, including structural accessibility of target sequences, total free energy of miRNA-target hybridization and topology of base-pairing to the 5' seed region of the miRNA. We used these enriched characteristics as the basis for a quantitative miRNA target prediction method, miRNA targets by weighting immunoprecipitation-enriched parameters (mirWIP), which optimizes sensitivity to verified miRNA-target interactions and specificity to the AIN-IP dataset. MirWIP can be used to capture all known conserved miRNA-mRNA target relationships in Caenorhabditis elegans at a lower false-positive rate than can the current standard methods.

    Factors Influencing the Identification of Transcription Factor Binding Sites by Cross-Species Comparison

    As the number of sequenced genomes has grown, the questions of which species are most useful and how many genomes are sufficient for comparison have become increasingly important for comparative genomics studies. We have systematically addressed these questions with respect to phylogenetic footprinting of transcription factor (TF) binding sites in the γ-proteobacteria, and have evaluated the statistical significance of our motif predictions. We used a study set of 166 Escherichia coli genes that have experimentally identified TF binding sites upstream of the gene, with orthologous data from nine additional γ-proteobacteria for phylogenetic footprinting. Just three species were sufficient for ∼74.0% of the motif predictions to correspond to the experimentally reported E. coli sites, and important characteristics to consider when choosing species were phylogenetic distance, genome size, and natural habitat. We also performed simulations using randomized data to determine the critical maximum a posteriori probability (MAP) values for statistical significance of our motif predictions (P = 0.05). Approximately 60% of motif predictions containing sites from just three species had average MAP values above these critical MAP values. The inclusion of a species very closely related to E. coli increased the number of statistically significant motif predictions, despite substantially increasing the critical MAP value. [Supplemental material is available online at http://www.genome.org. In addition, our motif predictions for the study set and the entire E. coli genome are available online at http://www.wadsworth.org/resnres/bioinfo/.]