11,190 research outputs found

    Identification of candidate regulatory sequences in mammalian 3' UTRs by statistical analysis of oligonucleotide distributions

    Get PDF
    3' untranslated regions (3' UTRs) contain binding sites for many regulatory elements, and in particular for microRNAs (miRNAs). The importance of miRNA-mediated post-transcriptional regulation has become increasingly clear in the last few years. We propose two complementary approaches to the statistical analysis of oligonucleotide frequencies in mammalian 3' UTRs aimed at the identification of candidate binding sites for regulatory elements. The first method is based on the identification of sets of genes characterized by evolutionarily conserved overrepresentation of an oligonucleotide. The second method is based on the identification of oligonucleotides showing statistically significant strand asymmetry in their distribution in 3' UTRs. Both methods are able to identify many previously known binding sites located in 3'UTRs, and in particular seed regions of known miRNAs. Many new candidates are proposed for experimental verification.Comment: Added two reference

    Multigenome DNA sequence conservation identifies Hox cis-regulatory elements

    Get PDF
    To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced ∼0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide

    Systematic discovery of structural elements governing stability of mammalian messenger RNAs.

    Get PDF
    Decoding post-transcriptional regulatory programs in RNA is a critical step towards the larger goal of developing predictive dynamical models of cellular behaviour. Despite recent efforts, the vast landscape of RNA regulatory elements remains largely uncharacterized. A long-standing obstacle is the contribution of local RNA secondary structure to the definition of interaction partners in a variety of regulatory contexts, including--but not limited to--transcript stability, alternative splicing and localization. There are many documented instances where the presence of a structural regulatory element dictates alternative splicing patterns (for example, human cardiac troponin T) or affects other aspects of RNA biology. Thus, a full characterization of post-transcriptional regulatory programs requires capturing information provided by both local secondary structures and the underlying sequence. Here we present a computational framework based on context-free grammars and mutual information that systematically explores the immense space of small structural elements and reveals motifs that are significantly informative of genome-wide measurements of RNA behaviour. By applying this framework to genome-wide human mRNA stability data, we reveal eight highly significant elements with substantial structural information, for the strongest of which we show a major role in global mRNA regulation. Through biochemistry, mass spectrometry and in vivo binding studies, we identified human HNRPA2B1 (heterogeneous nuclear ribonucleoprotein A2/B1, also known as HNRNPA2B1) as the key regulator that binds this element and stabilizes a large number of its target genes. We created a global post-transcriptional regulatory map based on the identity of the discovered linear and structural cis-regulatory elements, their regulatory interactions and their target pathways. This approach could also be used to reveal the structural elements that modulate other aspects of RNA behaviour

    Ancient Pbx-Hox signatures define hundreds of vertebrate developmental enhancers

    Get PDF
    Background: Gene regulation through cis-regulatory elements plays a crucial role in development and disease. A major aim of the post-genomic era is to be able to read the function of cis-regulatory elements through scrutiny of their DNA sequence. Whilst comparative genomics approaches have identified thousands of putative regulatory elements, our knowledge of their mechanism of action is poor and very little progress has been made in systematically de-coding them. Results: Here, we identify ancient functional signatures within vertebrate conserved non-coding elements (CNEs) through a combination of phylogenetic footprinting and functional assay, using genomic sequence from the sea lamprey as a reference. We uncover a striking enrichment within vertebrate CNEs for conserved binding-site motifs of the Pbx-Hox hetero-dimer. We further show that these predict reporter gene expression in a segment specific manner in the hindbrain and pharyngeal arches during zebrafish development. Conclusions: These findings evoke an evolutionary scenario in which many CNEs evolved early in the vertebrate lineage to co-ordinate Hox-dependent gene-regulatory interactions that pattern the vertebrate head. In a broader context, our evolutionary analyses reveal that CNEs are composed of tightly linked transcription-factor binding-sites (TFBSs), which can be systematically identified through phylogenetic footprinting approaches. By placing a large number of ancient vertebrate CNEs into a developmental context, our findings promise to have a significant impact on efforts toward de-coding gene-regulatory elements that underlie vertebrate development, and will facilitate building general models of regulatory element evolution

    Genome-Wide Computational Prediction and Analysis of Core Promoter Elements across Plant Monocots and Dicots

    Get PDF
    Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from that of the non-regulatory genome sequence. It also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription

    Isolation of three novel rat and mouse papillomaviruses and their genomic characterization.

    Get PDF
    Despite a growing knowledge about the biological diversity of papillomaviruses (PV), only little is known about non-human PV in general and about PV mice models in particular. We cloned and sequenced the complete genomes of two novel PV types from the Norway rat (Rattus norvegicus; RnPV2) and the wood mouse (Apodemus sylvaticus; AsPV1) as well as a novel variant of the recently described MmuPV1 (originally designated as MusPV) from a house mouse (Mus musculus; MmuPV1 variant). In addition, we conducted phylogenetic analyses using a systematically representative set of 79 PV types, including the novel sequences. As inferred from concatenated amino acid sequences of six proteins, MmuPV1 variant and AsPV1 nested within the Beta+Xi-PV super taxon as members of the Pi-PV. RnPV2 is a member of the Iota-PV that has a distant phylogenetic position from Pi-PV. The phylogenetic results support a complex scenario of PV diversification driven by different evolutionary forces including co-divergence with hosts and adaptive radiations to new environments. PV types particularly isolated from mice and rats are the basis for new animal models, which are valuable to study PV induced tumors and new treatment options

    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Get PDF
    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and most intergenic miRNAs transcribed from their own RNA polymerase II (Pol II) promoter. However, the length of the primary transcripts and promoter organization is currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al

    Identification of cis-regulatory sequence variations in individual genome sequences

    Get PDF
    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in a HNF4A transcription factor binding site upstream of the Factor IX gene contributes causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing

    Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining

    Get PDF
    Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cisregulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature. Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.NIAAA Alcohol Training GrantNational Science FoundationCellular and Molecular Biolog
    corecore