13 research outputs found

    Extended Sunflower Hidden Markov Models for the recognition of homotypic cis-regulatory modules}

    Get PDF
    The transcription of genes is often regulated not only by transcription factors binding at single sites per promoter, but by the interplay of multiple copies of one or more transcription factors binding at multiple sites forming a cis-regulatory module. The computational recognition of cis-regulatory modules from ChIP-seq or other high-throughput data is crucial in modern life and medical sciences. A common type of cis-regulatory modules are homotypic clusters of binding sites, i.e., clusters of binding sites of one transcription factor. For their recognition the homotypic Sunflower Hidden Markov Model is a promising statistical model. However, this model neglects statistical dependences among nucleotides within binding sites and flanking regions, which makes it not well suited for de-novo motif discovery. Here, we propose an extension of this model that allows statistical dependences within binding sites, their reverse complements, and flanking regions. We study the efficacy of this extended homotypic Sunflower Hidden Markov Model based on ChIP-seq data from the Human ENCODE Project and find that it often outperforms the traditional homotypic Sunflower Hidden Markov Model

    Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection

    Get PDF
    Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that are as high scoring as the biologically true ones. Using ChIP-information allows not only to reduce the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also allows restricting the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP-information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel powerful CRM detection method 'CPModule'. By applying it on a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC

    Erroneous attribution of relevant transcription factor binding sites despite successful prediction of cis-regulatory modules

    Get PDF
    <p>Abstract</p> <p>Background</p> <p><it>Cis</it>-regulatory modules are bound by transcription factors to regulate gene expression. Characterizing these DNA sequences is central to understanding gene regulatory networks and gaining insight into mechanisms of transcriptional regulation, but genome-scale regulatory module discovery remains a challenge. One popular approach is to scan the genome for clusters of transcription factor binding sites, especially those conserved in related species. When such approaches are successful, it is typically assumed that the activity of the modules is mediated by the identified binding sites and their cognate transcription factors. However, the validity of this assumption is often not assessed.</p> <p>Results</p> <p>We successfully predicted five new <it>cis</it>-regulatory modules by combining binding site identification with sequence conservation and compared these to unsuccessful predictions from a related approach not utilizing sequence conservation. Despite greatly improved predictive success, the positive set had similar degrees of sequence and binding site conservation as the negative set. We explored the reasons for this by mutagenizing putative binding sites in three <it>cis</it>-regulatory modules. A large proportion of the tested sites had little or no demonstrable role in mediating regulatory element activity. Examination of loss-of-function mutants also showed that some transcription factors supposedly binding to the modules are not required for their function.</p> <p>Conclusions</p> <p>Our results raise important questions about interpreting regulatory module predictions obtained by finding clusters of conserved binding sites. Attribution of function to these sites and their cognate transcription factors may be incorrect even when modules are successfully identified. Our study underscores the importance of empirical validation of computational results even when these results are in line with expectation.</p

    ReLA, a local alignment search tool for the identification of distal and proximal gene regulatory regions and their conserved transcription factor binding sites

    Get PDF
    Motivation: The prediction and annotation of the genomic regions involved in gene expression has been largely explored. Most of the energy has been devoted to the development of approaches that detect transcription start sites, leaving the identification of regulatory regions and their functional transcription factor binding sites (TFBSs) largely unexplored and with important quantitative and qualitative methodological gaps

    Emerging applications of read profiles towards the functional annotation of the genome

    Get PDF
    Functional annotation of the genome in various species is important to understand their phenotypic complexity. The road towards functional annotation involves several challenges ranging from experiments on individual molecules to large-scale analysis of high-throughput sequencing (HTS) data. HTS data is typically a result of the protocol designed to address specific research questions. The sequencing results in reads, which when mapped to a reference genome often leads to the formation of distinct patterns (read profiles). Interpretation of these read profiles are essential for the analysis in relation to the research question addressed. Several strategies have been employed at varying levels of abstraction ranging from a somewhat ad hoc to a more systematic analysis of read profiles. These include methods which can compare read profiles, e.g. from direct (non-sequence based) alignments to classification of patterns into functional groups. In this review, we highlight the emerging applications of read profiles for the annotation of non-coding RNA and cis-regulatory regions such as enhancers and promoters. We also discuss the biological rationale behind their formation

    Conserved elements associated with ribosomal genes and their trans-splice acceptor sites in Caenorhabditis elegans

    Get PDF
    The recent publication of the Caenorhabditis elegans cisRED database has provided an extensive catalog of upstream elements that are conserved between nematode genomes. We have performed a secondary analysis to determine which subsequences of the cisRED motifs are found in multiple locations throughout the C. elegans genome. We used the word-counting motif discovery algorithm DME to form the motifs into groups based on sequence similarity. We then examined the genes associated with each motif group using DAVID and Ontologizer to determine which groups are associated with genes that also have significant functional associations in the Gene Ontology and other gene annotation sources. Of the 3265 motif groups formed, 612 (19%) had significant functional associations with respect to GO terms. Eight of the first 20 motif groups based on frequent dodecamers among the cisRED motif sequences were specifically associated with ribosomal protein genes; two of these were similar to mouse EBP-45, rat HNF3-family and Drosophila Zeste transcription factor binding sites. Additionally, seven motif groups were extensions of the canonical C. elegans trans-splice acceptor site. One motif group was tested for regulatory function in a series of green fluorescent protein expression experiments and was shown to be involved in pharyngeal expression

    Identification of co-regulated candidate genes by promoter analysis.

    Get PDF
    EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Computational prediction of functional similarity of CRMs

    Get PDF
    EThOS - Electronic Theses Online ServiceWarwick Systems Biology CentreHuman frontier science programGBUnited Kingdo
    corecore