
    Cis-regulatory module detection using constraint programming

    We propose a method for finding cis-regulatory modules (CRMs) in a set of co-regulated genes. Each CRM consists of a set of binding sites of transcription factors. We aim to find CRMs involving the same transcription factors in multiple sequences. Finding such a combination of transcription factors is inherently a combinatorial problem. We solve this problem by combining the principles of itemset mining and constraint programming. The constraints involve the putative binding sites of transcription factors, the number of sequences in which they co-occur, and the proximity of the binding sites. Genomic background sequences are used to assess the significance of the modules. We validate our approach experimentally and compare it with state-of-the-art techniques.
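    The three constraints named above (binding-site hits, number of co-occurring sequences, proximity) plus the background comparison can be illustrated with a small sketch. It is not the authors' method: it swaps the constraint-programming search for brute-force enumeration of TF combinations, and the hit format `(tf_name, position)`, the parameters `max_dist`, `min_support` and `max_size`, and the binomial test with pseudocounts used for significance are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical input format: one list of (tf_name, position) hits per sequence.


def occurs_within(hits, tfs, max_dist):
    """True if hits of every TF in `tfs` fit in one window spanning <= max_dist bp."""
    relevant = sorted((pos, tf) for tf, pos in hits if tf in tfs)
    counts = {}  # TF -> number of its hits inside the current window
    left = 0
    for pos, tf in relevant:
        counts[tf] = counts.get(tf, 0) + 1
        # Shrink the window from the left until it spans at most max_dist.
        while pos - relevant[left][0] > max_dist:
            old_tf = relevant[left][1]
            counts[old_tf] -= 1
            if counts[old_tf] == 0:
                del counts[old_tf]
            left += 1
        if len(counts) == len(tfs):
            return True
    return False


def support(seqs, tfs, max_dist):
    """Number of sequences in which the TF set co-occurs within max_dist."""
    return sum(occurs_within(hits, tfs, max_dist) for hits in seqs)


def enumerate_modules(seqs, min_support, max_dist, max_size=3):
    """Brute-force stand-in for the CP search: every TF set meeting both constraints."""
    tf_names = sorted({tf for hits in seqs for tf, _ in hits})
    modules = []
    for k in range(2, max_size + 1):
        for tfs in combinations(tf_names, k):
            sup = support(seqs, set(tfs), max_dist)
            if sup >= min_support:
                modules.append((tfs, sup))
    return modules


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rank_by_background(modules, seqs, background, max_dist):
    """Assumed significance test: support in the input vs. rate in background sequences."""
    ranked = []
    for tfs, sup in modules:
        bg_sup = support(background, set(tfs), max_dist)
        p_bg = (bg_sup + 1) / (len(background) + 2)  # pseudocounts keep p_bg > 0
        ranked.append((tfs, sup, binomial_tail(sup, len(seqs), p_bg)))
    return sorted(ranked, key=lambda m: m[2])


# Toy data (hypothetical TFs A, B, C).
seqs = [
    [("A", 10), ("B", 30), ("C", 500)],
    [("A", 100), ("B", 120)],
    [("B", 5), ("A", 40), ("C", 41)],
]
background = [[("A", 10)], [("B", 10)], [("C", 5), ("A", 800)]]
modules = enumerate_modules(seqs, min_support=2, max_dist=50)
print(rank_by_background(modules, seqs, background, max_dist=50))
```

    Brute force grows exponentially with the number of TFs; the point of a constraint-programming formulation is to prune that search, so this sketch only shows what is being counted, not how to count it efficiently.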

    Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection

    Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that score as high as the biologically true ones. ChIP information not only narrows down the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also restricts the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP information in a query-based way makes in silico CRM detection a much more feasible endeavor. To handle the large datasets, the query-based setting and other characteristics specific to CRM detection on ChIP-Seq data, we developed a novel CRM detection method, CPModule. By applying it to a well-studied ChIP-Seq data set on the self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover the combinatorial regulation of five known TFs that are key to this self-renewal. Additionally, we make a number of new predictions on the combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC.
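    The two ways ChIP information shrinks the problem, restricting hits to bound regions and restricting candidate modules to those containing the assayed TF, can be sketched as follows. This is not CPModule's implementation; the TF names `Q`, `A`, `B`, `C`, the half-open `(start, end)` peak format and the helper names are hypothetical.

```python
from itertools import combinations

# Hypothetical inputs: hits as (tf_name, position), ChIP peaks as half-open (start, end).


def hits_in_peaks(hits, peaks):
    """Keep only the screening hits that fall inside some ChIP peak."""
    return [(tf, pos) for tf, pos in hits if any(start <= pos < end for start, end in peaks)]


def query_candidates(tf_names, query_tf, max_size):
    """Yield only TF combinations (size 2..max_size) that contain the assayed TF."""
    others = sorted(set(tf_names) - {query_tf})
    for k in range(1, max_size):
        for rest in combinations(others, k):
            yield (query_tf,) + rest


hits = [("Q", 120), ("A", 140), ("B", 900), ("C", 2000)]
peaks = [(100, 200), (850, 950)]
kept = hits_in_peaks(hits, peaks)  # the C hit lies outside every peak
candidates = list(query_candidates(["Q", "A", "B", "C"], "Q", max_size=3))
print(kept)
print(candidates)
```

    With four TFs and modules of at most three TFs, an unrestricted search considers 6 + 4 = 10 combinations, while the query-based mode considers only 3 + 3 = 6; the gap widens quickly as the motif library grows.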

    Cis-regulatory module detection using constraint programming

    We propose a method for finding cis-regulatory modules (CRMs) in a set of co-regulated genes. Each CRM consists of a set of binding sites of transcription factors. We aim to find CRMs involving the same transcription factors in multiple sequences. Finding such a combination of transcription factors is inherently a combinatorial problem. We solve this problem by combining the principles of itemset mining and constraint programming. The constraints involve the putative binding sites of transcription factors, the number of sequences in which they co-occur, and the proximity of the binding sites. Genomic background sequences are used to assess the significance of the modules. We validate our approach experimentally and compare it with state-of-the-art techniques.

    Acceptance rate: 17.2%. Status: published.

    Proximity-based cis-regulatory module detection using constraint programming for itemset mining

    Cis-regulatory modules (CRMs) are combinations of transcription factor binding sites (TFBSs) involved in the regulation of genes. Detecting CRMs is key to a better understanding of gene regulation. Identifying significant combinations of binding sites is a difficult computational problem. Existing techniques rely on heuristics or strong restrictions to make the problem tractable, and are difficult to extend. We present an extensible technique that enumerates all potential CRMs using only a biologically well-motivated restriction on the proximity of the binding sites involved. Our method consists of three phases. First, the genomic sequences under investigation are screened with an existing library of motif models, namely position weight matrices (PWMs); this screening identifies TFBS hits. Second, we use constraint programming for itemset mining to efficiently enumerate all combinations of TFBS hits that co-occur within a predefined distance, while avoiding undesirable redundancies. This yields an exhaustive list of potential CRMs. Finally, the CRMs are ranked using statistical methods.

    Status: published.
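    The first phase, screening sequences with position weight matrices, can be illustrated with a minimal log-odds scanner. The abstract does not say how PWM scores or thresholds are computed, so the log2-odds scoring, the pseudocount of 0.25, the uniform background, the forward-strand-only scan and the threshold of 5.0 are all assumptions, as are the helper names and the toy motif.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.25, background=None):
    """Turn per-position base counts into log2-odds scores against a background model."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return matrix


def scan(seq, matrix, threshold):
    """Return (position, score) for each forward-strand window scoring >= threshold."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy motif "ACG" built from 10 aligned sites (hypothetical counts).
pfm = [{"A": 10}, {"C": 10}, {"G": 10}]
pwm = log_odds_matrix(pfm)
print(scan("ttACGtt", pwm, threshold=5.0))
```

    Tagging each hit with its motif's TF name turns the output into `(tf_name, position)` pairs, which is the kind of input the second, proximity-constrained enumeration phase works on.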