9 research outputs found

    Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs, in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
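To illustrate what modeling a motif with a full PWM (rather than a k-mer word) can look like, the following minimal sketch scores every window of a sequence against a log-odds matrix and keeps the best hit. The counts, the uniform background, the pseudocount, and the function names `log_odds_pwm` and `best_window_score` are all illustrative assumptions; this is not MotifSpec's implementation.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into a log2-odds PWM.

    counts: list of dicts mapping a base to its count at that position.
    The uniform background and the pseudocount are assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_window_score(pwm, sequence):
    """Return (score, offset) of the highest-scoring window.

    Assumes the sequence contains only A, C, G, T.
    Returns (-inf, -1) if the sequence is shorter than the PWM.
    """
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        best = max(best, (score, start))
    return best

# Hypothetical counts for a 4-bp motif resembling "TGCA".
counts = [
    {"T": 9, "A": 1},
    {"G": 10},
    {"C": 8, "T": 2},
    {"A": 10},
]
pwm = log_odds_pwm(counts)
score, offset = best_window_score(pwm, "AATGCATT")
```

Unlike a fixed word, the matrix still assigns a graded score to near matches, which is what allows a score threshold to be tuned.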

    modENCODE ChIP-seq results.

    Binding specificities for four C. elegans transcription factors as learnt from ChIP-seq data from the modENCODE project.

    MotifSpec detects more known yeast motifs than the combination of k-means clustering and AlignACE (km-aa).

    There were 97 known motifs in total. A CompareACE motif similarity score of 0.75 or greater was considered a match. ChIP target sets were considered a match if the hypergeometric p-value for overlap was less than 10^-7.
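The overlap criterion above can be sketched as an upper-tail hypergeometric test. The function name, its parameters, and the example set sizes are hypothetical; only the 10^-7 cutoff comes from the caption.

```python
from math import comb

def hypergeom_overlap_pvalue(population, set_a, set_b, overlap):
    """P(X >= overlap) for the overlap between two gene sets.

    population: total number of genes considered
    set_a, set_b: sizes of the two target sets
    overlap: number of genes shared by both sets
    """
    total = comb(population, set_b)
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, min(set_a, set_b) + 1)
    )
    return tail / total

# Hypothetical numbers: 6000 genes, sets of 50 and 40 genes sharing 15.
p_value = hypergeom_overlap_pvalue(6000, 50, 40, 15)
is_match = p_value < 1e-7  # cutoff quoted in the caption
```

Exact integer binomials avoid floating-point underflow in the individual terms; for large gene sets a library routine for the hypergeometric survival function would be the more practical choice.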

    The top 5 motifs found by MotifSpec in a genome-wide search of a C. elegans sequence and expression dataset.

    Alongside each motif is its specificity score and any Gene Ontology (GO) and Anatomy Ontology (AO) terms that were enriched in the list of target genes.

    Human ChIP-seq results.

    MotifSpec performs comparably to HOMER and Dimont and consistently better than DECOD, DREME, and Amadeus in finding a discriminative motif when run on ChIP-seq data for three human transcription factors: CTCF, NRSF, and the estrogen receptor (ER). Panels a, b, and c show the ROC curves and auROC values for the top-scoring motif from each program when run on the three datasets. Panel d shows a summary comparison of auROC for each algorithm and motif, and panel e shows the top-scoring motif found by each program.
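For readers unfamiliar with auROC, it equals the probability that a randomly chosen bound sequence scores higher than a randomly chosen unbound one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are assumptions, and the evaluation code behind the figure is not shown in the source.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    pos_scores: motif scores of bound (positive) sequences
    neg_scores: motif scores of unbound (negative) sequences
    Runs in O(len(pos) * len(neg)), which is fine for small examples.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))

# Toy scores: 8 of the 9 positive/negative pairs are ranked correctly.
bound = [0.9, 0.8, 0.4]
unbound = [0.7, 0.3, 0.2]
area = auroc(bound, unbound)
```

An auROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of bound from unbound sequences.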

    Motifs found by MotifSpec perform better at retrieval of bound probes than the motifs found by Seed-and-Wobble.

    The bar chart shows the percentage improvement in the area under the receiver operating characteristic (ROC) curve. The top motif found by MotifSpec performs better than the Seed-and-Wobble motif in the majority of cases where either motif has an AUC of 0.75 or better. Three representative ROC curves are shown: two (Gln3 and Pbf2-9) in which MotifSpec outperforms Seed-and-Wobble and one (Sum1-9) in which Seed-and-Wobble is better. The red curve is the ROC for the Seed-and-Wobble motif and the green curve is the ROC for the best MotifSpec motif.

    Mouse ChIP-seq results. MotifSpec outperforms DREME when run on ChIP-seq data for 13 transcription factors from mouse embryonic stem cells.

    The left panel shows a plot of the AUC for the top motif reported by MotifSpec against the AUC for the top motif reported by DREME, while the right panel shows the improvement in AUC for the MotifSpec motif relative to the DREME motif.

    MotifSpec optimizes for specificity rather than over-representation and uses a dynamic search space.

    (A) An over-represented motif is found in the search space more often than expected according to some background model. It is not necessarily predictive. A specific motif is found at a much higher frequency in the search space than in the background sequences. A dynamic search space threshold finds the optimal search space such that the motif is most discriminative. (B) A schematic of the MotifSpec algorithm. The PWM model is initialized with a random sequence and position in the search space. The model is iteratively refined, and the motif and binding score thresholds are adjusted at convergence to maximize specificity. (C) An example of sequences scored using the model. Each sequence has a motif score and a binding score. The binding score determines whether a sequence is in the search space. The motif score determines whether the sequence has an instance of the motif. The sequences are color-coded according to the set to which they belong as defined in (B).
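The two-threshold idea in panel (C) can be sketched as a grid search: each candidate binding-score cutoff defines a search space, each candidate motif-score cutoff defines the motif-containing sequences, and the pair with the strongest enrichment is kept. The caption does not say how specificity is measured, so the hypergeometric tail used here is an assumption, as are the function names and the toy scores.

```python
from math import comb

def enrichment_pvalue(n_total, n_space, n_motif, n_both):
    """Hypergeometric tail P(X >= n_both), an assumed specificity measure."""
    total = comb(n_total, n_motif)
    tail = sum(
        comb(n_space, k) * comb(n_total - n_space, n_motif - k)
        for k in range(n_both, min(n_space, n_motif) + 1)
    )
    return tail / total

def best_thresholds(sequences):
    """Grid-search both cutoffs over the observed score values.

    sequences: list of (motif_score, binding_score) pairs.
    Returns (p_value, motif_threshold, binding_threshold).
    """
    motif_cuts = sorted({m for m, _ in sequences})
    binding_cuts = sorted({b for _, b in sequences})
    best = None
    for b_cut in binding_cuts:
        # Sequences at or above the binding cutoff form the search space.
        in_space = [(m, b) for m, b in sequences if b >= b_cut]
        for m_cut in motif_cuts:
            n_motif = sum(1 for m, _ in sequences if m >= m_cut)
            n_both = sum(1 for m, _ in in_space if m >= m_cut)
            p = enrichment_pvalue(len(sequences), len(in_space), n_motif, n_both)
            if best is None or p < best[0]:
                best = (p, m_cut, b_cut)
    return best

# Toy (motif_score, binding_score) pairs; the values are made up.
scored = [
    (8.1, 0.9), (7.5, 0.8), (7.9, 0.7), (2.0, 0.6),
    (1.5, 0.3), (2.2, 0.2), (7.0, 0.1), (1.0, 0.4),
]
p_value, motif_threshold, binding_threshold = best_thresholds(scored)
```

In this toy data the sequence with a high motif score but a low binding score is left outside the search space, because shrinking the search space to the three strongly bound sequences gives the strongest enrichment.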

    MotifSpec performs better at recovery of seeded motifs from a synthetic sequence-expression dataset than two-step procedures of k-means clustering and motif-finding using AlignACE, MEME and Weeder.
