5 research outputs found

    DMFS pipeline recovery of previously identified motifs.

    <p>Here we list motifs identified by Tillo and Hughes <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0027382#pone.0027382-Tillo1" target="_blank">[48]</a> and Lee <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0027382#pone.0027382-Lee1" target="_blank">[49]</a>, together with the number of times the DMFS pipeline identified each of them. Structure-related features are omitted, as are transcription binding start sites and features with zero weights. We ran the DMFS pipeline 40 times, with random data partitioning, and counted the number of times each previously identified motif occurred. According to Tillo and Hughes, the most discriminative motif is the 4-mer AAAA/TTTT, which emerged in almost every run.</p>
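The tally described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that a motif and its reverse complement count as the same motif (as the notation AAAA/TTTT suggests), and the helper names `canonical` and `recovery_counts` as well as the example motif lists are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Collapse a k-mer and its reverse complement onto one key (assumption)."""
    kmer = kmer.upper()
    return min(kmer, reverse_complement(kmer))


def recovery_counts(known_motifs, runs):
    """For each known motif, count the runs in which it was recovered."""
    counts = Counter()
    for motifs_in_run in runs:
        found = {canonical(m) for m in motifs_in_run}
        for motif in known_motifs:
            if canonical(motif) in found:
                counts[motif] += 1
    return {motif: counts[motif] for motif in known_motifs}


# Hypothetical motif lists from three pipeline runs.
example_runs = [["TTTT", "GCGC"], ["AAAA"], ["CCGG"]]
example_counts = recovery_counts(["AAAA", "GCGC"], example_runs)
```

Each run contributes at most one count per motif, so the result is directly comparable to the number of runs (40 in the caption above).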

    Protein solubility data.

    <p>Protein solubility data accuracies for default and tuned parameter settings, as well as for reported and enumerative methods. Reported values are from Magnan <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0027382#pone.0027382-Magnan1" target="_blank">[5]</a>. The DMFS pipeline results are stable, with small standard deviations as determined by 20 runs with random data partitioning: (a) default parameter settings: 0.006 (SVM) and 0.0048 (RF), and (b) tuned parameter settings: 0.0052 (SVM) and 0.0049 (RF).</p>

    Illustrative diagram of data flow through the pipeline.

    <p>Data is initially partitioned into discovery and classification sets. The classification set is further partitioned into training and validation sets. After WordSpy elicits motifs using the discovery set, fuzznuc or fuzzpro counts corresponding motif occurrences in the remaining data. The training data counts are used to train a classifier, while the validation data counts are used to determine performance (e.g. AUC) of the learned classifier.</p>
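The data flow in this caption can be sketched end to end using only the Python standard library. Everything here is a simplified stand-in and an assumption: `discover_motifs` replaces WordSpy with a plain most-frequent k-mer count, `count_features` replaces fuzznuc/fuzzpro with exact overlapping matches, a nearest-centroid scorer replaces the SVM and random forest classifiers, and the split fractions and toy sequences are invented for illustration.

```python
import random
from collections import Counter


def partition(items, fraction, seed=0):
    """Shuffle items and split them into two parts at the given fraction."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    cut = int(round(fraction * len(items)))
    return items[:cut], items[cut:]


def discover_motifs(sequences, k=4, top=5):
    """Hypothetical stand-in for WordSpy: the most frequent k-mers."""
    counts = Counter(s[i:i + k] for s in sequences for i in range(len(s) - k + 1))
    return [kmer for kmer, _ in counts.most_common(top)]


def count_features(seq, motifs):
    """Stand-in for fuzznuc/fuzzpro: overlapping exact-match counts per motif."""
    return [sum(seq.startswith(m, i) for i in range(len(seq))) for m in motifs]


def train_centroids(X, y):
    """Toy classifier: the mean feature vector of each class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def score(centroids, x):
    """Higher when x is closer to the positive centroid than the negative one."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return sq_dist(centroids[0]) - sq_dist(centroids[1])


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n=60, length=40, seed=1):
    """Invented data: A/T-rich positives and G/C-rich negatives."""
    rng = random.Random(seed)
    pos = ["".join(rng.choices("ACGT", weights=[6, 1, 1, 2], k=length)) for _ in range(n)]
    neg = ["".join(rng.choices("ACGT", weights=[1, 4, 4, 1], k=length)) for _ in range(n)]
    return [(s, 1) for s in pos] + [(s, 0) for s in neg]


def run_pipeline(data, seed=0):
    """Discovery/classification split, then training/validation split, then AUC."""
    discovery, classification = partition(data, 0.3, seed)
    training, validation = partition(classification, 0.6, seed)
    motifs = discover_motifs([s for s, _ in discovery])
    X_train = [count_features(s, motifs) for s, _ in training]
    centroids = train_centroids(X_train, [t for _, t in training])
    scores = [score(centroids, count_features(s, motifs)) for s, _ in validation]
    return auc(scores, [t for _, t in validation])


toy_auc = run_pipeline(make_toy_data())
```

The point of the layout is that motifs are chosen from data the classifier never sees: the discovery set is used only for motif elicitation, and the AUC is computed only on the validation counts.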

    ROC curves from DMFS and enumerative methods for the nucleosome occupancy datasets.

    <p>The red and green curves are from Gupta <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0027382#pone.0027382-Gupta1" target="_blank">[4]</a> for the Dennis and Ozsolak data respectively. The black and blue curves are from the DMFS method for the Dennis and Ozsolak data respectively. For both datasets, the DMFS ROC curve is approximately equal to the ROC curve obtained using enumerative feature generation. This figure was created by manipulating <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0027382#pone-0027382-g001" target="_blank">Figure 1</a> of Gupta <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0027382#pone.0027382-Gupta1" target="_blank">[4]</a> in GIMP. The DMFS ROC curves are relatively stable. As the false positive rate ranges from 10% to 90% the true positive rate standard deviations have range to for the Dennis data and to for the Ozsolak data.</p>

    Nucleosome occupancy data.

    <p>Mean AUCs for the nucleosome occupancy datasets and approaches as described in the text. Reported values are from Gupta <i>et al.</i> <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0027382#pone.0027382-Gupta1" target="_blank">[4]</a>. The DMFS pipeline results are stable, with small standard deviations as determined by 40 runs with random data partitioning: Dennis data with (a) default parameter settings: 0.0055 (SVM) and 0.0036 (RF), and (b) tuned parameter settings: 0.0048 (SVM) and 0.0041 (RF); Ozsolak data with (a) default parameter settings: 0.0084 (SVM) and 0.0078 (RF), and (b) tuned parameter settings: 0.011 (SVM) and 0.0086 (RF).</p>
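The stability figures in this caption are standard deviations of the AUC over repeated runs with different random partitions. A minimal sketch of that summary is shown below. It assumes the sample standard deviation (the caption does not say which estimator was used), and `fake_run` is a hypothetical placeholder that returns made-up values in place of a real pipeline run.

```python
import random
import statistics


def summarize_runs(run_once, n_runs=40):
    """Call run_once(seed) for seeds 0..n_runs-1; return (mean, sample std dev)."""
    aucs = [run_once(seed) for seed in range(n_runs)]
    return statistics.mean(aucs), statistics.stdev(aucs)


def fake_run(seed):
    """Hypothetical placeholder for one pipeline run; the AUC is made up."""
    return 0.80 + random.Random(seed).uniform(-0.01, 0.01)


mean_auc, std_auc = summarize_runs(fake_run)
```

In practice `run_once` would repartition the data with the given seed, rerun motif discovery and training, and return the validation AUC.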