The fraction of variance in dimer frequency across sequences that is explained by expression profile or by transcription factor binding sequence set, with the associated F-statistic P-value.
<p>For the Human Cmap data, this was assessed both for the 2,000 nucleotides upstream of the coding start site and for the intron sequences.</p>
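A one-way ANOVA is one standard way to obtain both quantities named in this caption. R² = SS_between / SS_total gives the fraction of variance explained by group membership. The F statistic is compared against an F(k − 1, N − k) distribution to get the P-value.

The sketch below rests on assumptions the caption does not confirm:

- Each group is one expression-profile cluster or one binding sequence set.
- "Dimer frequency" means overlapping dinucleotide occurrences divided by (length − 1).
- The test was an ordinary one-way ANOVA.

All function names are hypothetical. The sketch uses only the Python standard library, so the F-distribution tail is computed from the regularized incomplete beta function by a continued fraction.

```python
import math
from typing import List, Sequence, Tuple


def dimer_frequency(seq: str, dimer: str) -> float:
    """Fraction of overlapping 2-mer positions in `seq` equal to `dimer` (assumed definition)."""
    n = len(seq) - 1
    if n <= 0:
        raise ValueError("sequence must contain at least two bases")
    return sum(1 for i in range(n) if seq[i:i + 2] == dimer) / n


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-16
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f: float, d1: float, d2: float) -> float:
    """Upper-tail probability P(F >= f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (fraction of variance explained, F statistic, P-value)."""
    values: List[float] = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand = sum(values) / n
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = ss_total - ss_between
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between / ss_total, f, f_sf(f, k - 1, n - k)


if __name__ == "__main__":
    # Toy input: two hypothetical clusters of sequences, testing the CG dimer.
    clusters = [["CGCGTA", "ACGCGT", "TCGCGA"], ["ATATAT", "TATACG", "AATTAA"]]
    freqs = [[dimer_frequency(s, "CG") for s in c] for c in clusters]
    print(one_way_anova(freqs))
```

In practice, the same call would be repeated for every dinucleotide. Clusters below the minimum size would be dropped before the groups are built.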
The properties and sizes of the datasets used.
<p>N Seqs is the number of clusters that contained at least ten sequences.</p><p>Clusters with fewer than ten sequences were excluded from the analysis because their sample sizes were too small.</p>
The mean AUROC of all algorithms on all datasets using independent holdout data.
<p>This validation is unbiased.</p>
The mean holdout AUROC of the LR and ALR algorithms, reported separately for non-significant and significant motifs at an FDR threshold of 0.05.
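The captions do not state how the motif false discovery rates were computed. One common possibility is the Benjamini-Hochberg step-up procedure. It converts per-motif P-values into adjusted values, which can then be thresholded at 0.05 to separate significant from non-significant motifs.

This sketch assumes per-motif P-values are already available. It also assumes that values exactly at the threshold count as significant. `benjamini_hochberg` and `split_by_fdr` are hypothetical helpers, not the paper's code.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_by_fdr(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[List[int], List[int]]:
    """Indices of significant and non-significant motifs (assumes q <= alpha is significant)."""
    q = benjamini_hochberg(p_values)
    significant = [i for i, v in enumerate(q) if v <= alpha]
    non_significant = [i for i, v in enumerate(q) if v > alpha]
    return significant, non_significant
```

The mean holdout AUROC would then be averaged separately over the two index lists.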
The mean AUROC of all algorithms on all datasets based on training and testing on the same data.
<p>The optimistic bias of these training-set estimates reveals massive overfitting.</p>
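The gap between holdout and same-data AUROC can be illustrated with a toy experiment. Here AUROC is computed as the Mann-Whitney probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half.

A deliberately overfit "model" that memorizes its training labels scores perfectly on its own training data. On held-out sequences it does no better than chance. The memorizer, the random sequences, and the arbitrary labels are hypothetical illustrations, not the LR or ALR algorithms.

```python
import random
from typing import Dict, List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a positive outscores a negative; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def fit_memorizer(seqs: Sequence[str], labels: Sequence[int]) -> Dict[str, int]:
    """A hypothetical, maximally overfit model: a lookup table of training labels."""
    return dict(zip(seqs, labels))


def score(model: Dict[str, int], seqs: Sequence[str]) -> List[float]:
    # Sequences never seen in training get an uninformative score of 0.5.
    return [float(model.get(s, 0.5)) for s in seqs]


def demo(n: int = 200, length: int = 20, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    unique = set()
    while len(unique) < n:
        unique.add("".join(rng.choice("ACGT") for _ in range(length)))
    seqs = sorted(unique)
    labels = [i % 2 for i in range(n)]  # arbitrary labels, balanced in each half
    half = n // 2
    model = fit_memorizer(seqs[:half], labels[:half])
    train_auc = auroc(score(model, seqs[:half]), labels[:half])
    holdout_auc = auroc(score(model, seqs[half:]), labels[half:])
    return train_auc, holdout_auc
```

`demo()` returns a training AUROC of 1.0 and a holdout AUROC of 0.5. Only the holdout number reflects real predictive ability.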
Generative models are too null.
<p>Panel (a): Quantile plot of MEME E-values for approximately 15,000 random runs, with E-values excluded. The X-axis represents the E-value as reported by MEME; the Y-axis represents the quantile. For example, under our null model E-values below are reported with probability slightly more than . Panels (b) and (c): Quantile plots of LR false discovery rates, analogous to the MEME E-value quantile plot, for the Beer et al. and Human Cmap datasets, respectively. Panel (d): Z-score plots of the A/T fraction of yeast and human intergenic sequences relative to the distribution expected under a 6th-order Markov model, with the standard normal distribution (red) shown for reference.</p>
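A Z-score like the one in panel (d) can be estimated by Monte Carlo. The procedure has four steps:

1. Fit a k-th order Markov chain.
2. Simulate sequences of the same length from it.
3. Compute the mean and standard deviation of their A/T fractions.
4. Standardize the observed A/T fraction against that mean and standard deviation.

The caption does not say how the model was fit or sampled, so this sketch assumes the following:

- Transition probabilities are maximum-likelihood counts taken from the sequence itself.
- Each simulated chain is seeded with the observed first k bases.
- Contexts never followed by a base fall back to overall base composition.

The function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter, defaultdict
from typing import Dict, Tuple


def fit_markov(seq: str, order: int) -> Tuple[Dict[str, Counter], Counter]:
    """Count next-base frequencies for every length-`order` context in `seq`."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for i in range(order, len(seq)):
        transitions[seq[i - order:i]][seq[i]] += 1
    return transitions, Counter(seq)


def simulate(transitions, background, order, length, prefix, rng) -> str:
    """Sample a sequence from the fitted chain, seeded with `prefix[:order]`."""
    out = list(prefix[:order])
    while len(out) < length:
        context = "".join(out[-order:]) if order else ""
        # Fall back to overall base composition for contexts with no observed successor.
        table = transitions.get(context) or background
        bases = sorted(table)
        out.append(rng.choices(bases, weights=[table[b] for b in bases])[0])
    return "".join(out)


def at_fraction(seq: str) -> float:
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_z_score(seq: str, order: int, n_sim: int = 200, seed: int = 0) -> float:
    """Z-score of the observed A/T fraction against chains simulated from the fitted model."""
    rng = random.Random(seed)
    transitions, background = fit_markov(seq, order)
    sims = [at_fraction(simulate(transitions, background, order, len(seq), seq, rng))
            for _ in range(n_sim)]
    sd = statistics.stdev(sims)
    if sd == 0:
        raise ValueError("simulated A/T fractions have zero variance")
    return (at_fraction(seq) - statistics.mean(sims)) / sd
```

Fit on a single short sequence, a 6th-order model has up to 4^6 = 4,096 contexts, and most of them are seen at most once. The chain then tends to reproduce its input, and the simulated variance can collapse. Fitting the transition counts on a large pool of intergenic sequence avoids this problem.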