19 research outputs found

The properties and sizes of the datasets used.

Author: David Simcha (299744)
Donald Geman (78677)
Nathan D. Price (190240)

N Seqs is the number of clusters that contained at least ten sequences. Clusters with fewer than ten sequences were excluded from the analysis due to excessively small sample size.

FigShare

The mean AUROC of all algorithms on all datasets using independent holdout data.

Author: David Simcha (299744)
Donald Geman (78677)
Nathan D. Price (190240)

This validation is unbiased.
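For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The following is a minimal sketch of that definition only; the function name `auroc` and the example scores are hypothetical and are not taken from the dataset record.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: classifier scores, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical holdout scores, made up for illustration.
holdout_scores = [0.9, 0.4, 0.6, 0.2]
holdout_labels = [1, 1, 0, 0]
holdout_auroc = auroc(holdout_scores, holdout_labels)
```

Computing this on data the model never saw during training is what makes the estimate a holdout AUROC.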

FigShare

Large-scale integration of cancer microarray data identifies a robust common cancer signature-0

Author: Donald Geman (78677)
Lei Xu (78676)
Raimond L Winslow (78678)

Copyright information: Taken from "Large-scale integration of cancer microarray data identifies a robust common cancer signature", http://www.biomedcentral.com/1471-2105/8/275, BMC Bioinformatics 2007;8():275-275. Published online 30 Jul 2007. PMCID: PMC1950528.

n) is used to illustrate the gene expression values of the signature genes in the figure. The heatmap is generated by the matrix2png software [24]. For each data set, the expression value for each gene is normalized across the samples to zero mean and one standard deviation (SD) for visualization purposes. Genes with expression levels greater than the mean are colored in red and those below the mean are colored in green. The scale indicates the number of SDs above or below the mean.
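The per-gene normalization described in this caption can be sketched as follows. This is an illustrative sketch, not the matrix2png implementation: the function names, the example matrix, the use of the population SD, and the "neutral" color for values exactly at the mean are all assumptions.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Normalize each row (gene) across its columns (samples) to zero mean, unit SD."""
    normalized = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)  # assumption: population SD; sample SD is equally plausible
        if sd == 0:
            # A constant gene has no variation to display.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(x - mu) / sd for x in row])
    return normalized


def color_for(z):
    """Red above the mean, green below; 'neutral' at the mean is an assumption."""
    if z > 0:
        return "red"
    if z < 0:
        return "green"
    return "neutral"


# Hypothetical expression values: 2 genes (rows) by 3 samples (columns).
expr = [[2.0, 4.0, 6.0], [5.0, 5.0, 5.0]]
z = zscore_rows(expr)
colors = [[color_for(v) for v in row] for row in z]
```

After this step each value is expressed in SDs above or below that gene's mean, which is what the color scale in such a heatmap reports.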

FigShare

The fraction of variance in dimer frequency across sequences explained by expression profile or transcription factor binding sequence set and associated F statistic P-value.

Author: David Simcha (299744)
Donald Geman (78677)
Nathan D. Price (190240)

For the Human Cmap data, this was assessed both for the 2,000 nucleotides upstream of the coding start site and for the intron sequences.

FigShare

The mean AUROC of all algorithms on all datasets based on training and testing on the same data.

Author: David Simcha (299744)
Donald Geman (78677)
Nathan D. Price (190240)

The optimistic bias reveals massive overfitting.

FigShare

Generative models are too null.

Author: David Simcha (299744)
Donald Geman (78677)
Nathan D. Price (190240)

Panel (a): Quantile plot of MEME E-values for approximately 15,000 random runs, with E-values excluded. The X-axis represents the E-value as reported by MEME. The Y-axis represents the quantile. For example, under our null model E-values below are reported with probability slightly more than . Panels (b) and (c): Quantile plots of LR false discovery rates, similar to the MEME E-value quantile plots, for the Beer et al. and Human Cmap datasets respectively. Panel (d): Z-score plots of A/T fraction of yeast and human intergenic sequences relative to the distribution expected under a 6th order Markov model, with the standard normal distribution (red) shown for reference.

FigShare

The mean holdout AUROC of the LR and ALR algorithms for non-significant (FDR > 0.05) and significant (FDR ≤ 0.05) motifs respectively.

Author: David Simcha (299744)
Donald Geman (78677)
Nathan D. Price (190240)

FigShare

Merging microarray data from separate breast cancer studies provides a robust prognostic test-1

Author: Aik Choon Tan (82343)
Donald Geman (78677)
Lei Xu (78676)
Raimond L Winslow (78678)

The heat map is generated using the matrix2png software [34]. There are 80 rows corresponding to the 80 gene pairs; the displayed intensities are the differences between the expression values of the two genes in each pair. The expression value for each difference is normalized across the samples to zero mean and one standard deviation (SD) for visualization purposes. Differences with expression levels greater than the mean are colored in red and those below the mean are colored in green. The scale indicates the number of SDs above or below the mean.

Copyright information: Taken from "Merging microarray data from separate breast cancer studies provides a robust prognostic test", http://www.biomedcentral.com/1471-2105/9/125, BMC Bioinformatics 2008;9():125-125. Published online 27 Feb 2008. PMCID: PMC2409450.

FigShare

Merging microarray data from separate breast cancer studies provides a robust prognostic test-2

Author: Aik Choon Tan (82343)
Donald Geman (78677)
Lei Xu (78676)
Raimond L Winslow (78678)

Itan patients between the good-outcome group and the poor-outcome group. The LRT is based on the integrated data in (A) and the single, Wang data set in (B). CI denotes confidence interval and the P-value is calculated by the log-rank test.

Copyright information: Taken from "Merging microarray data from separate breast cancer studies provides a robust prognostic test", http://www.biomedcentral.com/1471-2105/9/125, BMC Bioinformatics 2008;9():125-125. Published online 27 Feb 2008. PMCID: PMC2409450.

FigShare

Inter-study validation and randomized cross-validation performance.

Author: Andrew T. Magis (437560)
Donald Geman (78677)
Jaeyun Sung (437557)
Nathan D. Price (190240)
Shuyi Ma (437558)
Yuliang Wang (437561)

The graphs show ISV and RCV results from SVM (A) and ISSAC (B). For clarity, the Study ID labels have been excluded from this visualization (see Text S1, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0110840#pone.0110840.s002, for expanded versions of these plots that include the individual Study ID labels). The colored bars report sensitivities achieved on the validation study designated in the horizontal axis (e.g., the bar on the farthest left in (A) shows that 74% of ADC samples in the first ADC study are correctly classified by SVM when that study is excluded from training). The order of studies in the horizontal axis is identical for panels (A) and (B). Dashed lines represent average ISV sensitivities for each phenotype. Solid lines report corresponding ten-fold RCV sensitivities of each phenotype.
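The inter-study validation (ISV) procedure described in this caption holds out one whole study at a time, trains on the remaining studies, and reports the fraction of held-out samples classified correctly. A minimal sketch under stated assumptions: the nearest-centroid classifier is a stand-in chosen for brevity (it is neither SVM nor ISSAC), and the study IDs, the "OTHER" phenotype label, and the one-dimensional feature values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def train_centroids(samples):
    """Stand-in classifier: one centroid per phenotype on a single feature."""
    by_label = defaultdict(list)
    for _study, label, x in samples:
        by_label[label].append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Assign the phenotype whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def inter_study_sensitivities(samples):
    """Hold out each study in turn and score it with a model trained on the rest."""
    result = {}
    for held_out in sorted({study for study, _, _ in samples}):
        train = [r for r in samples if r[0] != held_out]
        test = [r for r in samples if r[0] == held_out]
        centroids = train_centroids(train)
        correct = sum(predict(centroids, x) == label for _, label, x in test)
        result[held_out] = correct / len(test)
    return result


# Hypothetical (study_id, phenotype, feature) records, made up for illustration.
samples = [
    ("adc_1", "ADC", 1.0), ("adc_1", "ADC", 1.2),
    ("adc_2", "ADC", 0.8), ("adc_2", "ADC", 2.6),
    ("other_1", "OTHER", 3.0), ("other_1", "OTHER", 3.2),
    ("other_2", "OTHER", 2.9), ("other_2", "OTHER", 3.1),
]
isv = inter_study_sensitivities(samples)
```

Randomized cross-validation differs only in how the held-out set is chosen: samples are assigned to folds at random, so every study typically contributes to both training and testing.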

FigShare