
    Consensus ensemble k-clustering tree reveals the recursive splitting of breast cancer subtypes

    <p><b>Copyright information:</b></p><p>Taken from "Portraits of breast cancer progression"</p><p>http://www.biomedcentral.com/1471-2105/8/291</p><p>BMC Bioinformatics 2007;8():291-291.</p><p>Published online 6 Aug 2007</p><p>PMCID:PMC1978212.</p><p>At k = 2, the ensemble clustering split the normal samples from the disease samples. At k = 3, the normal cluster remained unchanged and the disease samples split into low grade (pathological grades 1 and 2) and high grade (pathological grades 2 and 3). The optimum number of clusters in the data was seven, corresponding to one normal cluster, two low grade clusters and four high grade clusters. Between successive values of k, the samples did not switch clusters, indicating that the hierarchical structure in the figure is a strong property of the data. In the final disease clusters, samples from the same patient microdissected from DCIS and IDC lesions were found in the same cluster, indicating that the disease subtypes are more heterogeneous than disease progression within a subtype.</p>

    Heatmap of expression levels of the top markers for progression from DCIS to IDC in the low grade and high grade tumor subgroups

    <p>In each subtype, we use the upregulated genes which have good FDR under WV to stratify the samples. We show the 10 top genes for DCIS to IDC progression in LG and HG tumors. Since the sample sizes were small, the p-values were computed using permutation tests and the FDR values were computed from these p-values. The FDR values under WV for these genes are 0.6 for LG and 0.2 for HG.</p>
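A permutation test of the kind mentioned in this caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: the test statistic (absolute difference in group means), the function name `permutation_p_value`, and the toy values are all assumptions, since the caption does not specify them.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    The choice of statistic is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Shuffle the group labels by permuting the pooled values.
        perm = rng.permutation(pooled)
        diff = abs(perm[:a.size].mean() - perm[a.size:].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (count + 1) / (n_perm + 1)

# Hypothetical expression values for one gene in two small groups.
low = [1.0, 2.0, 3.0, 4.0, 5.0]
high = [11.0, 12.0, 13.0, 14.0, 15.0]
p_separated = permutation_p_value(low, high, n_perm=2000)
p_identical = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], n_perm=2000)
```

With small groups the smallest attainable p-value is limited by the number of distinct label assignments, which is one reason permutation tests suit small sample sizes.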

    The agreement matrix for N samples is an N × N matrix whose entries are the fraction of cases across replicates for which two samples fall into the same cluster

    <p>Red/green represent high/low fractional values across clustering methods and data perturbation replicates. The normals and the LG1 and LG2 are clearly well separated, while the HG1, HG2, HG3 and HG4 separation is weaker. We find that the optimum number of clusters using the gap statistic oscillates between 6 and 7, with the HG3 and HG4 clusters merging at k = 6.</p>
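The agreement matrix described in this entry can be illustrated with a short sketch. It assumes each replicate is available as a vector of cluster labels for the same N samples; the function name `agreement_matrix` and the toy labels are hypothetical.

```python
import numpy as np

def agreement_matrix(label_runs):
    """Fraction of replicates in which each pair of samples shares a cluster.

    label_runs has shape (R, N): one row of cluster labels per replicate.
    """
    runs = np.asarray(label_runs)
    n_runs, n_samples = runs.shape
    agree = np.zeros((n_samples, n_samples), dtype=float)
    for labels in runs:
        # Boolean N x N matrix: True where two samples have the same label.
        agree += labels[:, None] == labels[None, :]
    return agree / n_runs

# Four hypothetical replicates clustering four samples.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
A = agreement_matrix(runs)
```

Because only co-membership is compared, the result does not depend on how clusters are numbered in each replicate: the first two rows above describe the same partition.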

    Heatmap of expression levels of the top 10 upregulated genes for progression from DCIS to IDC for each subtype

    <p>Each subgroup is in a framed box to identify its samples and distinguish gene markers. Since the sample sizes are small, the p-values were computed using permutation tests and the FDR values inferred from these p-values. The FDR values under WV for these genes are: 0.02 for LG1, 0.2 for LG2, 0.2 for HG1, 0.5 for HG2, 0.06 for HG3 and 0.002 for HG4.</p>
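One common way to derive FDR values from a set of p-values is the Benjamini-Hochberg adjustment, sketched below. The caption does not say which FDR procedure was used, so this is an assumed stand-in; the function name and example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

# Hypothetical per-gene permutation p-values.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```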

    The most significant genomic regions under selection in MKK using XP-EHH, with LWK as the reference population.

    <p>SNPs with positive genome-wide significant XP-EHH scores (XP-EHH ≥4.796, two-tailed Bonferroni corrected p≤0.05) were grouped into contiguous genomic clusters based on genotype R<sup>2</sup>, using the same criterion as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone-0044751-t001" target="_blank">Table 1</a>. Overlapping clusters were merged. Column E lists the number of significant SNPs in each cluster. Complete lists of genome-wide significant SNPs and clusters identified by XP-EHH are in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone.0044751.s003" target="_blank">Tables S3a and S3b</a>.</p>
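The relation between a score cutoff and a Bonferroni corrected two-tailed p-value can be sketched as below. It assumes the scores are standardized to an approximately standard normal distribution, which the caption does not state; `n_tests` is a hypothetical parameter, and the number of tests actually used is not given here.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value of a score under a standard normal (assumed) null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni_significant(z, n_tests, alpha=0.05):
    """True if the score stays significant after Bonferroni correction."""
    return two_tailed_p(z) * n_tests <= alpha

# Per-test p-value at the cutoff quoted in the caption.
p_at_cutoff = two_tailed_p(4.796)
```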

    The most significant non-synonymous SNPs under selection in MKK using Fst, with LWK as the reference population.

    <p>The most significant non-synonymous SNPs identified as candidates for selection by Fst. The complete list of 1,232 SNPs identified as selection candidates by Fst (p<sub>B</sub> <8.6E−6 and p<sub>E</sub> <0.001) is in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone.0044751.s001" target="_blank">Table S1</a>.</p>

    Top 20 genomic regions identified as selection candidates in MKK using the Fst statistic and clustering.

    <p>1,232 SNPs with significant Fst scores (p<sub>B</sub><8.6E−6, p<sub>E</sub><0.001) were clustered into contiguous genomic regions of linkage disequilibrium. A cluster was defined as a collection of SNPs in a genomic region where each SNP had genotype R<sup>2</sup>≥0.25 with at least one other SNP in the cluster. Clusters containing a SNP with maximum XP-EHH score >3 were identified as being MKK associated. The 22 top clusters are ranked by the highest Fst value for a SNP pair in a cluster. The complete set of clusters identified by Fst is in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone.0044751.s002" target="_blank">Table S2</a>.</p>
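The cluster definition above (every SNP has R² ≥ 0.25 with at least one other SNP in its cluster) can be read as finding connected components in a graph whose edges are SNP pairs passing the threshold. The sketch below follows that reading with a small union-find; it is one possible interpretation, not necessarily the authors' implementation, and the SNP identifiers and R² values are invented for illustration.

```python
def ld_clusters(snp_ids, r2_pairs, threshold=0.25):
    """Group SNPs into clusters linked by pairwise R^2 >= threshold.

    r2_pairs maps (snp_a, snp_b) tuples to a genotype R^2 value.
    SNPs with no qualifying partner are not reported as clusters.
    """
    parent = {s: s for s in snp_ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), r2 in r2_pairs.items():
        if r2 >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in snp_ids:
        clusters.setdefault(find(s), []).append(s)
    return [members for members in clusters.values() if len(members) > 1]

# Hypothetical SNPs and pairwise R^2 values.
snps = ["snpA", "snpB", "snpC", "snpD", "snpE", "snpF"]
pairs = {
    ("snpA", "snpB"): 0.80,
    ("snpB", "snpC"): 0.30,
    ("snpC", "snpD"): 0.10,
    ("snpD", "snpE"): 0.25,
}
found = ld_clusters(snps, pairs)
```

Note that under this reading two SNPs in the same cluster need not be in direct linkage disequilibrium with each other; they only need to be joined by a chain of qualifying pairs.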

    Population structure components for individuals from CEU, ASW, LWK, MKK and YRI.

    <p>Results from STRUCTURE version 2.3 on genotype data for 12,999 randomly selected SNPs in 578 founder (unrelated) individuals from the CEU, ASW, LWK, MKK and YRI HapMap populations. The no-admixture model showed that the data was best fit by 6 inferred ancestral populations. Each column represents an individual, and the colors indicate the fractions of their genotype attributable to ancestry from each of the 6 ancestral populations.</p>

    The most significant genomic regions under selection in MKK using iHS.

    <p>Using a sliding window 50 SNPs wide, genomic regions were scored for the fraction of SNPs with |iHS|>2. The top 0.02% of non-overlapping windows were identified and merged into genomic clusters based on genotype R<sup>2</sup> using the same criterion as in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone-0044751-t001" target="_blank">Table 1</a>. Clusters are ranked by the maximum |iHS| value in the cluster. Complete lists of genome-wide significant SNPs and regions identified by iHS are in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone.0044751.s002" target="_blank">Tables S2a and S2b</a> respectively.</p>
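The window scoring step can be sketched as follows, assuming the iHS values are ordered by genomic position along one chromosome. The function name, the `step` parameter (set equal to the width here to give non-overlapping windows), and the toy scores are assumptions for illustration.

```python
import numpy as np

def window_fractions(ihs, width=50, step=50, cutoff=2.0):
    """Fraction of SNPs with |iHS| > cutoff in each window of `width` SNPs.

    Returns the start index of each window and its fraction.
    """
    ihs = np.asarray(ihs, dtype=float)
    extreme = np.abs(ihs) > cutoff
    starts = np.arange(0, ihs.size - width + 1, step)
    fractions = np.array([extreme[s:s + width].mean() for s in starts])
    return starts, fractions

# Hypothetical scores for 150 consecutive SNPs.
scores = np.zeros(150)
scores[50:75] = 3.0     # strongly positive iHS
scores[75:100] = -2.5   # strongly negative iHS also counts
scores[100:110] = 2.0   # exactly 2 does not exceed the cutoff
starts, fractions = window_fractions(scores)
best_start = int(starts[np.argmax(fractions)])
```

Ranking the resulting fractions and keeping the highest-scoring windows would then give the candidate regions to merge by linkage disequilibrium.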

    (a) Genome-wide significant scores identifying candidate regions under selection on Chromosome 2.

    <p>Chromosome-wide plot of SNPs with significant scores using Fst (empirical p-value <0.001 and Bonferroni corrected permutation test pB <8.6E−6), iHS (normalized |iHS|>2), and XP-EHH (XP-EHH ≥4.796, two-tailed Bonferroni corrected p≤0.05). The SNPs thus identified were clustered on the basis of linkage disequilibrium to identify contiguous genomic regions that are candidates for selection (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone-0044751-t001" target="_blank">Table 1</a>,<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone-0044751-t002" target="_blank">2</a>,<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone-0044751-t003" target="_blank">3</a>,<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone-0044751-t004" target="_blank">4</a>). The locus containing the genes LCT and MCM6 (135–137 Mb) was identified by all three metrics as the top candidate for selection. The non-synonymous TC polymorphism at rs2241883 in the FABP1 gene had the most significant genome-wide Fst (Fst = 0.25, pE = 3.13E−5). The MKK samples have a high frequency (∼0.45) of the protective C allele, known to be associated with low cholesterol levels in Europeans (plots for other chromosomes in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0044751#pone.0044751.s013" target="_blank">Appendix S6</a>). <b>(b) Inset of the LCT locus on Chromosome 2.</b> An inset of the Fst, iHS and XP-EHH scores for SNPs in the ∼1 Mb locus (from 135.8–136.8 Mb) on Chr 2 containing the genes LCT and MCM6. The uniformly high values for all three metrics in this region suggest that this locus has undergone strong selection pressure. The blue marker indicates the position of the lactase associated SNP in MCM6 that we sequenced, which was polymorphic in MKK with frequency pC = 0.58 ± 0.14 (68% CI) for the protective C allele.</p>