16 research outputs found

    Meta-Analytic Framework for Sparse K-Means to Identify Disease Subtypes in Multiple Transcriptomic Studies

    No full text
    Disease phenotyping by omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step toward this goal. In this article, we extend a sparse K-means method toward a meta-analytic framework to identify novel disease subtypes when expression profiles of multiple cohorts are available. The lasso regularization and meta-analysis identify a unique set of gene features for subtype characterization. An additional pattern matching reward function guarantees consistent subtype signatures across studies. The method was evaluated by simulations and leukemia and breast cancer datasets. The identified disease subtypes from meta-analysis were characterized with improved accuracy and stability compared to single study analysis. The breast cancer model was applied to an independent METABRIC dataset and generated improved survival difference between subtypes. These results provide a basis for diagnosis and development of targeted treatments for disease subgroups. Supplementary materials for this article are available online.

    FISH validation of copy number variation detected by Cytoscan HD analysis.

    No full text
    (A) Representative images of FISH analysis using probes specific for BRAF (7q34, red) and the centromere of chromosome 7 (green). Left: normal diploid control; right: case 4 lymphoma (BRAF amplified). (B) Representative images of FISH analysis using probes specific for CITED2 (6q23.3, red) and the centromere of chromosome 6 (green). Left: normal diploid control; right: case 2 lymphoma (CITED2 deleted).

    Genotyping concordance between FFPE and Frozen tissues.

    No full text
    NP: normal ploidy; ND: not detected; NA: not applicable; Def: copy number variation definition. *: No analysis was reported because too few cells survived.

    Additional file 1: of A computational method for genotype calling in family-based sequencing data

    No full text
    Table S1. Genotype mismatch rate of heterozygous calls and SNPs with MAF < 5% (Simulation I). Table S2. Genotype discordance rate of heterozygous calls (Simulation II). Table S3. Phasing error rate (Simulation I). Table S4. Phasing error rate (Simulation II). Table S5. Mendelian error rate (Simulation I). Table S6. Genotype discordance rate of heterozygous calls (Simulation III). Table S7. Phasing error rate (Simulation III). Table S8. Mendelian error rate (Simulation III). Figure S1. Pedigree of each family in the second simulation scheme. Figure S2. Genotype mismatch rate of heterozygous calls (Simulation I). C1: trio results summarized by randomly selecting one child to form a trio and treating all the other children as independent individuals, for all 80 families with 100 repeats; C2: nuclear families with two offspring; C3: nuclear families with three offspring; and C4: nuclear families with four offspring. (DOCX 1452 kb)

    Additional file 1: of Integrative phenotyping framework (iPF): integrative clustering of multiple omics data identifies novel lung disease subphenotypes

    No full text
    Text S1. Materials and data collection. Text S2. Details of smoothing and Feature Topology Plots (FTP). Text S3. Simulation setting to evaluate iPF. Text S4. Comprehensive validation scheme for iPF. Figure S5. (A) An illustration of integrated omics data sets, (B) a workflow to generate the feature topology plot (FTP). Figure S6. Flowchart of the validation scheme for the integrative phenotyping framework for multiple omics data sets. Figure S7. An example of iPF that utilizes fused multiple data sets at stage (vi). Figure S8. Examples of iPF using various combinations of the omics data sets (pooled analysis). Figure S9A. The gap statistic and its scree plot for choosing the optimal number of clusters (clinical and miRNA data). Figure S9B. The gap statistic and its scree plot for choosing the optimal number of clusters (mRNA and miRNA data). Figure S9C. The gap statistic and its scree plot for choosing the optimal number of clusters (mRNA and clinical data). Figure S9D. The gap statistic and its scree plot for choosing the optimal number of clusters (clinical data and combined data of mRNA and miRNA). Figure S10. The best choice of the number of feature modules. Figure S11. Simulation study shows robust true feature discovery in “Feature Fusion”. The x-axis represents multiplication levels of noise features. The y-axis represents average ARIs from 100 simulations. Each figure is generated based on simulation scenarios with a different number of true features (e.g., 200, 400, and 600, respectively). Figure S12. Immunomodulating drugs target overexpressed genes in module two. Table S13. The description of mRNA and miRNA lung disease data. Table S14. Various correlation types depending on variable attributes. Table S15. The demographic summary of clinical features in each sub-cluster. Table S16. Target gene enrichment analysis (via Fisher exact test) related to twelve. Table S17. Regression analysis on target miRNA features, and coefficient of determination significant miRNA features. Table S18. The top disease or functional annotations associated with genes in module two in Cluster E patients. Figure S19. Basic consensus clustering using only gene expression data. (DOCX 6398 kb)

    Additional file 7: of The molecular landscape of premenopausal breast cancer

    No full text
    PARADIGM analysis in The Cancer Genome Atlas (TCGA). Table S1. Pathways detected by gene set enrichment analysis (GSEA) with input of gene expression and copy number variation data for the PARADIGM algorithm. The nine columns correspond to the pathway name, size of the pathway, enrichment score (ES), normalized enrichment score (NES), nominal p value, false discovery rate (FDR) q value, family-wise error rate (FWER) p value, and leading edge (typical GSEA output). Table S2. Pathways detected by GSEA with input of gene expression, copy number variation, and methylation data for the PARADIGM algorithm. The nine columns correspond to the pathway name, size of the pathway, ES, NES, nominal p value, FDR q value, FWER p value, and leading edge (typical GSEA output). (XLSX 88 kb)

    Association between expression of three genes [TOP2A (A), DBI (B) and PMVK (C)] and drug response in ER-positive and ER-negative breast cell lines.

    No full text
    The x-axis represents cell line drug response as an AUC value; higher AUC values are correlated with drug resistance, while lower AUC values are correlated with drug sensitivity. The y-axis represents the expression of genes in cell lines.