
    Statistical integrative omics methods for disease subtype discovery

    Disease phenotyping using omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. With the accumulation of massive high-throughput omics data sets, omics data integration becomes essential to improve statistical power and reproducibility. In this dissertation, the sparse K-means method will be extended in two directions. The first extension is a meta-analytic framework to identify novel disease subtypes when expression profiles from multiple cohorts are available. The lasso regularization and meta-analysis can identify a unique set of gene features for subtype characterization. By adding a pattern-matching reward function, consistency of subtype signatures across studies can be achieved. The second extension integrates multi-level omics datasets by incorporating prior biological knowledge through a sparse overlapping group lasso approach. An algorithm based on the alternating direction method of multipliers (ADMM) will be applied for fast optimization. For both topics, simulations and real applications in breast cancer and leukemia will show superior clustering accuracy, feature selection and functional annotation. These methods will improve the statistical power, prediction accuracy and reproducibility of disease subtype discovery analysis. Contribution to public health: The proposed methods are able to identify disease subtypes from complex multi-level or multi-cohort omics data. Disease subtype definition is essential to deliver personalized medicine, since treating each subtype with its most appropriate medicine will achieve the most effective treatment and eliminate side effects. Omics data can provide a better definition of disease subtypes than regular pathological approaches. By using multi-level or multi-cohort omics data, we are able to gain statistical power and reproducibility, and the resulting subtype definition is much more reliable, convincing and reproducible than that from single-study analysis.

    Disrupted circadian oscillations in type 2 diabetes are linked to altered rhythmic mitochondrial metabolism in skeletal muscle

    Funding: The authors are supported by grants from the AstraZeneca SciLifeLab Research Programme, Novo Nordisk Foundation (NNF14OC0011493, and NNF17OC0030088), Swedish Diabetes Foundation (DIA2018-357), Swedish Research Council (2015-00165 and 2018-02389), the Knut and Alice Wallenberg Foundation (2018-0094), the Strategic Research Programme in Diabetes at Karolinska Institutet (2009-1068), the Stockholm County Council (SLL20170159), and the Swedish Research Council for Sport Science (P2019-0140). B.M.G. was supported by fellowships from the Novo Nordisk Foundation (NNF19OC0055072), the Wenner-Gren Foundation, an Albert Renold Travel Fellowship from the European Foundation for the Study of Diabetes, and an Eric Reid Fund for Methodology from the Biochemical Society. N.J.P. and L.S.-P. were supported by an Individual Fellowship from the Marie Skłodowska-Curie Actions (European Commission: 704978 and 675610). X.Z. and K.A.E. were supported by NIH R01AR066082. N.J.P. was supported by grants from the Sigurd och Elsa Goljes Minne and Lars Hierta Memorial Foundations (Sweden). We acknowledge the Beta Cell in-vivo Imaging/Extracellular Flux Analysis core facility supported by the Strategic Research Program in Diabetes for the usage of the Seahorse flux analyzer. Additional support was received from the Novo Nordisk Foundation Center for Basic Metabolic Research at the University of Copenhagen (NNF18CC0034900). The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent research center at the University of Copenhagen, partially funded by an unrestricted donation from the Novo Nordisk Foundation. We acknowledge the Single-Cell Omics platform at the Novo Nordisk Foundation Center for Basic Metabolic Research for technical and computational expertise and support. Schematics are created with BioRender.com. Peer reviewed.

    Metabolic Profiling of Cognitive Aging in Midlife


    HCMMCNVs: hierarchical clustering mixture model of copy number variants detection using whole exome sequencing technology

    Summary: In this article, we introduce a hierarchical clustering and Gaussian mixture model with an expectation-maximization (EM) algorithm for detecting copy number variants (CNVs) using whole exome sequencing (WES) data. The R shiny package 'HCMMCNVs' is also developed for processing user-provided bam files, running the CNV detection algorithm and visualizing the results. By applying our approach to 325 cancer cell lines in 22 tumor types from the Cancer Cell Line Encyclopedia (CCLE), we show that our algorithm is competitive with other existing methods and that using multiple cancer cell lines for CNV estimation is feasible. In addition, when applied to WES data of 120 oral squamous cell carcinoma (OSCC) samples, our algorithm, using the tumor samples only, exhibits more power in detecting CNVs than methods that use both tumors and matched normal counterparts.
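    The EM step named in this summary can be illustrated with a minimal sketch of a one-dimensional Gaussian mixture fit. This is not the HCMMCNVs implementation: the function names, the use of log2 copy ratios as input, the three components (loss, neutral, gain), the starting means and the fixed starting variance are all assumptions made for illustration only.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(data, init_means, init_var=0.05, n_iter=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM (hypothetical helper, illustration only).

    Returns (means, variances, weights, labels), where labels[i] is the index
    of the component with the highest responsibility for data[i].
    """
    k, n = len(init_means), len(data)
    means = list(init_means)
    variances = [init_var] * k   # assumed common starting variance
    weights = [1.0 / k] * k      # equal starting mixing proportions
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nk = sum(r[j] for r in resp)
            if nk < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[j] = nk / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nk
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nk
            variances[j] = max(var, min_var)  # floor keeps the variance positive
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, labels


if __name__ == "__main__":
    # Made-up log2 copy ratios: five near -1, six near 0, four near 0.585.
    ratios = [-1.05, -0.95, -1.0, -0.9, -1.1,
              0.02, -0.03, 0.0, 0.05, -0.04, 0.01,
              0.55, 0.6, 0.62, 0.58]
    means, variances, weights, labels = fit_gmm_1d(ratios, [-1.0, 0.0, 0.585])
    print(labels)
```

    The starting means -1, 0 and 0.585 correspond to log2(1/2), log2(2/2) and log2(3/2), that is, a single-copy loss, no change and a single-copy gain in a diploid genome. A real analysis would choose the number of components and the initialization from the data.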

    Meta-Analytic Framework for Sparse K-Means to Identify Disease Subtypes in Multiple Transcriptomic Studies

    Disease phenotyping by omics data has become a popular approach that potentially can lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step toward this goal. In this article, we extend a sparse K-means method toward a meta-analytic framework to identify novel disease subtypes when expression profiles of multiple cohorts are available. The lasso regularization and meta-analysis identify a unique set of gene features for subtype characterization. An additional pattern matching reward function guarantees consistent subtype signatures across studies. The method was evaluated by simulations and leukemia and breast cancer datasets. The identified disease subtypes from meta-analysis were characterized with improved accuracy and stability compared to single study analysis. The breast cancer model was applied to an independent METABRIC dataset and generated improved survival difference between subtypes. These results provide a basis for diagnosis and development of targeted treatments for disease subgroups. Supplementary materials for this article are available online.
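    How lasso regularization selects gene features in sparse K-means can be sketched with the feature-weight update of the commonly used single-study formulation: with cluster labels held fixed, the weights maximize the weighted between-cluster sum of squares (BCSS) under an L2 bound of 1 and an L1 bound s. The sketch below is not the meta-analytic method of this article. It omits the multi-study objective and the pattern matching reward, and the function names and example numbers are assumptions made for illustration.

```python
import math


def soft_threshold(values, delta):
    """Shrink each value toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def sparse_feature_weights(bcss, l1_bound, n_iter=100):
    """Feature weights for fixed clusters (hypothetical helper, illustration only).

    bcss[j] is the between-cluster sum of squares of feature j. The returned
    weights are non-negative, have unit L2 norm, and have an L1 norm of at
    most l1_bound up to the bisection tolerance. A smaller l1_bound sets more
    weights exactly to zero.
    """
    a = [max(v, 0.0) for v in bcss]
    if max(a) == 0.0:
        raise ValueError("at least one feature must have positive BCSS")
    if l1_bound < 1.0:
        raise ValueError("l1_bound must be at least 1")
    weights = l2_normalize(a)
    if sum(weights) <= l1_bound:
        return weights  # the L1 constraint is inactive, no shrinkage needed
    # Bisection on the threshold delta: the L1 norm of the normalized,
    # soft-thresholded vector decreases as delta grows.
    lo, hi = 0.0, max(a)
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        shrunk = soft_threshold(a, mid)
        if sum(shrunk) == 0.0:
            hi = mid  # threshold too large: every feature was removed
        elif sum(l2_normalize(shrunk)) > l1_bound:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold(a, lo))


if __name__ == "__main__":
    # Made-up BCSS values for four genes; a tight bound keeps only the strongest.
    print(sparse_feature_weights([9.0, 4.0, 1.0, 0.5], l1_bound=1.2))
```

    In the full single-study algorithm this update alternates with a weighted K-means step until the weights stabilize. Features whose weight is zero drop out of the subtype signature.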