
    Biomolecular Events in Cancer Revealed by Attractor Metagenes

    Mining gene expression profiles has proven valuable for identifying signatures serving as surrogates of cancer phenotypes. However, the similarities of such signatures across different cancer types have not been strong enough to conclude that they represent a universal biological mechanism shared among multiple cancer types. Here we present a computational method for generating signatures using an iterative process that converges to one of several precise attractors defining signatures representing biomolecular events, such as cell transdifferentiation or the presence of an amplicon. By analyzing rich gene expression datasets from different cancer types, we identified several such biomolecular events, some of which are universally present in all tested cancer types in nearly identical form. Although the method is unsupervised, we show that it often leads to attractors with strong phenotypic associations. We present several such multi-cancer attractors, focusing on three that are prominent and sharply defined in all cases: a mesenchymal transition attractor strongly associated with tumor stage, a mitotic chromosomal instability attractor strongly associated with tumor grade, and a lymphocyte-specific attractor

    Attractor Metafeatures and Their Application in Biomolecular Data Analysis

    This dissertation proposes a family of algorithms for deriving signatures of mutually associated features, to which we refer as attractor metafeatures, or simply attractors. Specifically, we present multi-cancer attractor derivation algorithms, identifying correlated features in signatures from multiple biological data sets in one analysis, as well as the groups of samples or cells that exclusively express these signatures. Our results demonstrate that these signatures can be used, in proper combinations, as biomarkers that predict a patient’s survival rate, based on the transcriptome of the tumor sample. They can also be used as features to analyze the composition of the tumor. Through analyzing large data sets of 18 cancer types and three high-throughput platforms from The Cancer Genome Atlas (TCGA) PanCanAtlas Project and multiple single-cell RNA-seq data sets, we identified novel cancer attractor signatures and elucidated the identity of the cells that express these signatures. Using these signatures, we developed a prognostic biomarker for breast cancer called the Breast Cancer Attractor Metagenes (BCAM) biomarker as well as a software platform to analyze the tumor sample, called Analysis of the Single-Cell Omics for Tumor (ASCOT)

    The Cure: Making a game of gene selection for breast cancer survival prediction

    Motivation: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility and biological interpretability. Methods that take advantage of structured prior knowledge (e.g. protein interaction networks) show promise in helping to define better signatures but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes previously unheard of. Here, we developed and evaluated a game called The Cure on the task of gene selection for breast cancer survival prediction. Our central hypothesis was that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from game players. We envisioned capturing knowledge both from the players' prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted more than 1,000 registered players who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data clearly demonstrated the accumulation of relevant expert knowledge. In terms of predictive accuracy, these gene sets provided comparable performance to gene sets generated using other methods including those used in commercial tests. The Cure is available at http://genegames.org/cure

    Computational Analysis of Biomolecular Data for Medical Applications from Bulk to Single-cell

    High-throughput technologies have continuously driven the generation of different biomolecular data, including genomics, epigenomics, transcriptomics, and other omics data, over the last two decades. These developments and advances have revolutionized medical research. In this dissertation, a collection of computational analyses and tools, based on different types of biomolecular data with particular applications to human diseases, is presented, including 1) a cascade ensemble model based on the Dirichlet process mixture model for reconstructing tumor subclonality from tumor DNA sequencing data; 2) a meta-analysis of gene expression and DNA methylation data from prefrontal cortex samples of patients with neuropsychiatric disorders indicating a stress-related epigenetic mechanism; 3) 2DImpute, an imputation algorithm that is designed to alleviate the sparsity problem in single-cell RNA-sequencing data; and 4) a pan-cancer transformation from adipose-derived stromal cells to metastasis-associated fibroblasts revealed by single cell analysis

    Circulation

    Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome.

    Therapeutic target discovery using Boolean network attractors: improvements of kali

    In a previous article, an algorithm for identifying therapeutic targets in Boolean networks modeling pathological mechanisms was introduced. In the present article, the improvements made on this algorithm, named kali, are described. These improvements are i) the possibility to work on asynchronous Boolean networks, ii) a finer assessment of therapeutic targets and iii) the possibility to use multivalued logic. kali assumes that the attractors of a dynamical system, such as a Boolean network, are associated with the phenotypes of the modeled biological system. Given a logic-based model of pathological mechanisms, kali searches for therapeutic targets able to reduce the reachability of the attractors associated with pathological phenotypes, thus reducing their likelihood. kali is illustrated on an example network and used on a biological case study. The case study is a published logic-based model of bladder tumorigenesis from which kali returns consistent results. However, like any computational tool, kali can predict but cannot replace human expertise: it is a supporting tool for coping with the complexity of biological systems in the field of drug discovery

    Multimodal assessment of estrogen receptor mRNA profiles to quantify estrogen pathway activity in breast tumors

    Background Molecular markers have transformed our understanding of the heterogeneity of breast cancer and have allowed the identification of genomic profiles of estrogen receptor (ER)-α signaling. However, our understanding of the transcriptional profiles of ER signaling remains inadequate. Therefore, we sought to identify the genomic indicators of ER pathway activity that could supplement traditional immunohistochemical (IHC) assessments of ER status to better understand ER signaling in the breast tumors of individual patients. Materials and Methods We reduced ESR1 (gene encoding the ER-α protein) mRNA levels using small interfering RNA in ER+ MCF7 breast cancer cells and assayed for transcriptional changes using Affymetrix HG U133 Plus 2.0 arrays. We also compared 1034 ER+ and ER− breast tumors from publicly available microarray data. The principal components of ER activity generated from these analyses and from other published estrogen signatures were compared with ESR1 expression, ER-α IHC, and patient survival. Results Genes differentially expressed in both analyses were associated with ER-α IHC and ESR1 mRNA expression. They were also significantly enriched for estrogen-driven molecular pathways associated with ESR1, cyclin D1 (CCND1), MYC (v-myc avian myelocytomatosis viral oncogene homolog), and NFKB (nuclear factor kappa B). Despite their differing constituent genes, the principal components generated from these new analyses and from previously published ER-associated gene lists were all associated with each other and with the survival of patients with breast cancer treated with endocrine therapies. Conclusion A biomarker of ER-α pathway activity, generated using ESR1-responsive mRNAs in MCF7 cells, when used alongside ER-α IHC and ESR1 mRNA expression, could provide a method for further stratification of patients and add insight into ER pathway activity in these patients

    Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) Study by Biochemically-inspired Machine Learning

    Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome for different survival times in METABRIC patients receiving hormone (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity vs. resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures of other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance; however, the other agents also predicted survival with acceptable accuracies. A support vector machine (SVM) model of paclitaxel response containing genes ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B was 78.6% accurate in predicting survival of 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. Minimum redundancy maximum relevance feature selection of a paclitaxel-based SVM classifier based on expression of genes BCL2L1, BBC3, FGF2, FN1, and TWIST1 was 81.1% accurate in 53 CT patients. In addition, a random forest (RF) classifier using a gene signature (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B) predicted >3-year survival with 85.5% accuracy in 420 HT patients. A similar RF gene signature showed 82.7% accuracy in 504 patients treated with CT and/or HT. These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies