Biomolecular Events in Cancer Revealed by Attractor Metagenes
Mining gene expression profiles has proven valuable for identifying signatures serving as surrogates of cancer phenotypes. However, the similarities of such signatures across different cancer types have not been strong enough to conclude that they represent a universal biological mechanism shared among multiple cancer types. Here we present a computational method for generating signatures using an iterative process that converges to one of several precise attractors defining signatures representing biomolecular events, such as cell transdifferentiation or the presence of an amplicon. By analyzing rich gene expression datasets from different cancer types, we identified several such biomolecular events, some of which are universally present in all tested cancer types in nearly identical form. Although the method is unsupervised, we show that it often leads to attractors with strong phenotypic associations. We present several such multi-cancer attractors, focusing on three that are prominent and sharply defined in all cases: a mesenchymal transition attractor strongly associated with tumor stage, a mitotic chromosomal instability attractor strongly associated with tumor grade, and a lymphocyte-specific attractor.
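The iterative process described above can be sketched as follows, under stated assumptions: starting from a seed gene, each gene is weighted by a power of its association with the current metagene, the metagene is recomputed as the weighted average of standardized expression profiles, and the loop repeats until the weights stop changing. This sketch uses Pearson correlation clipped at zero as the association measure and an exponent of 5; both are assumptions for illustration rather than the published method's exact choices, and the function names, gene names, and values are hypothetical.

```python
import math


def zscore(values):
    """Standardize a list to mean 0 and (population) SD 1; constant lists become zeros."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def attractor_metagene(expr, seed, exponent=5.0, tol=1e-9, max_iter=100):
    """Iterate from a seed gene toward a fixed-point metagene; return gene -> weight.

    expr maps gene name -> expression values over the same ordered samples.
    Assumption: association = Pearson correlation clipped at 0 (a stand-in measure).
    """
    genes = list(expr)
    z = {g: zscore(expr[g]) for g in genes}
    metagene = z[seed]  # the first metagene is the seed gene itself
    weights = None
    for _ in range(max_iter):
        # Weight each gene by its clipped association with the current metagene, raised to a power
        raw = {g: max(pearson(z[g], metagene), 0.0) ** exponent for g in genes}
        total = sum(raw.values())
        if total == 0:
            raise ValueError("no gene is positively associated with the metagene")
        new_weights = {g: w / total for g, w in raw.items()}
        # The new metagene is the weighted average of the standardized profiles
        metagene = [sum(new_weights[g] * z[g][j] for g in genes) for j in range(len(metagene))]
        converged = weights is not None and max(
            abs(new_weights[g] - weights[g]) for g in genes
        ) < tol
        weights = new_weights
        if converged:
            break
    return weights


# Hypothetical toy data: A, B, C rise together across six samples; D moves against them
expr = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 3, 4, 5, 6, 8],
    "C": [1, 3, 2, 5, 4, 6],
    "D": [6, 4, 5, 2, 3, 1],
}
w = attractor_metagene(expr, seed="A")
```

The high exponent is what makes the result an attractor rather than a plain correlation ranking: weakly associated genes are suppressed at each pass, so the weights concentrate on a tight co-expressed group.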
Attractor Metafeatures and Their Application in Biomolecular Data Analysis
This dissertation proposes a family of algorithms for deriving signatures of mutually associated features, which we refer to as attractor metafeatures, or simply attractors. Specifically, we present multi-cancer attractor derivation algorithms, which identify correlated features in signatures from multiple biological data sets in one analysis, as well as the groups of samples or cells that exclusively express these signatures. Our results demonstrate that these signatures can be used, in proper combinations, as biomarkers that predict a patient's survival rate based on the transcriptome of the tumor sample. They can also be used as features to analyze the composition of the tumor.
By analyzing large data sets of 18 cancer types and three high-throughput platforms from The Cancer Genome Atlas (TCGA) PanCanAtlas Project, as well as multiple single-cell RNA-seq data sets, we identified novel cancer attractor signatures and elucidated the identity of the cells that express these signatures. Using these signatures, we developed a prognostic biomarker for breast cancer called the Breast Cancer Attractor Metagenes (BCAM) biomarker, as well as a software platform for analyzing tumor samples called Analysis of the Single-Cell Omics for Tumor (ASCOT).
The Cure: Making a game of gene selection for breast cancer survival prediction
Motivation: Molecular signatures for predicting breast cancer prognosis could
greatly improve care through personalization of treatment. Computational
analyses of genome-wide expression datasets have identified such signatures,
but these signatures leave much to be desired in terms of accuracy,
reproducibility and biological interpretability. Methods that take advantage of
structured prior knowledge (e.g. protein interaction networks) show promise in
helping to define better signatures, but most knowledge remains unstructured.
Crowdsourcing via scientific discovery games is an emerging methodology that
has the potential to tap into human intelligence at scales and in modes
previously unheard of. Here, we developed and evaluated a game called The Cure
on the task of gene selection for breast cancer survival prediction. Our
central hypothesis was that knowledge linking expression patterns of specific
genes to breast cancer outcomes could be captured from game players. We
envisioned capturing knowledge both from the players' prior experience and from
their ability to interpret text related to candidate genes presented to them in
the context of the game.
Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted
more than 1,000 registered players who collectively played nearly 10,000 games.
Gene sets assembled through aggregation of the collected data clearly
demonstrated the accumulation of relevant expert knowledge. In terms of
predictive accuracy, these gene sets provided comparable performance to gene
sets generated using other methods including those used in commercial tests.
The Cure is available at http://genegames.org/cure
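The abstract does not detail how the collected game data were aggregated into gene sets. One simple possibility, shown here purely as an assumption, is to count how many games selected each gene and keep the most frequently chosen ones. The function name and the player hands below are hypothetical and are not data from The Cure.

```python
from collections import Counter


def aggregate_gene_sets(games, top_k):
    """Rank genes by the number of games that selected them and keep the top_k.

    games: iterable of gene-symbol collections, one per completed game (hypothetical format).
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for hand in games:
        counts.update(set(hand))  # count each gene at most once per game
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_k]]


# Hypothetical final hands from three games
games = [
    ["ESR1", "ERBB2", "MKI67"],
    ["ESR1", "AURKA", "MKI67"],
    ["ERBB2", "ESR1", "BCL2"],
]
consensus = aggregate_gene_sets(games, top_k=3)  # ['ESR1', 'ERBB2', 'MKI67']
```

Frequency voting of this kind treats every game equally; a real aggregation might instead weight players by skill or by the predictive performance of their hands.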
Computational Analysis of Biomolecular Data for Medical Applications from Bulk to Single-cell
High-throughput technologies have continuously driven the generation of diverse biomolecular data, including genomic, epigenomic, transcriptomic, and other omics data, over the last two decades. These developments and advances have revolutionized medical research. This dissertation presents a collection of computational analyses and tools based on different types of biomolecular data, with particular applications to human diseases, including: 1) a cascade ensemble model based on the Dirichlet process mixture model for reconstructing tumor subclonality from tumor DNA sequencing data; 2) a meta-analysis of gene expression and DNA methylation data from prefrontal cortex samples of patients with neuropsychiatric disorders, indicating a stress-related epigenetic mechanism; 3) 2DImpute, an imputation algorithm designed to alleviate the sparsity problem in single-cell RNA-sequencing data; and 4) a pan-cancer transformation from adipose-derived stromal cells to metastasis-associated fibroblasts revealed by single-cell analysis.
Circulation
Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games, tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to various industries has exploded, and it is thus unsurprising that some analytics companies are turning their attention to problems in health care. The purpose of this review is to explore which problems in medicine might benefit from such learning approaches and to use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and to discuss how these might be overcome.
Therapeutic target discovery using Boolean network attractors: improvements of kali
In a previous article, an algorithm for identifying therapeutic targets in
Boolean networks modeling pathological mechanisms was introduced. In the
present article, the improvements made on this algorithm, named kali, are
described. These improvements are i) the possibility to work on asynchronous
Boolean networks, ii) a finer assessment of therapeutic targets and iii) the
possibility to use multivalued logic. kali assumes that the attractors of a
dynamical system, such as a Boolean network, are associated with the phenotypes
of the modeled biological system. Given a logic-based model of pathological
mechanisms, kali searches for therapeutic targets able to reduce the
reachability of the attractors associated with pathological phenotypes, thus
reducing their likelihood. kali is illustrated on an example network and used
on a biological case study. The case study is a published logic-based model of
bladder tumorigenesis from which kali returns consistent results. However, like
any computational tool, kali can predict but cannot replace human expertise:
it is a supporting tool for coping with the complexity of biological systems in
the field of drug discovery.
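The idea of searching for interventions that reduce the reachability of pathological attractors can be illustrated on a toy network. This is not kali itself: it is a minimal sketch assuming synchronous Boolean updates (kali also handles asynchronous and multivalued models), and it scores an intervention by the fraction of initial states whose trajectory ends in an attractor containing a pathological state. The three-node network, the phenotype rule, and all function names are hypothetical.

```python
from itertools import product


def step(state, rules):
    """Synchronous update: every node's next value is computed from the current state."""
    return tuple(rule(state) for rule in rules)


def attractor_of(state, rules):
    """Follow the trajectory until a state repeats; return the states of the attractor."""
    first_seen = {}
    trajectory = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, rules)
    return trajectory[first_seen[state]:]  # the cycle (length 1 for a fixed point)


def pathological_fraction(rules, is_pathological):
    """Fraction of all 2**n initial states whose attractor contains a pathological state."""
    states = list(product((0, 1), repeat=len(rules)))
    hits = sum(
        any(is_pathological(s) for s in attractor_of(s0, rules)) for s0 in states
    )
    return hits / len(states)


def knockout(rules, node):
    """Clamp one node to 0, mimicking a hypothetical inhibitor of that node."""
    return [(lambda s: 0) if i == node else rule for i, rule in enumerate(rules)]


def is_pathological(state):
    return state[2] == 1  # assumption: Z = 1 marks the disease phenotype


# Hypothetical 3-node network (X, Y, Z): X and Y activate each other; Z needs both.
rules = [
    lambda s: s[1],         # X' = Y
    lambda s: s[0],         # Y' = X
    lambda s: s[0] & s[1],  # Z' = X AND Y
]

baseline = pathological_fraction(rules, is_pathological)                 # 0.25
after_x_ko = pathological_fraction(knockout(rules, 0), is_pathological)  # 0.0
```

Exhaustive enumeration only scales to small networks (2**n states); it is used here because it makes the attractors and their basins easy to inspect.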
Multimodal assessment of estrogen receptor mRNA profiles to quantify estrogen pathway activity in breast tumors
Background
Molecular markers have transformed our understanding of the heterogeneity of breast cancer and have allowed the identification of genomic profiles of estrogen receptor (ER)-α signaling. However, our understanding of the transcriptional profiles of ER signaling remains inadequate. Therefore, we sought to identify the genomic indicators of ER pathway activity that could supplement traditional immunohistochemical (IHC) assessments of ER status to better understand ER signaling in the breast tumors of individual patients.
Materials and Methods
We reduced ESR1 (gene encoding the ER-α protein) mRNA levels using small interfering RNA in ER+ MCF7 breast cancer cells and assayed for transcriptional changes using Affymetrix HG U133 Plus 2.0 arrays. We also compared 1034 ER+ and ER− breast tumors from publicly available microarray data. The principal components of ER activity generated from these analyses and from other published estrogen signatures were compared with ESR1 expression, ER-α IHC, and patient survival.
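Summarizing a gene signature by its first principal component, and comparing that per-tumor score with ESR1 expression, can be sketched as below. This assumes a genes-by-samples matrix, standardizes each gene, finds the leading eigenvector of the gene correlation matrix by power iteration, and projects each sample onto it. The gene values and ESR1 levels are hypothetical, and the study's actual normalization and PCA software are not described here.

```python
import math


def standardize(rows):
    """Z-score each gene (row) across samples; constant rows become zeros."""
    out = []
    for row in rows:
        n = len(row)
        mean = sum(row) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / n)
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return out


def first_pc_scores(rows, iters=200):
    """Per-sample scores on the first principal component, found by power iteration.

    rows is a genes x samples matrix given as a list of lists (at least one non-constant row).
    """
    z = standardize(rows)
    g, n = len(z), len(z[0])
    # Gene-gene correlation matrix (rows are already standardized)
    corr = [[sum(z[a][j] * z[b][j] for j in range(n)) / n for b in range(g)] for a in range(g)]
    v = [1.0] * g  # starting vector; the sign of a principal component is arbitrary
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(g)) for a in range(g)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each sample onto the leading eigenvector
    return [sum(v[a] * z[a][j] for a in range(g)) for j in range(n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ER-responsive genes (rows) over six tumors (columns), plus ESR1 levels
signature = [
    [5, 6, 2, 1, 7, 3],
    [4, 6, 1, 2, 6, 2],
    [6, 7, 3, 1, 8, 2],
]
esr1 = [8, 9, 2, 1, 9, 3]
scores = first_pc_scores(signature)
r = pearson(scores, esr1)
```

Because a principal component's sign is arbitrary, a score like this is usually oriented (for example, flipped so that it correlates positively with ESR1) before it is compared across signatures or with survival.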
Results
Genes differentially expressed in both analyses were associated with ER-α IHC and ESR1 mRNA expression. They were also significantly enriched for estrogen-driven molecular pathways associated with ESR1, cyclin D1 (CCND1), MYC (v-myc avian myelocytomatosis viral oncogene homolog), and NFKB (nuclear factor kappa B). Despite their differing constituent genes, the principal components generated from these new analyses and from previously published ER-associated gene lists were all associated with each other and with the survival of patients with breast cancer treated with endocrine therapies.
Conclusion
A biomarker of ER-α pathway activity, generated using ESR1-responsive mRNAs in MCF7 cells, when used alongside ER-α IHC and ESR1 mRNA expression, could provide a method for further stratification of patients and add insight into ER pathway activity in these patients.
Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) Study by Biochemically-inspired Machine Learning
Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify patients and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome at different survival times in METABRIC patients receiving hormone therapy (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity from resistance in breast cancer cell lines and validates predictions in patients, was also used to derive gene signatures of other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance; however, the other agents also predicted survival with acceptable accuracy. A support vector machine (SVM) model of paclitaxel response containing genes ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B was 78.6% accurate in predicting survival of 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. A paclitaxel-based SVM classifier with minimum redundancy maximum relevance feature selection, based on expression of genes BCL2L1, BBC3, FGF2, FN1, and TWIST1, was 81.1% accurate in 53 CT patients. In addition, a random forest (RF) classifier using a gene signature (ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B) predicted >3-year survival with 85.5% accuracy in 420 HT patients. A similar RF gene signature showed 82.7% accuracy in 504 patients treated with CT and/or HT.
These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies.
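The minimum redundancy maximum relevance (mRMR) selection mentioned above greedily adds the feature whose relevance to the outcome, minus its average redundancy with the features already chosen, is largest. The sketch below is illustrative only: it uses absolute Pearson correlation in place of mutual information, dichotomizes survival at a hypothetical 3-year threshold, and uses made-up gene names and values rather than METABRIC data.

```python
import math


def abs_corr(x, y):
    """Absolute Pearson correlation (0.0 if either list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def mrmr(features, label, k):
    """Greedy mRMR selection (difference form), with |Pearson r| standing in for mutual information.

    features: dict name -> values per patient; label: 0/1 outcome per patient.
    """
    relevance = {f: abs_corr(v, label) for f, v in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(abs_corr(features[f], features[s]) for s in selected) / len(selected)
            return relevance[f] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical cohort of six patients: survival in years, dichotomized at 3 years
survival_years = [1.0, 2.5, 0.8, 4.2, 6.0, 3.5]
label = [1 if t > 3.0 else 0 for t in survival_years]
features = {
    "G1": [1, 2, 1, 5, 6, 5],
    "G1_like": [1, 2, 2, 5, 6, 5],  # nearly a duplicate of G1
    "G2": [3, 1, 2, 2, 3, 4],       # weaker, but less redundant with G1
}
chosen = mrmr(features, label, k=2)  # ['G1', 'G2']
```

Ranking by relevance alone would pick G1 and G1_like, which are nearly duplicates; the redundancy penalty swaps G1_like for the less relevant but more complementary G2.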