4,133 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    A multi-gene signature predicts outcome in patients with pancreatic ductal adenocarcinoma.

    Get PDF
    © 2014 Haider et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.Improved usage of the repertoires of pancreatic ductal adenocarcinoma (PDAC) profiles is crucially needed to guide the development of predictive and prognostic tools that could inform the selection of treatment options

    Network-based stratification of tumor mutations.

    Get PDF
    Many forms of cancer have multiple subtypes with different causes and clinical outcomes. Somatic tumor genome sequences provide a rich new source of data for uncovering these subtypes but have proven difficult to compare, as two tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in ovarian, uterine and lung cancer cohorts from The Cancer Genome Atlas. For each tissue, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or tumor histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence

    Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes.

    Get PDF
    Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful; computational methods to accurately translate the gene-signature from high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive and then transfer gene-signatures from one analytical platform to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumors. Unsupervised clustering of TCGA (the Cancer Genome Altas Consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. PIGExClass derived four-class classifier, which requires only 121 transcript-variants, assigns GBM patients' molecular subtype with 92% accuracy. This classifier was translated to an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of clinically adaptable molecular subtyping assay and have implications for developing robust diagnostic assays for cancer patient stratification

    Gene Expression Commons: an open platform for absolute gene expression profiling.

    Get PDF
    Gene expression profiling using microarrays has been limited to comparisons of gene expression between small numbers of samples within individual experiments. However, the unknown and variable sensitivities of each probeset have rendered the absolute expression of any given gene nearly impossible to estimate. We have overcome this limitation by using a very large number (>10,000) of varied microarray data as a common reference, so that statistical attributes of each probeset, such as the dynamic range and threshold between low and high expression, can be reliably discovered through meta-analysis. This strategy is implemented in a web-based platform named "Gene Expression Commons" (https://gexc.stanford.edu/) which contains data of 39 distinct highly purified mouse hematopoietic stem/progenitor/differentiated cell populations covering almost the entire hematopoietic system. Since the Gene Expression Commons is designed as an open platform, investigators can explore the expression level of any gene, search by expression patterns of interest, submit their own microarray data, and design their own working models representing biological relationship among samples

    Microarray meta-analysis database (M2DB): a uniformly pre-processed, quality controlled, and manually curated human clinical microarray database

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Over the past decade, gene expression microarray studies have greatly expanded our knowledge of genetic mechanisms of human diseases. Meta-analysis of substantial amounts of accumulated data, by integrating valuable information from multiple studies, is becoming more important in microarray research. However, collecting data of special interest from public microarray repositories often present major practical problems. Moreover, including low-quality data may significantly reduce meta-analysis efficiency.</p> <p>Results</p> <p>M<sup>2</sup>DB is a human curated microarray database designed for easy querying, based on clinical information and for interactive retrieval of either raw or uniformly pre-processed data, along with a set of quality-control metrics. The database contains more than 10,000 previously published Affymetrix GeneChip arrays, performed using human clinical specimens. M<sup>2</sup>DB allows online querying according to a flexible combination of five clinical annotations describing disease state and sampling location. These annotations were manually curated by controlled vocabularies, based on information obtained from GEO, ArrayExpress, and published papers. For array-based assessment control, the online query provides sets of QC metrics, generated using three available QC algorithms. Arrays with poor data quality can easily be excluded from the query interface. The query provides values from two algorithms for gene-based filtering, and raw data and three kinds of pre-processed data for downloading.</p> <p>Conclusion</p> <p>M<sup>2</sup>DB utilizes a user-friendly interface for QC parameters, sample clinical annotations, and data formats to help users obtain clinical metadata. This database provides a lower entry threshold and an integrated process of meta-analysis. We hope that this research will promote further evolution of microarray meta-analysis.</p

    Design of a multi-signature ensemble classifier predicting neuroblastoma patients' outcome

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Neuroblastoma is the most common pediatric solid tumor of the sympathetic nervous system. Development of improved predictive tools for patients stratification is a crucial requirement for neuroblastoma therapy. Several studies utilized gene expression-based signatures to stratify neuroblastoma patients and demonstrated a clear advantage of adding genomic analysis to risk assessment. There is little overlapping among signatures and merging their prognostic potential would be advantageous. Here, we describe a new strategy to merge published neuroblastoma related gene signatures into a single, highly accurate, Multi-Signature Ensemble (MuSE)-classifier of neuroblastoma (NB) patients outcome.</p> <p>Methods</p> <p>Gene expression profiles of 182 neuroblastoma tumors, subdivided into three independent datasets, were used in the various phases of development and validation of neuroblastoma NB-MuSE-classifier. Thirty three signatures were evaluated for patients' outcome prediction using 22 classification algorithms each and generating 726 classifiers and prediction results. The best-performing algorithm for each signature was selected, validated on an independent dataset and the 20 signatures performing with an accuracy > = 80% were retained.</p> <p>Results</p> <p>We combined the 20 predictions associated to the corresponding signatures through the selection of the best performing algorithm into a single outcome predictor. The best performance was obtained by the Decision Table algorithm that produced the NB-MuSE-classifier characterized by an external validation accuracy of 94%. Kaplan-Meier curves and log-rank test demonstrated that patients with good and poor outcome prediction by the NB-MuSE-classifier have a significantly different survival (p < 0.0001). Survival curves constructed on subgroups of patients divided on the bases of known prognostic marker suggested an excellent stratification of localized and stage 4s tumors but more data are needed to prove this point.</p> <p>Conclusions</p> <p>The NB-MuSE-classifier is based on an ensemble approach that merges twenty heterogeneous, neuroblastoma-related gene signatures to blend their discriminating power, rather than numeric values, into a single, highly accurate patients' outcome predictor. The novelty of our approach derives from the way to integrate the gene expression signatures, by optimally associating them with a single paradigm ultimately integrated into a single classifier. This model can be exported to other types of cancer and to diseases for which dedicated databases exist.</p
    • …
    corecore