
4,133 research outputs found

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Directory of Open Access Journals

eScholarship - University of California

A multi-gene signature predicts outcome in patients with pancreatic ductal adenocarcinoma.

Author: A Jemal
A Stathis
Ai Nagano
Aldo Scarpa
Ami Desai
AV Biankin
C Curtis
C Winter
Cancer Genome Atlas Research Network
CGA Network
Claude Chelala
CQ Zhu
D Cox
EA Collisson
F Maione
FM Buffa
G Zhang
H Huang
H Janouskova
H Pei
Hemant M Kocher
HH Wong
I Kovacevic
J Adachi
JA Tercero
JD Mosley
JK Stratford
John F Marshall
Jude Fitzgibbon
Jun Wang
K Yoshihara
L Badea
Laurent Dumartin
M Hidalgo
M Taniwaki
MA Tempero
MF Brennan
MM Al-Hawary
MW Muller
N Wasif
NF Li
Nicholas R Lemoine
NL Anderson
PA Perez-Mancera
PC Boutros
PG Febbo
PJ Campbell
Prabhu Arumugam
R Grutzmann
RR Plentz
S Jones
SJ Coleman
SV Srinivasan
Syed Haider
T Barrett
Tatjana Crnogorac-Jurcevic
Thorsten Hagemann
TR Donahue
VH Coupland
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

© 2014 Haider et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Improved usage of the repertoires of pancreatic ductal adenocarcinoma (PDAC) profiles is crucially needed to guide the development of predictive and prognostic tools that could inform the selection of treatment options.

Springer - Publisher Connector

Network-based stratification of tumor mutations.

Author: Carter Hannah
Gross Andrew
Hofree Matan
Ideker Trey
Shen John P
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Many forms of cancer have multiple subtypes with different causes and clinical outcomes. Somatic tumor genome sequences provide a rich new source of data for uncovering these subtypes but have proven difficult to compare, as two tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in ovarian, uterine and lung cancer cohorts from The Cancer Genome Atlas. For each tissue, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or tumor histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence.
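
The idea of grouping patients whose mutations fall in similar network regions can be illustrated with a small sketch. The abstract does not state the propagation rule, so the random-walk-with-restart smoothing below, the `alpha` value, the four-gene chain network and the patient profiles are all assumptions made for illustration, not the authors' implementation.

```python
import math

def smooth_profile(profile, adjacency, alpha=0.7, iterations=50):
    """Spread a binary mutation profile over a gene network.

    Assumed update rule: random walk with restart,
    new_score = alpha * (score received from neighbours) + (1 - alpha) * original.
    """
    n = len(profile)
    degree = [sum(row) for row in adjacency]
    current = [float(x) for x in profile]
    for _ in range(iterations):
        updated = []
        for j in range(n):
            # Each gene i passes its score to its neighbours in equal shares.
            incoming = sum(
                current[i] * adjacency[i][j] / degree[i]
                for i in range(n)
                if degree[i] > 0
            )
            # Mix the propagated score with the original mutation call.
            updated.append(alpha * incoming + (1 - alpha) * profile[j])
        current = updated
    return current

def cosine(u, v):
    """Cosine similarity between two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical chain network of four genes: g0 - g1 - g2 - g3
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Hypothetical patients, each with a single mutated gene and no shared mutations.
patients = {
    "P1": [1, 0, 0, 0],
    "P2": [0, 1, 0, 0],
    "P3": [0, 0, 0, 1],
}

smoothed = {name: smooth_profile(p, adjacency) for name, p in patients.items()}
```

In this toy setting the raw profiles of P1 and P2 have zero similarity, yet after smoothing P1 is closer to P2 (mutated in a neighbouring gene) than to P3 (mutated at the far end of the chain), which is the property a downstream clustering step would rely on.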

CiteSeerX

eScholarship - University of California

Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes.

Author: Bi Yingtao
Davuluri Ramana V
Macyszyn Luke
O'Rourke Donald M
Pal Sharmistha
Showe Louise C
Publication venue: eScholarship, University of California
Publication date: 06/02/2014

Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful, computational methods to accurately translate a gene signature from a high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive and then transfer gene signatures from one analytical platform to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumor. Unsupervised clustering of TCGA (The Cancer Genome Atlas Consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. The PIGExClass-derived four-class classifier, which requires only 121 transcript variants, assigns GBM patients' molecular subtype with 92% accuracy. This classifier was translated to an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of clinically adaptable molecular subtyping assays and have implications for developing robust diagnostic assays for cancer patient stratification.

eScholarship - University of California

Gene Expression Commons: an open platform for absolute gene expression profiling.

Author: Bhattacharya Deepta
Dill David L
Ehrlich Lauren IR
Fathman John W
Inlay Matthew A
Rossi Derrick J
Sahoo Debashis
Seita Jun
Serwold Thomas
Weissman Irving L
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Gene expression profiling using microarrays has been limited to comparisons of gene expression between small numbers of samples within individual experiments. However, the unknown and variable sensitivities of each probeset have rendered the absolute expression of any given gene nearly impossible to estimate. We have overcome this limitation by using a very large number (>10,000) of varied microarray data as a common reference, so that statistical attributes of each probeset, such as the dynamic range and threshold between low and high expression, can be reliably discovered through meta-analysis. This strategy is implemented in a web-based platform named "Gene Expression Commons" (https://gexc.stanford.edu/), which contains data of 39 distinct highly purified mouse hematopoietic stem/progenitor/differentiated cell populations covering almost the entire hematopoietic system. Since the Gene Expression Commons is designed as an open platform, investigators can explore the expression level of any gene, search by expression patterns of interest, submit their own microarray data, and design their own working models representing biological relationships among samples.
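
As a rough illustration of deriving per-probeset attributes from a large reference, the sketch below computes a dynamic range and a low/high threshold from reference values and then places a new measurement on that scale. The abstract does not say how the threshold is obtained; the two-level step fit used here, the function names and the sample values are assumptions for illustration only.

```python
def step_threshold(values):
    """Pick the cut that best splits sorted values into a low and a high group.

    Assumed criterion: minimise the summed squared deviation of each group
    from its own mean (a simple two-level step fit).
    """
    xs = sorted(values)
    best_cost, best_cut = None, None
    for k in range(1, len(xs)):
        low, high = xs[:k], xs[k:]
        mean_low = sum(low) / len(low)
        mean_high = sum(high) / len(high)
        cost = sum((x - mean_low) ** 2 for x in low) + sum(
            (x - mean_high) ** 2 for x in high
        )
        if best_cost is None or cost < best_cost:
            # Place the threshold halfway between the two groups.
            best_cost, best_cut = cost, (xs[k - 1] + xs[k]) / 2
    return best_cut

def describe_probeset(reference_values):
    """Summarise one probeset from its values across the reference arrays."""
    return {
        "min": min(reference_values),
        "max": max(reference_values),
        "threshold": step_threshold(reference_values),
    }

def call_expression(value, attributes):
    """Return a low/high call and the position within the dynamic range (0 to 1)."""
    span = attributes["max"] - attributes["min"]
    position = (value - attributes["min"]) / span if span > 0 else 0.0
    position = max(0.0, min(1.0, position))
    label = "high" if value >= attributes["threshold"] else "low"
    return label, position

# Hypothetical log-scale intensities of one probeset across reference arrays.
reference = [2.0, 2.2, 2.1, 1.9, 8.0, 8.3, 7.9, 8.1]
attributes = describe_probeset(reference)
label_a, position_a = call_expression(7.0, attributes)
label_b, position_b = call_expression(2.5, attributes)
```

The point of the sketch is that a single new measurement becomes interpretable on its own once the probeset's range and threshold are known from the reference, without a paired control sample.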

Directory of Open Access Journals

eScholarship - University of California

Microarray meta-analysis database (M2DB): a uniformly pre-processed, quality controlled, and manually curated human clinical microarray database

Author: Chang Cheng-Wei
Chen Chaang-Ray
Cheng Wei-Chung
Hong Ji-Hong
Hsu Ian C
Huang Ching-Lung
Lee Yun-Shien
Li Chia-Yang
Shu Wun-Yi
Tsai Min-Lung
Wang Tzu-Hao
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Over the past decade, gene expression microarray studies have greatly expanded our knowledge of genetic mechanisms of human diseases. Meta-analysis of substantial amounts of accumulated data, by integrating valuable information from multiple studies, is becoming more important in microarray research. However, collecting data of special interest from public microarray repositories often presents major practical problems. Moreover, including low-quality data may significantly reduce meta-analysis efficiency.
Results: M2DB is a human curated microarray database designed for easy querying, based on clinical information, and for interactive retrieval of either raw or uniformly pre-processed data, along with a set of quality-control metrics. The database contains more than 10,000 previously published Affymetrix GeneChip arrays, performed using human clinical specimens. M2DB allows online querying according to a flexible combination of five clinical annotations describing disease state and sampling location. These annotations were manually curated by controlled vocabularies, based on information obtained from GEO, ArrayExpress, and published papers. For array-based assessment control, the online query provides sets of QC metrics, generated using three available QC algorithms. Arrays with poor data quality can easily be excluded from the query interface. The query provides values from two algorithms for gene-based filtering, and raw data and three kinds of pre-processed data for downloading.
Conclusion: M2DB utilizes a user-friendly interface for QC parameters, sample clinical annotations, and data formats to help users obtain clinical metadata. This database provides a lower entry threshold and an integrated process of meta-analysis. We hope that this research will promote further evolution of microarray meta-analysis.

Springer - Publisher Connector

Directory of Open Access Journals

Design of a multi-signature ensemble classifier predicting neuroblastoma patients' outcome

Author: Acquaviva Massimo
Barzaghi Sara
Blengio Fabiola
Bosco Maria Carla
Cornero Andrea
Eva Alessandra
Fardin Paolo
Schramm Alexander
Varesio Luigi
Versteeg Rogier
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Neuroblastoma is the most common pediatric solid tumor of the sympathetic nervous system. Development of improved predictive tools for patient stratification is a crucial requirement for neuroblastoma therapy. Several studies utilized gene expression-based signatures to stratify neuroblastoma patients and demonstrated a clear advantage of adding genomic analysis to risk assessment. There is little overlap among signatures, and merging their prognostic potential would be advantageous. Here, we describe a new strategy to merge published neuroblastoma-related gene signatures into a single, highly accurate, Multi-Signature Ensemble (MuSE)-classifier of neuroblastoma (NB) patient outcome.
Methods: Gene expression profiles of 182 neuroblastoma tumors, subdivided into three independent datasets, were used in the various phases of development and validation of the NB-MuSE-classifier. Thirty-three signatures were evaluated for patient outcome prediction using 22 classification algorithms each, generating 726 classifiers and prediction results. The best-performing algorithm for each signature was selected and validated on an independent dataset, and the 20 signatures performing with an accuracy ≥ 80% were retained.
Results: We combined the 20 predictions associated with the corresponding signatures, through the selection of the best-performing algorithm, into a single outcome predictor. The best performance was obtained by the Decision Table algorithm, which produced the NB-MuSE-classifier, characterized by an external validation accuracy of 94%. Kaplan-Meier curves and the log-rank test demonstrated that patients with good and poor outcome prediction by the NB-MuSE-classifier have significantly different survival (p < 0.0001). Survival curves constructed on subgroups of patients divided on the basis of known prognostic markers suggested an excellent stratification of localized and stage 4s tumors, but more data are needed to prove this point.
Conclusions: The NB-MuSE-classifier is based on an ensemble approach that merges twenty heterogeneous, neuroblastoma-related gene signatures to blend their discriminating power, rather than numeric values, into a single, highly accurate patient outcome predictor. The novelty of our approach derives from the way the gene expression signatures are integrated, by optimally associating them with a single paradigm ultimately integrated into a single classifier. This model can be exported to other types of cancer and to diseases for which dedicated databases exist.
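
The selection-then-combination pipeline described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the signature names, accuracies and per-signature calls are invented, and a plain majority vote stands in for the Decision Table algorithm that the authors report as the best-performing combiner.

```python
from collections import Counter

# 33 signatures evaluated with 22 algorithms each gives the 726 classifiers
# mentioned in the abstract.
N_SIGNATURES, N_ALGORITHMS = 33, 22
N_CLASSIFIERS = N_SIGNATURES * N_ALGORITHMS

def select_signatures(validation_accuracy, threshold=0.80):
    """Keep signatures whose best algorithm reaches the accuracy threshold."""
    return [name for name, acc in validation_accuracy.items() if acc >= threshold]

def majority_vote(calls, retained):
    """Combine per-signature outcome calls ("good" or "poor") for one patient.

    Stand-in combiner, not the Decision Table used in the paper.
    Ties are resolved as "poor"; this conservative rule is an assumption.
    """
    votes = Counter(calls[name] for name in retained)
    return "good" if votes["good"] > votes["poor"] else "poor"

# Hypothetical validation accuracies of the best algorithm per signature.
validation_accuracy = {"sigA": 0.85, "sigB": 0.79, "sigC": 0.80, "sigD": 0.91}
retained = select_signatures(validation_accuracy)

# Hypothetical outcome calls for a single patient, one per signature.
patient_calls = {"sigA": "good", "sigB": "poor", "sigC": "poor", "sigD": "good"}
prediction = majority_vote(patient_calls, retained)
```

Here `sigB` falls below the 80% cut-off and is excluded before voting, so only the calls of the retained signatures contribute to the combined prediction.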

Springer - Publisher Connector

Directory of Open Access Journals