Recommended from our members
Integrative analysis of the inter-tumoral heterogeneity of triple-negative breast cancer.
Triple-negative breast cancers (TNBC) lack estrogen and progesterone receptors and HER2 amplification, and are resistant to therapies that target these receptors. Tumors from TNBC patients are heterogeneous in genetic variation, tumor histology, and clinical outcome. We used high-throughput genomic data for TNBC patients (n = 137) from TCGA to characterize inter-tumor heterogeneity. Similarity network fusion (SNF)-based integrative clustering combining gene expression, miRNA expression, and copy number variation revealed three distinct patient clusters. Integrating multiple types of data resulted in more distinct clusters than analyses with a single data type. Whereas most TNBCs are classified by PAM50 as basal subtype, one of the clusters was enriched in the non-basal PAM50 subtypes, exhibited more aggressive clinical features, and had a distinctive signature of oncogenic mutations, miRNAs, and expressed genes. Our analyses provide a new classification scheme for TNBC based on multiple omics datasets and provide insight into molecular features that underlie TNBC heterogeneity.
Integration of Genome Scale Data for Identifying New Biomarkers in Colon Cancer: Integrated Analysis of Transcriptomics and Epigenomics Data from High Throughput Technologies in Order to Identifying New Biomarkers Genes for Personalised Targeted Therapies for Patients Suffering from Colon Cancer
Colorectal cancer is the third most common cancer and the leading cause of cancer deaths in Western industrialised countries. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year due to colon cancer. Our current knowledge of colorectal carcinogenesis indicates a multifactorial and multi-step process that involves various genetic alterations and several biological pathways. The identification of molecular markers with early diagnostic and precise clinical outcome in colon cancer is a challenging task because of tumour heterogeneity.
This Ph.D. thesis presents the molecular and cellular mechanisms leading to colorectal cancer. A systematic review of the literature is conducted on microarray gene expression profiling, gene ontology enrichment analysis, microRNA, systems biology, and various bioinformatics tools.
This study aimed to stratify colon tumours into molecularly distinct subtypes, identify novel diagnostic targets, and predict reliable prognostic signatures for clinical practice using microarray expression datasets. We performed an integrated analysis of gene expression data based on genetic, epigenetic and extensive clinical information using unsupervised learning, correlation and functional network analysis. As a result, we identified 267-gene and 124-gene signatures that can distinguish normal, primary and metastatic tissues and that are also involved in important regulatory functions such as immune response, lipid metabolism and peroxisome proliferator-activated receptor (PPAR) signalling pathways.
For the first time, we also identify miRNAs that can differentiate primary from metastatic colon tumours, and a prognostic signature of grade and stage levels, which can be a major contributor to complex transcriptional phenotypes in a colon tumour.
k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction
In the clinical application of genomic data analysis and modeling, a number of factors contribute to the performance of disease classification and clinical outcome prediction. This study focuses on the k-nearest neighbor (KNN) modeling strategy and its clinical use. Although KNN is simple and clinically appealing, large performance variations were found among experienced data analysis teams in the MicroArray Quality Control Phase II (MAQC-II) project. For clinical end points and controls from breast cancer, neuroblastoma and multiple myeloma, we systematically generated 463 320 KNN models by varying feature ranking method, number of features, distance metric, number of neighbors, vote weighting and decision threshold. We identified factors that contribute to the MAQC-II project performance variation, and validated a KNN data analysis protocol using a newly generated clinical data set with 478 neuroblastoma patients. We interpreted the biological and practical significance of the derived KNN models, and compared their performance with existing clinical factors
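The modeling factors named in this abstract (distance metric, number of neighbors, vote weighting, decision threshold) can be illustrated with a minimal sketch. The toy data, function names, and grid values below are assumptions chosen for illustration, not the MAQC-II protocol, and feature ranking and feature count are left out for brevity.

```python
import math
from itertools import product


def distance(a, b, metric):
    """Distance between two feature vectors under the chosen metric."""
    if metric == "euclidean":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError(f"unknown metric: {metric}")


def knn_score(train_X, train_y, query, k, metric, weighted):
    """Fraction of (optionally inverse-distance-weighted) votes for class 1."""
    neighbors = sorted(
        (distance(x, query, metric), label) for x, label in zip(train_X, train_y)
    )[:k]
    weights = [1.0 / (d + 1e-9) if weighted else 1.0 for d, _ in neighbors]
    positive = sum(w for w, (_, label) in zip(weights, neighbors) if label == 1)
    return positive / sum(weights)


def knn_predict(train_X, train_y, query, k=3, metric="euclidean",
                weighted=False, threshold=0.5):
    """Predict class 1 when the vote fraction reaches the decision threshold."""
    return int(knn_score(train_X, train_y, query, k, metric, weighted) >= threshold)


# Hypothetical two-feature toy data: three samples per class.
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.9, 1.1], [1.2, 1.0]]
train_y = [0, 0, 0, 1, 1, 1]

# One model per combination of k, metric, vote weighting, and threshold.
grid = list(product([1, 3, 5], ["euclidean", "manhattan"], [False, True], [0.3, 0.5, 0.7]))
predictions = {cfg: knn_predict(train_X, train_y, [0.95, 1.0], *cfg) for cfg in grid}
```

Even on this toy example the prediction for a single query changes across configurations (for instance, k = 5 with unweighted votes and a 0.7 threshold disagrees with the weighted variant), which is the kind of sensitivity the study examines at scale.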
Type 2 Diabetes Mellitus and its comorbidity, Alzheimer’s disease: Identifying critical microRNA using machine learning
MicroRNAs (miRNAs) are critical regulators of gene expression in healthy and diseased states, and numerous studies have established their tremendous potential as a tool for improving the diagnosis of Type 2 Diabetes Mellitus (T2D) and its comorbidities. In this regard, we computationally identify novel top-ranked hub miRNAs that might be involved in T2D. We accomplish this via two strategies: 1) by ranking miRNAs based on the number of T2D differentially expressed genes (DEGs) they target, and 2) using only the common DEGs between T2D and its comorbidity, Alzheimer’s disease (AD) to predict and rank miRNA. Then classifier models are built using the DEGs targeted by each miRNA as features. Here, we show the T2D DEGs targeted by hsa-mir-1-3p, hsa-mir-16-5p, hsa-mir-124-3p, hsa-mir-34a-5p, hsa-let-7b-5p, hsa-mir-155-5p, hsa-mir-107, hsa-mir-27a-3p, hsa-mir-129-2-3p, and hsa-mir-146a-5p are capable of distinguishing T2D samples from the controls, which serves as a measure of confidence in the miRNAs’ potential role in T2D progression. Moreover, for the second strategy, we show other critical miRNAs can be made apparent through the disease’s comorbidities, and in this case, overall, the hsa-mir-103a-3p models work well for all the datasets, especially in T2D, while the hsa-mir-124-3p models achieved the best scores for the AD datasets. To the best of our knowledge, this is the first study that used predicted miRNAs to determine the features that can separate the diseased samples (T2D or AD) from the normal ones, instead of using conventional non-biology-based feature selection methods
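The two ranking strategies described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the miRNA names, gene names, and target lists are placeholders, not data from the study, and the classifier-building step is not shown.

```python
def rank_mirnas_by_deg_targets(mirna_targets, degs):
    """Rank miRNAs by how many of the given DEGs they target (ties broken by name)."""
    deg_set = set(degs)
    hits = {mirna: sorted(set(targets) & deg_set)
            for mirna, targets in mirna_targets.items()}
    return sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))


# Placeholder miRNA-to-target mapping and DEG lists (all hypothetical).
mirna_targets = {
    "miR-A": ["GENE1", "GENE2", "GENE3"],
    "miR-B": ["GENE2", "GENE4", "GENE5"],
    "miR-C": ["GENE4", "GENE6", "GENE7"],
}
t2d_degs = ["GENE1", "GENE2", "GENE3", "GENE4"]
ad_degs = ["GENE2", "GENE4", "GENE8"]

# Strategy 1: rank by the number of T2D DEGs each miRNA targets.
ranking_t2d = rank_mirnas_by_deg_targets(mirna_targets, t2d_degs)

# Strategy 2: rank using only the DEGs shared by T2D and AD.
common_degs = set(t2d_degs) & set(ad_degs)
ranking_common = rank_mirnas_by_deg_targets(mirna_targets, common_degs)
```

In the approach the abstract describes, the DEGs matched to each miRNA (the second element of each ranked pair here) would then serve as the feature set for that miRNA's classifier.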
MicroRNA dysregulation and esophageal cancer development depend on the extent of zinc dietary deficiency
Fong, Louise Y.; Taccioli, Cristian; Jing, Ruiyan; Smalley, Karl J.; Alder, Hansjuerg; Jiang, Yubao; Fadda, Paolo; Farber, John L.; Croce, Carlo M.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Analytical variables influencing the performance of a miRNA based laboratory assay for prediction of relapse in stage I non-small cell lung cancer (NSCLC)
Background: Laboratory assays are needed for early stage non-small cell lung cancer (NSCLC) that can link molecular and clinical heterogeneity to predict relapse after surgical resection. We technically validated two miRNA assays for prediction of relapse in NSCLC. Total RNA from seventy-five formalin-fixed and paraffin-embedded (FFPE) specimens was extracted, labeled and hybridized to Affymetrix miRNA arrays using different RNA input amounts, ATP-mix dilutions, array lots, and RNA extraction and labeling methods in a total of 166 hybridizations. Two combinations of RNA extraction and labeling methods (assays I and II) were applied to a cohort of 68 early stage NSCLC patients. Results: RNA input amount and RNA extraction and labeling methods affected signal intensity and the number of detected probes and probe sets, and caused large variation, whereas different ATP-mix dilutions and array lots did not. Leave-one-out accuracies for prediction of relapse were 63% and 73% for the two assays. Prognosticator calls ("no recurrence" or "recurrence") were consistent, independent of RNA amount, ATP-mix dilution, array lot and RNA extraction method. The calls were not robust to changes in labeling method. Conclusions: In this study, we demonstrate that some analytical conditions, such as RNA extraction and labeling methods, are important for the variation in assay performance whereas others are not. Thus, careful optimization that addresses all analytical steps and variables can improve the accuracy of prediction and facilitate the introduction of microRNA arrays in the clinic for prediction of relapse in stage I non-small cell lung cancer (NSCLC).
Network-based stratification of tumor mutations.
Many forms of cancer have multiple subtypes with different causes and clinical outcomes. Somatic tumor genome sequences provide a rich new source of data for uncovering these subtypes but have proven difficult to compare, as two tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in ovarian, uterine and lung cancer cohorts from The Cancer Genome Atlas. For each tissue, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or tumor histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence
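The core intuition in this abstract, that tumors with different mutations can still be compared when those mutations fall in the same network region, can be illustrated with a small network-smoothing sketch. It is a simplified illustration and not the authors' implementation: the six-gene network, the patient profiles, and the `alpha` value are assumptions, and the clustering step that would follow smoothing is omitted.

```python
import math


def propagate(profile, adjacency, alpha=0.7, iterations=100):
    """Random walk with restart: spread each mutation score over network neighbors."""
    degree = {gene: len(neighbors) for gene, neighbors in adjacency.items()}
    current = dict(profile)
    for _ in range(iterations):
        current = {
            gene: alpha * sum(current[n] / degree[n] for n in adjacency[gene])
            + (1 - alpha) * profile[gene]
            for gene in adjacency
        }
    return current


def cosine(u, v):
    """Cosine similarity between two gene-score dictionaries with the same keys."""
    dot = sum(u[g] * v[g] for g in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


# Hypothetical network: two triangles (a-b-c and d-e-f) joined by the edge c-d.
adjacency = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}


def one_hot(mutated_gene):
    """Binary mutation profile with a single mutated gene."""
    return {gene: 1.0 if gene == mutated_gene else 0.0 for gene in adjacency}


# Three patients, each with one mutation; no two share a mutated gene.
patient_1, patient_2, patient_3 = one_hot("a"), one_hot("b"), one_hot("e")

smoothed_1 = propagate(patient_1, adjacency)
smoothed_2 = propagate(patient_2, adjacency)
smoothed_3 = propagate(patient_3, adjacency)
```

Before smoothing, every pair of patients has zero similarity. After smoothing, patients 1 and 2 (mutated in the same triangle) are more similar to each other than either is to patient 3, so a downstream clustering step would have signal to group them.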
Advantages of genomic complexity: bioinformatics opportunities in microRNA cancer signatures
MicroRNAs, small non-coding RNAs, may act as tumor suppressors or oncogenes, and each regulate their own transcription and that of hundreds of genes, often in a tissue-dependent manner. This creates a tightly interwoven network regulating and underlying oncogenesis and cancer biology. Although protein-coding gene signatures and single protein pathway markers have proliferated over the past decade, routine adoption of the former has been hampered by interpretability, reproducibility, and dimensionality, whereas the single molecule–phenotype reductionism of the latter is often overly simplistic to account for complex phenotypes. MicroRNA-derived biomarkers offer a powerful alternative; they have both the flexibility of gene expression signature classifiers and the desirable mechanistic transparency of single protein biomarkers. Furthermore, several advances have recently demonstrated the robust detection of microRNAs from various biofluids, thus providing an additional opportunity for obtaining bioinformatically derived biomarkers to accelerate the identification of individual patients for personalized therapy