197 research outputs found

    Machine Learning-Based Ensemble Recursive Feature Selection of Circulating miRNAs for Cancer Tumor Classification

    Lopez-Rincon A, Mendoza-Maldonado L, Martinez-Archundia M, et al. Machine Learning-Based Ensemble Recursive Feature Selection of Circulating miRNAs for Cancer Tumor Classification. Cancers. 2020;12(7):1785.
    Circulating microRNAs (miRNAs) are small noncoding RNA molecules that can be detected in bodily fluids without the need for major invasive procedures on patients. miRNAs have shown great promise as biomarkers for tumors to both assess their presence and to predict their type and subtype. Recently, thanks to the availability of miRNA datasets, machine learning techniques have been successfully applied to tumor classification. The results, however, are difficult to assess and interpret by medical experts because the algorithms exploit information from thousands of miRNAs. In this work, we propose a novel technique that aims at reducing the necessary information to the smallest possible set of circulating miRNAs. The dimensionality reduction achieved reflects a very important first step in a potential, clinically actionable, circulating miRNA-based precision medicine pipeline. While it is currently under discussion whether this first step can be taken, we demonstrate here that it is possible to perform classification tasks by exploiting a recursive feature elimination procedure that integrates a heterogeneous ensemble of high-quality, state-of-the-art classifiers on circulating miRNAs. Heterogeneous ensembles can compensate for inherent biases of classifiers by using different classification algorithms. Selecting features then further eliminates biases emerging from using data from different studies or batches, yielding more robust and reliable outcomes.
The proposed approach is first tested on a tumor classification problem in order to separate 10 different types of cancer, with samples collected over 10 different clinical trials, and later is assessed on a cancer subtype classification task, with the aim to distinguish triple negative breast cancer from other subtypes of breast cancer. Overall, the presented methodology proves to be effective and compares favorably to other state-of-the-art feature selection methods.

    Multi-omics data integration for the detection and characterization of smoking related lung diseases

    Lung cancer is the leading cause of death from cancer in the world. First, we hypothesized that microRNA expression is altered in the bronchial epithelium of patients with lung cancer and that incorporating microRNA expression into an existing mRNA biomarker may improve its performance. Using bronchial brushings collected from current and former smokers, we profiled microRNA expression via small RNA sequencing for 347 patients with available mRNA data. We found that four microRNAs were under-expressed in cancer patients compared to controls (p<0.002, FDR<0.2). We explored the role of these microRNAs and their gene targets in cancer. In addition, we found that adding a microRNA feature to an existing 23-gene biomarker significantly improves its performance (AUC) in a test set (p<0.05). Next, we generalized the biomarker discovery process, and developed a visualization tool for biomarker selection. We built upon an existing biomarker discovery pipeline and created a web-based interface to visualize the performance of multiple predictors. The “visualization” component is the key to sorting through a thousand potential biomarkers, and developing clinically useful molecular predictors. Finally, we explored the molecular events leading to the development of COPD and ILD, two heterogeneous diseases with high mortality. We hypothesized that integrative genetic and expression networks can help identify drivers and elucidate mechanisms of genetic susceptibility. We utilized 262 lung tissue specimens profiled with microRNA sequencing, microarray gene expression and SNP chip genotyping. Next, we built condition specific integrative networks using a causality inference test for predicting SNP-microRNA-mRNA associations, where the microRNA is a predicted mediator of the SNP’s effect on gene expression. We identified the microRNAs predicted to affect the most genes within each network. 
Members of the miR-34/449 family, known to promote airway differentiation by repressing the Notch pathway, were among the top ranked microRNAs in COPD and ILD networks, but not in the non-disease network. In addition, the miR-34/449 gene module was enriched among genes that increase in expression over time when airway basal cells are differentiated at an air-liquid interface and among genes that increase in expression with the airway wall thickening in patients with emphysema.
    2019-07-31

    Small molecule-mediated targeting of microRNAs for drug discovery: experiments, computational techniques, and disease implications

    Small molecules have been providing medical breakthroughs for human diseases for more than a century. Recently, identifying small molecule inhibitors that target microRNAs (miRNAs) has gained importance, despite the challenges posed by labour-intensive screening experiments and the significant efforts required for medicinal chemistry optimization. Numerous experimentally-verified cases have demonstrated the potential of miRNA-targeted small molecule inhibitors for disease treatment. This new approach is grounded in their posttranscriptional regulation of the expression of disease-associated genes. Reversing dysregulated gene expression using this mechanism may help control dysfunctional pathways. Furthermore, the ongoing improvement of algorithms has allowed for the integration of computational strategies built on top of laboratory-based data, facilitating a more precise and rational design and discovery of lead compounds. To complement the use of extensive pharmacogenomics data in prioritising potential drugs, our previous work introduced a computational approach based on only molecular sequences. Moreover, various computational tools for predicting molecular interactions in biological networks using similarity-based inference techniques have been accumulated in established studies. However, there are a limited number of comprehensive reviews covering both computational and experimental drug discovery processes. In this review, we outline a cohesive overview of both biological and computational applications in miRNA-targeted drug discovery, along with their disease implications and clinical significance. Finally, utilizing drug-target interaction (DTI) data from DrugBank, we showcase the effectiveness of deep learning for obtaining the physicochemical characterization of DTIs.

    Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach

    Convergence of exponentially advancing technologies is driving medical research with life changing discoveries. In contrast, repeated failures of high-profile drugs to battle Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. Thus, the growing realisation that Amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of the integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data. Here much of the emphasis is put on quality, reliability, and context-specificity. This thesis work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. State-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature.
To enable meta-analysis of biologically related transcriptomic data, a highly-curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns (embedded with novel candidates) across large-scale AD transcriptomic data, a new approach to generate gene regulatory networks has been developed. The work presented here has demonstrated its capability in identifying testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects for Alzheimer's disease, Parkinson's disease and epilepsy.

    Evaluation of blood-based microRNAs toward clinical use as biomarkers in common and rare diseases

    According to the GLOBOCAN project of the International Agency for Research on Cancer, the top three common cancer diseases worldwide in the year 2020 were breast, lung and colorectal cancer. These are usually diagnosed via imaging methods (e.g. computer tomography) or invasive methods (e.g. biopsy). However, these techniques are potentially risky and expensive and thus not accessible to all patients, resulting in most cancers being detected in an advanced stage. Since the discovery of small non-coding RNAs and specifically microRNAs and their role as gene regulators, many researchers investigate their association with disease development. In particular, researchers examine body fluid based microRNAs which could present potential cost-effective and minimally- or non-invasive alternatives to the previously described established diagnosis methods. This dissertation focuses on microRNAs and investigates their suitability as minimally-invasive blood-borne biomarkers for potential diagnostic purposes. More specifically, the goals of this work are (1) to implement a new method to predict novel microRNAs, (2) to understand stability and characteristics of these small non-coding RNAs, possibly relevant for the last goal, (3) to discover potential diagnostic biomarkers in common and rare diseases. The first goal was addressed by developing miRMaster, a web service to predict new microRNAs. The tool uses machine learning and high-throughput sequencing data to find microRNA candidates that follow the known biogenesis pathways. The second goal was pursued in four publications. First, we performed a large scale evaluation of miRMaster by generating a high-resolution map of the human small non-coding RNA transcriptome for which we analyzed and validated potential microRNA candidates. Next, we examined the influence of seasonal effects on microRNA expression profiles and observed the largest difference between spring and the other seasons. 
Additionally, we evaluated the evolutionary conservation of small non-coding RNAs in zoo animals and showed that the distribution of sncRNA classes varies across species, while common microRNA families are present in more diverse organisms than assumed so far. Furthermore, we analyzed if microRNAs are technically stable, and whether biological variation is preserved when using capillary dried blood spots as an alternative sample collection device to venous blood specimens. Finally, we investigated the suitability of microRNAs as biomarkers for two diseases: lung cancer and Marfan disease. We identified blood-borne biomarker candidates for lung cancer detection in a large-scale multi-center study via machine learning. For the rare Marfan disease we analyzed the paired messenger RNA and microRNA expression levels in whole-blood samples. This highlighted several significantly deregulated microRNAs and messenger RNAs, which we subsequently validated in an independent cohort. In summary, this thesis provides valuable results toward potential clinical use of microRNAs, and the herein described projects represent comprehensive analyses of them from different perspectives: starting with microRNA discovery, addressing various technical and biological questions and ending with the potential use as biomarkers.

    Machine Learning Based Diagnostic Paradigm in Viral and Non-Viral Hepatocellular Carcinoma

    © 2024 The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/
    Viral and non-viral hepatocellular carcinoma (HCC) is becoming predominant in developing countries. A major issue linked to HCC-related mortality rate is the late diagnosis of cancer development. Although traditional approaches to diagnosing HCC have become gold-standard, there remain several limitations due to which the confirmation of cancer progression takes a longer period. The recent emergence of artificial intelligence tools with the capacity to analyze biomedical datasets is assisting traditional diagnostic approaches for early diagnosis with certainty. Here we present a review of traditional HCC diagnostic approaches versus the use of artificial intelligence (Machine Learning and Deep Learning) for HCC diagnosis. The overview of the cancer-related databases along with the use of AI in histopathology, radiology, biomarker, and electronic health records (EHRs) based HCC diagnosis is given.
    Peer reviewed