Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis
Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis
prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art
scientific papers published in the last 6 years that built cancer prognosis predictive models
using multimodal data. We have defined the multimodality of data as four main types: clinical,
anatomopathological, molecular, and medical imaging; and we have expanded on the information
that each modality provides. The 43 studies were divided into three categories based on the modelling
approach taken, and their characteristics were further discussed together with current issues and
future trends. Research in this area has evolved from survival analysis through statistical modelling
using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a
multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional
data containing multi-omics and medical imaging information and by applying Machine Learning
and, more recently, Deep Learning techniques. This review concludes that cancer prognosis predictive
multimodal models are capable of better stratifying patients, which can improve clinical management
and contribute to the implementation of personalised medicine as well as provide new and valuable
knowledge on cancer biology and its progression.
A Practical Guide to Integrating Multimodal Machine Learning and Metabolic Modeling
Complex, distributed, and dynamic sets of clinical biomedical data are collectively referred to as multimodal clinical data. In order to accommodate the volume and heterogeneity of such diverse data types and aid in their interpretation when they are combined with a multi-scale predictive model, machine learning is a useful tool that can be wielded to deconstruct biological complexity and extract relevant outputs. Additionally, genome-scale metabolic models (GSMMs) are one of the main frameworks striving to bridge the gap between genotype and phenotype by incorporating prior biological knowledge into mechanistic models. Consequently, the utilization of GSMMs as a foundation for the integration of multi-omic data originating from different domains is a valuable pursuit towards refining predictions. In this chapter, we show how cancer multi-omic data can be analyzed via multimodal machine learning and metabolic modeling. Firstly, we focus on the merits of adopting an integrative systems biology led approach to biomedical data mining. Following this, we propose how constraint-based metabolic models can provide a stable yet adaptable foundation for the integration of multimodal data with machine learning. Finally, we provide a step-by-step tutorial for the combination of machine learning and GSMMs, which includes: (i) tissue-specific constraint-based modeling; (ii) survival analysis using time-to-event prediction for cancer; and (iii) classification and regression approaches for multimodal machine learning. The code associated with the tutorial can be found at https://github.com/Angione-Lab/Tutorials_Combining_ML_and_GSMM
AI-Enabled Lung Cancer Prognosis
Lung cancer is the primary cause of cancer-related mortality, claiming
approximately 1.79 million lives globally in 2020, with an estimated 2.21
million new cases diagnosed within the same period. Among these, Non-Small Cell
Lung Cancer (NSCLC) is the predominant subtype, characterized by a notably
bleak prognosis and low overall survival rate of approximately 25% over five
years across all disease stages. However, survival outcomes vary considerably
based on the stage at diagnosis and the therapeutic interventions administered.
Recent advancements in artificial intelligence (AI) have revolutionized the
landscape of lung cancer prognosis. AI-driven methodologies, including machine
learning and deep learning algorithms, have shown promise in enhancing survival
prediction accuracy by efficiently analyzing complex multi-omics data and
integrating diverse clinical variables. By leveraging AI techniques, clinicians
can harness comprehensive prognostic insights to tailor personalized treatment
strategies, ultimately improving patient outcomes in NSCLC. An overview of
AI-driven data processing can help strengthen understanding of such systems and
provide better directions for using them.
Comment: This is the author's version of a book chapter entitled: "Cancer
Research: An Interdisciplinary Approach", Springe
Autoencoder-based multimodal prediction of non-small cell lung cancer survival
The ability to accurately predict non-small cell lung cancer (NSCLC) patient survival is crucial for informing physician decision-making, and the increasing availability of multi-omics data offers the promise of enhancing prognosis predictions. We present a multimodal integration approach that leverages microRNA, mRNA, DNA methylation, long non-coding RNA (lncRNA) and clinical data to predict NSCLC survival and identify patient subtypes, utilizing denoising autoencoders for data compression and integration. Survival performance for patients with lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) was compared across modality combinations and data integration methods. Using The Cancer Genome Atlas data, our results demonstrate that survival prediction models combining multiple modalities outperform single modality models. The highest performance was achieved with a combination of only two modalities, lncRNA and clinical, at concordance indices (C-indices) of 0.69 ± 0.03 for LUAD and 0.62 ± 0.03 for LUSC. Models utilizing all five modalities achieved mean C-indices of 0.67 ± 0.04 and 0.63 ± 0.02 for LUAD and LUSC, respectively, while the best individual modality performance reached C-indices of 0.64 ± 0.03 for LUAD and 0.59 ± 0.03 for LUSC. Analysis of biological differences revealed two distinct survival subtypes with over 900 differentially expressed transcripts.
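For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of how a Harrell-style C-index can be computed from observed times, event indicators, and predicted risk scores. It is an illustration only, not the authors' implementation: the function name `concordance_index`, the toy data, and the simplified handling of tied times (such pairs are skipped) are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style C-index (illustrative sketch, not the paper's code).

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplifying assumption: skip tied times, and skip pairs where
        # the shorter time is censored (the true ordering is unknown).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # model ranks the earlier event as riskier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, one of them censored.
toy_times = [5, 8, 12, 3]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.2, 0.7]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values between 0.59 and 0.69.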
Methods for Stratification and Validation Cohorts: A Scoping Review
Personalized medicine requires large cohorts for patient stratification and validation of patient clustering. However, standards and harmonized practices on the methods and tools to be used for the design and management of cohorts in personalized medicine remain to be defined. This study aims to describe the current state-of-the-art in this area. A scoping review was conducted searching in PubMed, EMBASE, Web of Science, PsycINFO and Cochrane Library for reviews about tools and methods related to cohorts used in personalized medicine. The search focused on cancer, stroke and Alzheimer's disease and was limited to reports in English, French, German, Italian and Spanish published from 2005 to April 2020. The screening process was reported through a PRISMA flowchart. Fifty reviews were included, mostly including information about how data were generated (25/50) and about tools used for data management and analysis (24/50). No direct information was found about the quality of data and the requirements to monitor associated clinical data. A scarcity of information and standards was found in specific areas such as sample size calculation. With this information, comprehensive guidelines could be developed in the future to improve the reproducibility and robustness in the design and management of cohorts in personalized medicine studies.
Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction
Survival outcome assessment is challenging and inherently associated with
multiple clinical factors (e.g., imaging and genomics biomarkers) in cancer.
Enabling multimodal analytics promises to reveal novel predictive patterns of
patient outcomes. In this study, we propose a multimodal transformer
(PathOmics) integrating pathology and genomics insights into colon-related
cancer survival prediction. We emphasize the unsupervised pretraining to
capture the intrinsic interaction between tissue microenvironments in gigapixel
whole slide images (WSIs) and a wide range of genomics data (e.g.,
mRNA-sequence, copy number variant, and methylation). After the multimodal
knowledge aggregation in pretraining, our task-specific model finetuning could
expand the scope of data utility applicable to both multi- and single-modal
data (e.g., image- or genomics-only). We evaluate our approach on both TCGA
colon and rectum cancer cohorts, showing that the proposed approach is
competitive and outperforms state-of-the-art studies. Finally, our approach is
well suited to finetuning on a limited number of samples, enabling
data-efficient analytics for survival outcome prediction. The code is available
at https://github.com/Cassie07/PathOmics.
Comment: Accepted to MICCAI2023 (Top14%
What I talk about when I talk about integration of single-cell data
Over the past decade, single-cell technologies have evolved from profiling hundreds of cells to millions of cells, and have expanded from a single modality of data to cover multiple views at single-cell resolution, including genome, epigenome, transcriptome, and so on. With the advance of these single-cell technologies, the boom in multimodal single-cell data creates a valuable resource for understanding cellular heterogeneity and molecular mechanisms at a comprehensive level. However, the large-scale multimodal single-cell data also presents a huge computational challenge for insightful integrative analysis. Here, I will lay out problems in data integration that the single-cell research community is interested in and introduce computational principles for solving these integration problems. In the following chapters, I will present four computational methods for data integration under different scenarios. Finally, I will discuss some future directions and potential applications of single-cell data integration.