    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
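
    The sketch below illustrates, under simplified assumptions, how three of these challenges (missing data, the curse of dimensionality, and class imbalance) might be handled in a single scikit-learn pipeline. The data, dimensions, and parameter choices are all hypothetical, not drawn from the reviewed paper.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Hypothetical multi-omics matrix: 100 samples x (500 expression + 300 methylation)
    X = np.hstack([rng.normal(size=(100, 500)), rng.normal(size=(100, 300))])
    X[rng.random(X.shape) < 0.05] = np.nan        # simulate missing values
    y = (rng.random(100) < 0.2).astype(int)       # imbalanced labels (~20% positive)

    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),          # missing data
        ("reduce", PCA(n_components=20)),                    # curse of dimensionality
        ("clf", LogisticRegression(class_weight="balanced",  # class imbalance
                                   max_iter=1000)),
    ])
    pipeline.fit(X, y)
    print("training accuracy:", pipeline.score(X, y))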

    Breakthroughs in genomics data integration for predicting clinical outcome

    Bridging the Gap in Personalized Oncology using Omics Data and Epidemiology

    As personalized medicine has shaped the field of precision oncology, many challenges have arisen in realizing a fully personalized, integrated health system for cancer therapy. Over the past decades, personalized oncology has been pursued across multiple diseases and disease stages using high-throughput technology. This review surveys recent advances in personalized oncology across several cancer disease models, including leukemia, melanoma, breast cancer, lung cancer, colorectal cancer, and prostate cancer. Moreover, the review enumerates technology-based assessments of personalized biomarkers, including microarray chips, organ-on-chip systems, and next-generation sequencing. While addressing the challenges of implementing truly personalized cancer care in the oncology setting, this review focuses on bridging the gap between omics data analytics and epidemiology to overcome the challenge of direct clinical application.

    Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis

    Identifying complex biological processes associated with patients' survival time at the cellular and molecular level is critical not only for developing new treatments but also for accurate survival prediction. However, highly nonlinear and high-dimension, low-sample-size (HDLSS) data cause computational challenges in survival analysis. We developed a novel family of pathway-based, sparse deep neural networks (PASNet) for cancer survival analysis. The PASNet family comprises biologically interpretable neural network models in which nodes correspond to specific genes and pathways, capturing nonlinear and hierarchical effects of the biological pathways associated with clinical outcomes. Furthermore, the integration of heterogeneous types of biological data from biospecimens holds promise for improving survival prediction and personalized therapies in cancer. Specifically, integrating genomic data and histopathological images enhances survival prediction and personalized treatment in cancer studies, while providing an in-depth understanding of the genetic mechanisms and phenotypic patterns of cancer. Two models are introduced for integrating multi-omics data and pathological images, respectively. Each model in the PASNet family was evaluated against current cutting-edge models on The Cancer Genome Atlas (TCGA) cancer data. In extensive experiments, the PASNet family outperformed the benchmark methods, and its performance advantage was statistically assessed. More importantly, the PASNet family showed the capability to interpret a multi-layered biological system, and published biological literature on glioblastoma multiforme (GBM) supported the biological interpretation of the proposed models. The open-source PyTorch software for the PASNet family is publicly available at https://github.com/DataX-JieHao.
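
    To make the pathway-based sparsity concrete, the following PyTorch sketch masks a linear layer with a binary gene-pathway membership matrix, so each pathway node receives input only from its member genes. The mask, dimensions, and activation are toy choices, not the authors' actual annotation or architecture.

    import torch
    import torch.nn as nn

    class PathwayMaskedLinear(nn.Module):
        def __init__(self, mask: torch.Tensor):
            # mask: (n_pathways, n_genes) binary gene-pathway membership matrix
            super().__init__()
            n_pathways, n_genes = mask.shape
            self.linear = nn.Linear(n_genes, n_pathways)
            self.register_buffer("mask", mask.float())

        def forward(self, x):
            # zero the weights of genes outside each pathway before applying
            return nn.functional.linear(x, self.linear.weight * self.mask,
                                        self.linear.bias)

    # Toy example: 6 genes, 2 pathways (hypothetical membership)
    mask = torch.tensor([[1, 1, 1, 0, 0, 0],
                         [0, 0, 1, 1, 1, 1]])
    layer = PathwayMaskedLinear(mask)
    expr = torch.randn(4, 6)                # batch of 4 gene-expression profiles
    pathway_activity = torch.relu(layer(expr))
    print(pathway_activity.shape)           # torch.Size([4, 2])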

    Precision medicine ― A promising, yet challenging road lies ahead

    Precision medicine proposes to individualize the practice of medicine based on patients’ genetic backgrounds, their biomarker characteristics and other omics datasets. After outlining the key challenges in precision medicine, namely patient stratification, biomarker discovery and drug repurposing, we survey recent developments in high-throughput technologies and big biological datasets that shape the future of precision medicine. Furthermore, we provide an overview of recent data-integrative approaches that have been successfully used in precision medicine for mining medical knowledge from big biological data, and we highlight the modeling and computing issues that such integrative approaches will face due to the ever-growing nature of big biological data. Finally, we draw attention to the challenges in translational medicine that arise when moving from research findings to approved medical practices.
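
    Of the challenges named above, patient stratification lends itself to a brief illustration. A minimal sketch, assuming standardized omics profiles and a fixed number of subgroups, might cluster patients with k-means; the data and cluster count are hypothetical.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(42)
    profiles = rng.normal(size=(200, 50))  # hypothetical: 200 patients x 50 biomarkers

    X = StandardScaler().fit_transform(profiles)
    strata = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("patients per stratum:", np.bincount(strata))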

    A knowledge graph to interpret clinical proteomics data

    Implementing precision medicine hinges on the integration of omics data, such as proteomics, into the clinical decision-making process, but the quantity and diversity of biomedical data, and the spread of clinically relevant knowledge across multiple biomedical databases and publications, pose a challenge to data integration. Here we present the Clinical Knowledge Graph (CKG), an open-source platform currently comprising close to 20 million nodes and 220 million relationships that represent relevant experimental data, public databases and literature. The graph structure provides a flexible data model that is easily extendable to new nodes and relationships as new databases become available. The CKG incorporates statistical and machine learning algorithms that accelerate the analysis and interpretation of typical proteomics workflows. Using a set of proof-of-concept biomarker studies, we show how the CKG might augment and enrich proteomics data and help inform clinical decision-making.
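
    To make the data model concrete, here is a toy property-graph sketch using networkx: typed nodes (protein, disease, publication) joined by typed relationships and traversed to answer a simple question. The node names and relationship types are illustrative and far simpler than the CKG's actual schema.

    import networkx as nx

    kg = nx.MultiDiGraph()
    kg.add_node("EGFR", kind="protein")
    kg.add_node("NSCLC", kind="disease")
    kg.add_node("PMID:123456", kind="publication")  # hypothetical citation

    kg.add_edge("EGFR", "NSCLC", rel="ASSOCIATED_WITH")
    kg.add_edge("PMID:123456", "EGFR", rel="MENTIONS")

    # Traverse: which diseases is a protein of interest linked to?
    for _, target, data in kg.out_edges("EGFR", data=True):
        if data["rel"] == "ASSOCIATED_WITH":
            print("EGFR ->", target)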

    Engineering simulations for cancer systems biology

    Computer simulation can be used to inform in vivo and in vitro experimentation, enabling rapid, low-cost hypothesis generation and directing experimental design in order to test those hypotheses. In this way, in silico models become a scientific instrument for investigation, and so should be developed to high standards, be carefully calibrated, and have their findings presented in such a way that they may be reproduced. Here, we outline a framework that supports developing simulations as scientific instruments, and we select cancer systems biology as an exemplar domain, with a particular focus on cellular signalling models. We consider the challenges of lack of data, incomplete knowledge and modelling in the context of a rapidly changing knowledge base. Our framework comprises a process to clearly separate scientific and engineering concerns in model and simulation development, and an argumentation approach to documenting models that provides a rigorous way of recording assumptions and knowledge gaps. We propose interactive, dynamic visualisation tools to enable the biological community to interact with cellular signalling models directly for experimental design. There is a mismatch in scale between these cellular models and the tissue structures that are affected by tumours, and bridging this gap requires substantial computational resources. We present concurrent programming as a technology to link scales without losing important details through model simplification. We discuss the value of combining this technology, interactive visualisation, argumentation and model separation to support the development of multi-scale models that represent biologically plausible cells arranged in biologically plausible structures, modelling cell behaviour, interactions and response to therapeutic interventions.
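
    As a minimal example of the kind of cellular signalling model discussed, the following SciPy sketch integrates a two-step activation cascade as ordinary differential equations; the species, rate constants, and time scale are illustrative only, not taken from the paper.

    import numpy as np
    from scipy.integrate import solve_ivp

    def cascade(t, y, k_act=1.0, k_deact=0.5, k_down=0.8, k_off=0.3, signal=1.0):
        r_active, effector = y
        dr = k_act * signal * (1 - r_active) - k_deact * r_active   # receptor activation
        de = k_down * r_active * (1 - effector) - k_off * effector  # downstream effector
        return [dr, de]

    sol = solve_ivp(cascade, t_span=(0, 20), y0=[0.0, 0.0], dense_output=True)
    t = np.linspace(0, 20, 5)
    print(np.round(sol.sol(t), 3))  # receptor and effector trajectories over time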

    Integrative OMICS Data-Driven Procedure Using a Derivatized Meta-Analysis Approach

    The wealth of high-throughput data has opened up new opportunities to analyze and describe biological processes at higher resolution, ultimately leading to a significant acceleration of scientific output using high-throughput data from the different omics layers and to the generation of databases to store and report raw datasets. The great variability among the techniques and the heterogeneous methodologies used to produce these data have made meta-analysis one of the approaches of choice for correlating the resulting large-scale datasets from different research groups. Through multi-study meta-analyses, it is possible to generate results with greater statistical power than individual analyses. Gene signatures, biomarkers and pathways that provide new insights into a phenotype of interest have been identified through the analysis of large-scale datasets in several fields of science. However, despite all these efforts, a standardized regulation for reporting large-scale data and for identifying molecular targets and signaling networks is still lacking. Integrative analyses have also been introduced to complement and augment meta-analysis methodologies and to generate novel hypotheses. Currently, there is no universal method established, and the available methods serve different purposes. Herein we describe a new unifying, scalable and straightforward methodology not only to meta-analyze different omics outputs, but also to integrate the significant outcomes into novel pathways describing biological processes of interest. The significance of using proper molecular identifiers is highlighted, as well as the potential to further correlate molecules from different regulatory levels. To demonstrate the methodology's potential, a set of transcriptomic datasets is meta-analyzed as an example.
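
    The core meta-analysis step, combining per-study evidence for one molecule, can be sketched with Fisher's method as implemented in SciPy. The p-values below are made up, and the Ensembl gene identifier merely illustrates the point about using proper molecular identifiers.

    from scipy.stats import combine_pvalues

    # Hypothetical p-values for one gene (TP53, keyed by its Ensembl ID)
    # from four independent transcriptomic studies.
    study_pvalues = {"ENSG00000141510": [0.04, 0.01, 0.20, 0.03]}

    for gene_id, pvals in study_pvalues.items():
        stat, combined_p = combine_pvalues(pvals, method="fisher")
        print(f"{gene_id}: combined p = {combined_p:.4g}")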

    Big Data in Oncology Nursing Research: State of the Science.

    This article reviews the state of oncology nursing science as it pertains to big data. The authors aim to define and characterize big data, describe key considerations for accessing and analyzing big data, provide examples of analyses of big data in oncology nursing science, and highlight ethical considerations related to the collection and analysis of big data. The review draws on peer-reviewed articles published by investigators specializing in oncology, nursing, and related disciplines. Big data is defined as data that are high in volume, velocity, and variety. To date, oncology nurse scientists have used big data to predict patient outcomes from clinician notes, identify distinct symptom phenotypes, and identify predictors of chemotherapy toxicity, among other applications. Although the emergence of big data and advances in computational methods provide new and exciting opportunities to advance oncology nursing science, several challenges are associated with accessing and using big data. Data security, research participant privacy, and the underrepresentation of minoritized individuals in big data are important concerns. With their unique focus on the interplay between the whole person, the environment, and health, nurses bring an indispensable perspective to the interpretation and application of big data research findings. Given the increasing ubiquity of passive data collection, all nurses should be taught the definition, characteristics, applications, and limitations of big data. Nurses who are trained in big data and advanced computational methods will be poised to contribute to guidelines and policies that preserve the rights of human research participants.