3,586 research outputs found

    Integrative mixture of experts to combine clinical factors and gene markers

    Motivation: Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. Several variants are proposed in integrative ME as well as the inclusion of various gene selection methods to select a hybrid signature

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    BAYESIAN INTEGRATIVE ANALYSIS OF OMICS DATA

    Technological innovations have produced large multi-modal datasets that span multiplatform genomic data, pathway data, proteomic data, imaging data and clinical data. Integrative analysis of such data sets has the potential to reveal important biological and clinical insights into complex diseases like cancer. This dissertation focuses on establishing Bayesian methodology for integrative analysis in radiogenomics and pathway driver detection, applied to cancer. We first present Radio-iBAG, which uses Bayesian approaches to analyze radiological imaging and multi-platform genomic data. We establish a multi-scale Bayesian hierarchical model that simultaneously identifies genomic and radiomic (i.e., radiology-based imaging) markers, along with the latent associations between these two modalities, and detects the overall prognostic relevance of the combined markers. Our method is motivated by and applied to The Cancer Genome Atlas glioblastoma multiforme data set, wherein it identifies important magnetic resonance imaging features and the associated genomic platforms that are also significantly related to patient survival times. For another aspect of integrative analysis, we then present pathDrive, which aims to detect key genetic and epigenetic upstream drivers that influence pathway activity. The method is applied to colorectal cancer, incorporating its four molecular subtypes. For each of the pathways that significantly differentiate subgroups, we detect important genomic drivers that can be viewed as “switches” for the pathway activity. Finally, to extend the analysis, we develop proteomic-based pathway driver analysis for multiple cancer types, wherein we simultaneously detect genomic upstream factors that influence a specific pathway for each cancer type within the cancer group.
With a Bayesian hierarchical model, we detect signals by borrowing strength from common cancer types to rare cancer types, and simultaneously estimate their selection similarity. A simulation study demonstrates that our method provides several advantages, including increased power and lower false discovery rates. We then apply the method to the analysis of multiple cancer groups, wherein we detect key genomic upstream drivers with proper biological interpretation. The overall framework and methodologies established in this dissertation point to further investigation in the field of integrative analysis of omics data and provide more comprehensive insight into biological mechanisms and processes, and into cancer development and progression

    Recent trends in molecular diagnostics of yeast infections: from PCR to NGS

    The incidence of opportunistic yeast infections in humans has been increasing over recent years. These infections are difficult to treat and diagnose, in part due to the large number and broad diversity of species that can underlie the infection. In addition, resistance to one or several antifungal drugs in infecting strains is increasingly being reported, severely limiting therapeutic options and showcasing the need for rapid detection of the infecting agent and its drug susceptibility profile. Current methods for species and resistance identification lack satisfactory sensitivity and specificity, and often require prior culturing of the infecting agent, which delays diagnosis. Recently developed high-throughput technologies such as next generation sequencing or proteomics are opening completely new avenues for more sensitive, accurate and fast diagnosis of yeast pathogens. These approaches are the focus of intensive research, but translation into the clinics requires overcoming important challenges. In this review, we provide an overview of existing and recently emerged approaches that can be used in the identification of yeast pathogens and their drug resistance profiles. Throughout the text we highlight the advantages and disadvantages of each methodology and discuss the most promising developments in their path from bench to bedside

    Systems Analytics and Integration of Big Omics Data

    A “genotype” is essentially an organism's full hereditary information which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome

    Classification based on extensions of LS-PLS using logistic regression: application to clinical and multiple genomic data

    Prediction from high-dimensional genomic data is an active field in today's medical research. Most of the proposed prediction methods make use of genomic data alone without considering established clinical data that are often available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions. In this paper we consider classification methods that use both types of variables simultaneously, but apply dimension reduction only to the high-dimensional genomic ones. A usual way to do this is a two-step approach. In step one, a dimensionality reduction technique is performed on the genomic dataset only. In step two, the selected genomic variables are merged with the clinical variables to build a classification model on the combined dataset. However, the reduced dimension is built without taking into account the link between the response variable and the clinical data. To address this issue, using Partial Least Squares (PLS) as the reduction technique, we propose a one-step approach based on three extensions of the LS-PLS (LS for Least Squares) method for the logistic regression context. We perform a simulation study to evaluate these approaches against methods using only the clinical data or only the genetic data. Then, we illustrate their performance in classifying two real data sets containing both clinical information and gene expression
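
The two-step baseline described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' LS-PLS extensions. It assumes a single PLS component as the reduction step and plain gradient descent for the logistic fit. The toy data and all names (`first_pls_component`, `fit_logistic`, `X_gen`, `X_clin`) are hypothetical.

```python
import math
import random


def first_pls_component(X, y):
    """Step one: scores of the first PLS component of X with respect to y."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    # Weight of each gene is proportional to its covariance with the response.
    w = [sum(Xc[i][j] * (y[i] - y_mean) for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    return [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Step two: logistic regression (intercept first) by gradient descent."""
    n, d = len(X), len(X[0])
    beta = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for k in range(d):
                grad[k + 1] += err * xi[k]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data: 40 patients, 1 binary clinical factor, 20 genes.
random.seed(0)
n = 40
y = [i % 2 for i in range(n)]
X_clin = [[float(random.random() < 0.3 + 0.4 * yi)] for yi in y]
X_gen = [[random.gauss(0, 1) + (2.0 * yi if j < 3 else 0.0) for j in range(20)]
         for yi in y]

# Step one: reduce the genomic block only (clinical data are not used here).
scores = first_pls_component(X_gen, y)

# Step two: merge the genomic score with the clinical variables and classify.
X_comb = [clin + [t] for clin, t in zip(X_clin, scores)]
beta = fit_logistic(X_comb, y)
probs = [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
         for xi in X_comb]
accuracy = sum((p > 0.5) == bool(yi) for p, yi in zip(probs, y)) / n
```

Step one never sees the clinical variables, which is the limitation the abstract points out and that a one-step approach is meant to address.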