4,522 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli.

    A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive metadata information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.

    Integrated metabolome and transcriptome analysis of the NCI60 dataset

    Background: Metabolite profiles can be used for identifying molecular signatures and mechanisms underlying diseases since they reflect the outcome of complex upstream genomic, transcriptomic, proteomic and environmental events. The scarcity of publicly accessible large scale metabolome datasets related to human disease has been a major obstacle for assessing the potential of metabolites as biomarkers as well as understanding the molecular events underlying disease-related metabolic changes. The availability of metabolite and gene expression profiles for the NCI-60 cell lines offers the possibility of identifying significant metabolome and transcriptome features and discovering unique molecular processes related to different cancer types. Methods: We utilized a combination of analytical methods in the R statistical package to evaluate metabolic features associated with cancer cell lines from different tissue origins, identify metabolite-gene correlations and detect outlier cell lines based on metabolome and transcriptome data. Statistical analysis results are integrated with metabolic pathway annotations as well as COSMIC and Tumorscape databases to explore associated molecular mechanisms. Results: Our analysis reveals that although the NCI-60 metabolome dataset is quite noisy compared with microarray-based transcriptome data, it does contain tissue origin specific signatures. We also identified biologically meaningful gene-metabolite associations. Most remarkably, several abnormal gene-metabolite relationships identified by our approach can be directly linked to known gene mutations and copy number variations in the corresponding cell lines. Conclusions: Our results suggest that integrative metabolome and transcriptome analysis is a powerful method for understanding molecular machinery underlying various pathophysiological processes. We expect the availability of large scale metabolome data in the coming years will significantly promote the discovery of novel biomarkers, which will in turn improve the understanding of molecular mechanism underlying diseases. http://deepblue.lib.umich.edu/bitstream/2027.42/112946/1/12859_2011_Article_4394.pd

    2D association and integrative omics analysis in rice provides systems biology view in trait analysis.

    The interactions among genes and between genes and environment contribute significantly to the phenotypic variation of complex traits and may be possible explanations for missing heritability. However, to our knowledge no existing tool can address the two kinds of interactions. Here we propose a novel linear mixed model that considers not only the additive effects of biological markers but also the interaction effects of marker pairs. Interaction effect is demonstrated as a 2D association. Based on this linear mixed model, we developed a pipeline, namely PATOWAS. PATOWAS can be used to study transcriptome-wide and metabolome-wide associations in addition to genome-wide associations. Our case analysis with real rice recombinant inbred lines (RILs) at three omics levels demonstrates that 2D association mapping and integrative omics are able to provide a systems biology view into the analyzed traits, leading toward an answer about how genes, transcripts, proteins, and metabolites work together to produce an observable phenotype.
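
    The idea of a 2D (marker-pair) association can be illustrated with a deliberately simplified sketch. It is not the linear mixed model described in the abstract and not the PATOWAS pipeline: it only scores each marker pair by the Pearson correlation between the product of the two marker codes and the phenotype. The marker names, the -1/1 genotype coding and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / sqrt(sxx * syy)


def scan_pairs(markers, phenotype):
    """Score every marker pair by correlating its interaction term with the phenotype.

    markers: dict mapping a (hypothetical) marker name to genotype codes (-1 or 1).
    Returns a dict mapping (marker_a, marker_b) to the correlation of a*b with the phenotype.
    """
    scores = {}
    for a, b in combinations(sorted(markers), 2):
        interaction = [u * v for u, v in zip(markers[a], markers[b])]
        scores[(a, b)] = pearson(interaction, phenotype)
    return scores


# Toy data (assumed): the phenotype is driven purely by the m1 x m2 interaction.
markers = {
    "m1": [1, 1, -1, -1],
    "m2": [1, -1, 1, -1],
    "m3": [1, 1, 1, -1],
}
phenotype = [1.0, -1.0, -1.0, 1.0]

scores = scan_pairs(markers, phenotype)
best_pair = max(scores, key=lambda pair: abs(scores[pair]))
```

    A mixed-model analysis would additionally account for additive marker effects and relatedness among lines, which this sketch ignores.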

    Topological Analysis of Metabolic Networks Integrating Co-Segregating Transcriptomes and Metabolomes in Type 2 Diabetic Rat Congenic Series

    Background: The genetic regulation of metabolic phenotypes (i.e., metabotypes) in type 2 diabetes mellitus is caused by complex organ-specific cellular mechanisms contributing to impaired insulin secretion and insulin resistance. Methods: We used systematic metabotyping by 1H NMR spectroscopy and genome-wide gene expression in white adipose tissue to map molecular phenotypes to genomic blocks associated with obesity and insulin secretion in a series of rat congenic strains derived from spontaneously diabetic Goto-Kakizaki (GK) and normoglycemic Brown-Norway (BN) rats. We implemented a network biology approach to visualise shortest paths between metabolites and genes significantly associated with each genomic block. Results: Despite strong genomic similarities (95-99%) among congenics, each strain exhibited specific patterns of gene expression and metabotypes, reflecting metabolic consequences of series of linked genetic polymorphisms in the congenic intervals. We subsequently used the congenic panel to map quantitative trait loci underlying specific metabotypes (mQTL) and genome-wide expression traits (eQTL). Variation in key metabolites like glucose, succinate, lactate or 3-hydroxybutyrate, and second messenger precursors like inositol was associated with several independent genomic intervals, indicating functional redundancy in these regions. To navigate through the complexity of these association networks we mapped candidate genes and metabolites onto metabolic pathways and implemented a shortest path strategy to highlight potential mechanistic links between metabolites and transcripts at colocalized mQTLs and eQTLs. Minimizing shortest path length drove prioritization of biological validations by gene silencing. Conclusions: These results underline the importance of network-based integration of multilevel systems genetics datasets to improve understanding of the genetic architecture of metabotype and transcriptomic regulations and to characterize novel functional roles for genes determining tissue-specific metabolism.
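
    The shortest path strategy mentioned in this abstract can be sketched with a breadth-first search over an unweighted pathway graph. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph, the node names (gene_1, reaction_1, metabolite_a and so on) and the choice of an unweighted, undirected network are all hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target as a list of nodes, or None.

    graph: dict mapping each node to an iterable of neighbouring nodes
    (an unweighted adjacency list; edges are assumed to be listed in both directions).
    """
    previous = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessor links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # target is not reachable from source


# Hypothetical toy network linking genes to metabolites through reactions.
pathway = {
    "gene_1": ["reaction_1"],
    "reaction_1": ["gene_1", "metabolite_a", "metabolite_b"],
    "metabolite_a": ["reaction_1", "reaction_2"],
    "metabolite_b": ["reaction_1"],
    "reaction_2": ["metabolite_a", "metabolite_c", "gene_2"],
    "metabolite_c": ["reaction_2"],
    "gene_2": ["reaction_2"],
}

path = shortest_path(pathway, "gene_1", "metabolite_c")
```

    Ranking candidate gene-metabolite pairs by the length of such paths is one way to read the prioritization step; how the original study weighted or filtered edges is not stated in the abstract.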

    Systems biology of energetic and atomic costs in the yeast transcriptome, proteome, and metabolome

    Proteins vary in their cost to the cell and natural selection may favour the use of proteins that are cheaper to produce. We develop a novel approach to estimate the amino acid biosynthetic cost based on genome-scale metabolic models, and directly investigate the effects of biosynthetic cost on transcriptomic, proteomic and metabolomic data in _Saccharomyces cerevisiae_. We find that our systems approach to formulating biosynthetic cost produces a novel measure that explains similar levels of variation in gene expression compared with previously reported cost measures. Regardless of the measure used, the cost of amino acid synthesis is weakly associated with transcript and protein levels, independent of codon usage bias. In contrast, energetic costs explain a large proportion of variation in levels of free amino acids. In the economy of the yeast cell, there appears to be no single currency to compute the cost of amino acid synthesis, and thus a systems approach is necessary to uncover the full effects of amino acid biosynthetic cost in complex biological systems that vary with cellular and environmental conditions.

    Integrative omics approaches provide biological and clinical insights : examples from mitochondrial diseases

    High-throughput technologies for genomics, transcriptomics, proteomics, and metabolomics, and integrative analysis of these data, enable new, systems-level insights into disease pathogenesis. Mitochondrial diseases are an excellent target for hypothesis-generating omics approaches, as the disease group is mechanistically exceptionally complex. Although the genetic background in mitochondrial diseases is in either the nuclear or the mitochondrial genome, the typical downstream effect is dysfunction of the mitochondrial respiratory chain. However, the clinical manifestations show unprecedented variability, including either systemic or tissue-specific effects across multiple organ systems, with mild to severe symptoms, and occurring at any age. So far, the omics approaches have provided mechanistic understanding of tissue-specificity and potential treatment options for mitochondrial diseases, such as metabolome remodeling. However, no curative treatments exist, suggesting that novel approaches are needed. In this Review, we discuss omics approaches and discoveries with the potential to elucidate mechanisms of and therapies for mitochondrial diseases. Peer reviewe

    Genetic regulation of mouse liver metabolite levels.

    We profiled and analyzed 283 metabolites representing eight major classes of molecules including Lipids, Carbohydrates, Amino Acids, Peptides, Xenobiotics, Vitamins and Cofactors, Energy Metabolism, and Nucleotides in mouse liver of 104 inbred and recombinant inbred strains. We find that metabolites exhibit a wide range of variation, as has been previously observed with metabolites in blood serum. Using genome-wide association analysis, we mapped 40% of the quantified metabolites to at least one locus in the genome and for 75% of the loci mapped we identified at least one candidate gene by local expression QTL analysis of the transcripts. Moreover, we validated 2 of 3 of the significant loci examined by adenoviral overexpression of the genes in mice. In our GWAS results, we find that at significant loci the peak markers explained on average between 20 and 40% of variation in the metabolites. Moreover, 39% of loci found to be regulating liver metabolites in mice were also found in human GWAS results for serum metabolites, providing support for similarity in genetic regulation of metabolites between mice and humans. We also integrated the metabolomic data with transcriptomic and clinical phenotypic data to evaluate the extent of co-variation across various biological scales.

    Exploiting the mediating role of the metabolome to unravel transcript-to-phenotype associations.

    Despite the success of genome-wide association studies (GWASs) in identifying genetic variants associated with complex traits, understanding the mechanisms behind these statistical associations remains challenging. Several methods that integrate methylation, gene expression, and protein quantitative trait loci (QTLs) with GWAS data to determine their causal role in the path from genotype to phenotype have been proposed. Here, we developed and applied a multi-omics Mendelian randomization (MR) framework to study how metabolites mediate the effect of gene expression on complex traits. We identified 216 transcript-metabolite-trait causal triplets involving 26 medically relevant phenotypes. Among these associations, 58% were missed by classical transcriptome-wide MR, which only uses gene expression and GWAS data. This allowed the identification of biologically relevant pathways, such as between ANKH and calcium levels mediated by citrate levels and SLC6A12 and serum creatinine through modulation of the levels of the renal osmolyte betaine. We show that the signals missed by transcriptome-wide MR are found thanks to the increase in power conferred by integrating multiple omics layers. Simulation analyses show that with larger molecular QTL studies and in case of mediated effects, our multi-omics MR framework outperforms classical MR approaches designed to detect causal relationships between single molecular traits and complex phenotypes.