The Personal Genome Project-UK, an open access resource of human multi-omics data
Integrative analysis of multi-omics data is a powerful approach for gaining functional insights into biological and medical processes. Conducting these multifaceted analyses on human samples is often complicated by the fact that the raw sequencing output is rarely available under open access. The Personal Genome Project UK (PGP-UK) is one of few resources that recruits its participants under open consent and makes the resulting multi-omics data freely and openly available. As part of this resource, we describe the PGP-UK multi-omics reference panel consisting of ten genomic, methylomic and transcriptomic data. Specifically, we outline the data processing, quality control and validation procedures which were implemented to ensure data integrity and exclude sample mix-ups. In addition, we provide a REST API to facilitate the download of the entire PGP-UK dataset. The data are also available from two cloud-based environments, providing platforms for free integrated analysis. In conclusion, the genotype-validated PGP-UK multi-omics human reference panel described here provides a valuable new open access resource for integrated analyses in support of personal and medical genomics.
Multi-omics integration reveals molecular networks and regulators of psoriasis.
Background: Psoriasis is a complex multi-factorial disease, involving both genetic susceptibilities and environmental triggers. Genome-wide association studies (GWAS) and epigenome-wide association studies (EWAS) have been carried out to identify genetic and epigenetic variants that are associated with psoriasis. However, these loci cannot fully explain the disease pathogenesis. Methods: To achieve a comprehensive mechanistic understanding of psoriasis, we conducted a systems biology study, integrating multi-omics datasets including GWAS, EWAS, tissue-specific transcriptome, expression quantitative trait loci (eQTLs), gene networks, and biological pathways to identify the key genes, processes, and networks that are genetically and epigenetically associated with psoriasis risk. Results: This integrative genomics study identified both well-characterized (e.g., the IL17 pathway in both GWAS and EWAS) and novel biological processes (e.g., the branched chain amino acid catabolism process in GWAS and the platelet and coagulation pathway in EWAS) involved in psoriasis. Finally, by utilizing tissue-specific gene regulatory networks, we unraveled the interactions among the psoriasis-associated genes and pathways in a tissue-specific manner and detected potential key regulatory genes in the psoriasis networks. Conclusions: The integration and convergence of multi-omics signals provide deeper and comprehensive insights into the biological mechanisms associated with psoriasis susceptibility.
Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions
Improvements in sequencing technologies and reduced experimental costs have
resulted in a vast number of studies generating high-throughput data. Although
the number of methods to analyze these "omics" data has also increased,
computational complexity and lack of documentation hinder researchers from
analyzing their high-throughput data to its true potential. In this chapter we
detail our data-driven, transkingdom network (TransNet) analysis protocol to
integrate and interrogate multi-omics data. This systems biology approach has
allowed us to successfully identify important causal relationships between
different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of
data.
Depression and suicide risk prediction models using blood-derived multi-omics data
More than 300 million people worldwide experience depression; annually, ~800,000 people die by suicide. Unfortunately, conventional interview-based diagnosis is insufficient to accurately predict a psychiatric status. We developed machine learning models to predict depression and suicide risk using blood methylome and transcriptome data from 56 suicide attempters (SAs), 39 patients with major depressive disorder (MDD), and 87 healthy controls. Our random forest classifiers showed accuracies of 92.6% in distinguishing SAs from MDD patients, 87.3% in distinguishing MDD patients from controls, and 86.7% in distinguishing SAs from controls. We also developed regression models for predicting psychiatric scales with R² values of 0.961 and 0.943 for Hamilton Rating Scale for Depression-17 and Scale for Suicide Ideation, respectively. Multi-omics data were used to construct psychiatric status prediction models for improved mental health treatment.
Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities
New technologies have enabled the investigation of biology and human health
at an unprecedented scale and in multiple dimensions. These dimensions include
a myriad of properties describing genome, epigenome, transcriptome, microbiome,
phenotype, and lifestyle. No single data type, however, can capture the
complexity of all the factors relevant to understanding a phenomenon such as a
disease. Integrative methods that combine data from multiple technologies have
thus emerged as critical statistical and computational approaches. The key
challenge in developing such approaches is the identification of effective
models to provide a comprehensive and relevant systems view. An ideal method
can answer a biological or medical question, identifying important features and
predicting outcomes, by harnessing heterogeneous data across several dimensions
of biological variation. In this Review, we describe the principles of data
integration and discuss current methods and available implementations. We
provide examples of successful data integration in biology and medicine.
Finally, we discuss current challenges in biomedical integrative methods and
our perspective on the future development of the field.
Unconventional machine learning of genome-wide human cancer data
Recent advances in high-throughput genomic technologies coupled with
exponential increases in computer processing and memory have allowed us to
interrogate the complex aberrant molecular underpinnings of human disease from
a genome-wide perspective. While the deluge of genomic information is expected
to increase, a bottleneck in conventional high-performance computing is rapidly
approaching. Inspired in part by recent advances in physical quantum
processors, we evaluated several unconventional machine learning (ML)
strategies on actual human tumor data. Here we show for the first time the
efficacy of multiple annealing-based ML algorithms for classification of
high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas.
To assess algorithm performance, we compared these classifiers to a variety of
standard ML methods. Our results indicate the feasibility of using
annealing-based ML to provide competitive classification of human cancer types
and associated molecular subtypes and superior performance with smaller
training datasets, thus providing compelling empirical evidence for the
potential future application of unconventional computing architectures in the
biomedical sciences.
