A primer on correlation-based dimension reduction methods for multi-omics analysis
The continuing advances of omic technologies mean that it is now more
feasible to measure the numerous features collectively reflecting the molecular
properties of a sample. When multiple omic methods are used, statistical and
computational approaches can exploit these large, connected profiles.
Multi-omics is the integration of different omic data sources from the same
biological sample. In this review, we focus on correlation-based dimension
reduction approaches for single omic datasets, followed by methods for pairs of
omics datasets, before detailing further techniques for three or more omic
datasets. We also briefly describe network methods, which apply when three or
more omic datasets are available and complement correlation-oriented tools. To aid
readers new to this area, these are all linked to relevant R packages that can
implement these procedures. Finally, we discuss scenarios of experimental
design and present road maps that simplify the selection of appropriate
analysis methods. This review will help researchers navigate the emerging
methods for multi-omics and help them integrate diverse omic datasets
appropriately and embrace the opportunity of population multi-omics.
Comment: 30 pages, 2 figures, 6 tables
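As an illustration of the correlation-based reduction for a pair of omic datasets mentioned in this abstract, the following is a minimal sketch of classical canonical correlation analysis (CCA) in NumPy. It is not taken from the review, which links to R packages. The function name `cca_first_pair`, the ridge term `reg`, and the simulated data are assumptions made for illustration only. The sketch assumes more samples than features; real omic data usually have far more features than samples and would need a sparse or more heavily regularised variant.

```python
import numpy as np


def cca_first_pair(X, Y, reg=1e-6):
    """Return the first pair of canonical weights and their correlation.

    X: (n_samples, p) array, Y: (n_samples, q) array, rows matched by sample.
    reg: small ridge term (an assumption of this sketch) that keeps the
         covariance matrices invertible.
    """
    # Centre each block so that cross-products are covariances.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten both blocks with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # K = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    U, s, Vt = np.linalg.svd(K, full_matrices=False)

    # Map the leading singular vectors back to the original feature spaces.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0, :])
    return a, b, s[0]


# Simulated example: two blocks that share one latent factor z.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X_demo = np.outer(z, np.ones(5)) + 0.5 * rng.normal(size=(200, 5))
Y_demo = np.outer(z, np.ones(4)) + 0.5 * rng.normal(size=(200, 4))
a_demo, b_demo, rho_demo = cca_first_pair(X_demo, Y_demo)
print(f"first canonical correlation: {rho_demo:.3f}")
```

The projections `X_demo @ a_demo` and `Y_demo @ b_demo` are the one-dimensional summaries of each block that are maximally correlated with each other, which is the sense in which CCA acts as a dimension reduction for paired datasets.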
Unveiling Novel Glioma Biomarkers through Multi-omics Integration and Classification
Glioma is currently one of the most prevalent types of primary brain cancer. Given its high
level of heterogeneity along with the complex biological molecular markers, many efforts
have been made to accurately classify the type of glioma in each patient, which, in turn, is
critical to improve early diagnosis and increase survival. Nonetheless, as a result of the fast-
growing technological advances in high throughput sequencing and evolving molecular
understanding of glioma biology, its classification has been recently subject to significant
alterations. In this study, multiple glioma omics modalities (including mRNA, DNA
methylation, and miRNA) from The Cancer Genome Atlas (TCGA) are integrated, while
using the revised and reclassified glioma labels, with a supervised method based on sparse
canonical correlation analysis (DIABLO) to discriminate between glioma types. It was
possible to find a set of highly correlated features distinguishing glioblastoma from low-
grade gliomas (LGG) that were mainly associated with the disruption of receptor tyrosine
kinases signaling pathways and extracellular matrix organization and remodeling. On the
other hand, the discrimination of the LGG types was characterized primarily by features
involved in ubiquitination and DNA transcription processes. Furthermore, several novel
glioma biomarkers likely helpful in both diagnosis and prognosis of the patients were
identified, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES,
EXD3, CD300A and HEPN1. Overall, this classification method made it possible to discriminate the
different TCGA glioma patients with very high performance, while seeking common
information across multiple data types, ultimately enabling the understanding of essential
mechanisms driving glioma heterogeneity and unveiling potential therapeutic targets.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Supervised Methods for Biomarker Detection from Microarray Experiments
Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations from both the biological and computational points of view. In this chapter, we describe the main methodology used in biomarker discovery and predictive modeling and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection. Non peer reviewed
Integration and visualisation of clinical-omics datasets for medical knowledge discovery
In recent decades, the rise of various omics fields has flooded life sciences with unprecedented amounts of high-throughput data, which have transformed the way biomedical research is conducted. This trend will only intensify in the coming decades, as the cost of data acquisition will continue to decrease. Therefore, there is a pressing need to find novel ways to turn this ocean of raw data into waves of information and finally distil those into drops of translational medical knowledge. This is particularly challenging because of the incredible richness of these datasets, the humbling complexity of biological systems and the growing abundance of clinical metadata, which makes the integration of disparate data sources even more difficult.
Data integration has proven to be a promising avenue for knowledge discovery in biomedical research. Multi-omics studies allow us to examine a biological problem through different lenses using more than one analytical platform. These studies not only present tremendous opportunities for the deep and systematic understanding of health and disease, but they also pose new statistical and computational challenges. The work presented in this thesis aims to alleviate this problem with a novel pipeline for omics data integration.
Modern omics datasets are extremely feature rich and in multi-omics studies this complexity is compounded by a second or even third dataset. However, many of these features might be completely irrelevant to the studied biological problem or redundant in the context of others. Therefore, in this thesis, clinical metadata driven feature selection is proposed as a viable option for narrowing down the focus of analyses in biomedical research.
Our visual cortex has been fine-tuned over millions of years to become an outstanding pattern recognition machine. To leverage this incredible resource of the human brain, we need to develop advanced visualisation software that enables researchers to explore these vast biological datasets through illuminating charts and interactivity. Accordingly, a substantial portion of this PhD was dedicated to implementing truly novel visualisation methods for multi-omics studies. Open Access
Analysis tools for the interplay between genome layout and regulation
Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets, which tend to be incomplete and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying, on the one hand, the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. On the other hand, we demonstrate their capability to improve TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation.