944 research outputs found

    A primer on correlation-based dimension reduction methods for multi-omics analysis

    Full text link
    The continuing advances of omic technologies mean that it is now more tangible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omics datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will guide researchers navigate the emerging methods for multi-omics and help them integrate diverse omic datasets appropriately and embrace the opportunity of population multi-omics.Comment: 30 pages, 2 figures, 6 table

    Unveiling Novel Glioma Biomarkers through Multi-omics Integration and Classification

    Get PDF
    Glioma is currently one of the most prevalent types of primary brain cancer. Given its high level of heterogeneity along with the complex biological molecular markers, many efforts have been made to accurately classify the type of glioma in each patient, which, in turn, is critical to improve early diagnosis and increase survival. Nonetheless, as a result of the fast- growing technological advances in high throughput sequencing and evolving molecular understanding of glioma biology, its classification has been recently subject to significant alterations. In this study, multiple glioma omics modalities (including mRNA, DNA methylation, and miRNA) from The Cancer Genome Atlas (TCGA) are integrated, while using the revised glioma reclassified labels, with a supervised method based on sparse canonical correlation analysis (DIABLO) to discriminate between glioma types. It was possible to find a set of highly correlated features distinguishing glioblastoma from low- grade gliomas (LGG) that were mainly associated with the disruption of receptor tyrosine kinases signaling pathways and extracellular matrix organization and remodeling. On the other hand, the discrimination of the LGG types was characterized primarily by features involved in ubiquitination and DNA transcription processes. Furthermore, several novel glioma biomarkers likely helpful in both diagnosis and prognosis of the patients were identified, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES, EXD3, CD300A and HEPN1. Overall, this classification method allowed to discriminate the different TCGA glioma patients with very high performance, while seeking for common information across multiple data types, ultimately enabling the understanding of essential mechanisms driving glioma heterogeneity and unveiling potential therapeutic targets.O glioma é atualmente um dos tipos mais prevalentes de cancro cerebral primário. Dado o seu elevado nível de heterogeneidade e dada a complexidade dos seus marcadores moleculares biológicos, muitos esforços têm sido realizados para classificar com precisão o tipo de glioma em cada paciente, o que, por sua vez, é fundamental para melhorar o diagnóstico precoce e aumentar a sobrevivência. No entanto, como resultado dos avanços tecnológicos em rápido crescimento na sequenciação de dados e na evolução da com- preensão molecular da biologia do glioma, a sua classificação foi recentemente sujeita a alterações significativas. Neste estudo, múltiplas modalidades ómicas de glioma (in- cluindo mRNA, metilação de DNA e miRNA) provenientes do The Cancer Genome Atlas (TCGA) são integradas, juntamente com a utilização das classes revistas e reclassificadas de glioma, com um método supervisionado baseado em análise de correlação canónica esparsa (DIABLO) para discriminar entre os tipos de glioma. Foi possível encontrar um conjunto de características altamente correlacionadas que distinguem o glioblastoma dos gliomas de baixo grau (LGG) que estavam principalmente associadas à ruptura das vias de sinalização dos receptores de tirosina quinases e à organização e remodelação da matriz extracelular. Por outro lado, a discriminação dos tipos LGG foi caracterizada principalmente por variáveis envolvidas nos processos de ubiquitinação e transcrição de DNA. Além disso, foram identificados vários novos biomarcadores de glioma potencial- mente úteis tanto no diagnóstico quanto no prognóstico dos pacientes, incluindo os genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES, EXD3, CD300A e HEPN1. No geral, este método de classificação permitiu discriminar com desempenho muito elevado os diferentes pacientes com glioma, simultaneamente procurando informações comuns entre os vários tipos de dados, permitindo, em última análise, a compreensão de mecanis- mos essenciais que impulsionam a heterogeneidade em glioma e revelam potenciais alvos terapêuticos

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Supervised Methods for Biomarker Detection from Microarray Experiments

    Get PDF
    Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been extensively used to identify biomarkers and build computational predictive models for disease prognosis, drug sensitivity and toxicity evaluations. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action and biological cross talk. Biomarker detection from microarray data requires several considerations both from the biological and computational points of view. In this chapter, we describe the main methodology used in biomarkers discovery and predictive modeling and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.Non peer reviewe

    Integration and visualisation of clinical-omics datasets for medical knowledge discovery

    Get PDF
    In recent decades, the rise of various omics fields has flooded life sciences with unprecedented amounts of high-throughput data, which have transformed the way biomedical research is conducted. This trend will only intensify in the coming decades, as the cost of data acquisition will continue to decrease. Therefore, there is a pressing need to find novel ways to turn this ocean of raw data into waves of information and finally distil those into drops of translational medical knowledge. This is particularly challenging because of the incredible richness of these datasets, the humbling complexity of biological systems and the growing abundance of clinical metadata, which makes the integration of disparate data sources even more difficult. Data integration has proven to be a promising avenue for knowledge discovery in biomedical research. Multi-omics studies allow us to examine a biological problem through different lenses using more than one analytical platform. These studies not only present tremendous opportunities for the deep and systematic understanding of health and disease, but they also pose new statistical and computational challenges. The work presented in this thesis aims to alleviate this problem with a novel pipeline for omics data integration. Modern omics datasets are extremely feature rich and in multi-omics studies this complexity is compounded by a second or even third dataset. However, many of these features might be completely irrelevant to the studied biological problem or redundant in the context of others. Therefore, in this thesis, clinical metadata driven feature selection is proposed as a viable option for narrowing down the focus of analyses in biomedical research. Our visual cortex has been fine-tuned through millions of years to become an outstanding pattern recognition machine. To leverage this incredible resource of the human brain, we need to develop advanced visualisation software that enables researchers to explore these vast biological datasets through illuminating charts and interactivity. Accordingly, a substantial portion of this PhD was dedicated to implementing truly novel visualisation methods for multi-omics studies.Open Acces

    Analysis tools for the interplay between genome layout and regulation

    Get PDF
    Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species, has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets which tend to be incomplete, and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes.Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation.SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information.We demonstrate the capabilities of these tools by studying on one hand the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. On the other hand, we demonstrate the capabilities to improve TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation
    corecore