
    Unveiling Novel Glioma Biomarkers through Multi-omics Integration and Classification

    Glioma is currently one of the most prevalent types of primary brain cancer. Given its high heterogeneity and the complexity of its molecular markers, many efforts have been made to accurately classify the type of glioma in each patient, which, in turn, is critical to improving early diagnosis and increasing survival. Nonetheless, as a result of fast-growing technological advances in high-throughput sequencing and an evolving molecular understanding of glioma biology, its classification has recently been subject to significant alterations. In this study, multiple glioma omics modalities (including mRNA, DNA methylation, and miRNA) from The Cancer Genome Atlas (TCGA) are integrated, using the revised glioma class labels, with a supervised method based on sparse canonical correlation analysis (DIABLO) to discriminate between glioma types. It was possible to find a set of highly correlated features distinguishing glioblastoma from low-grade gliomas (LGG) that were mainly associated with the disruption of receptor tyrosine kinase signaling pathways and with extracellular matrix organization and remodeling. On the other hand, the discrimination of the LGG types was characterized primarily by features involved in ubiquitination and DNA transcription processes. Furthermore, several novel glioma biomarkers likely helpful in both the diagnosis and prognosis of patients were identified, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES, EXD3, CD300A and HEPN1. Overall, this classification method discriminated between the different TCGA glioma patients with very high performance while seeking common information across multiple data types, ultimately enabling the understanding of essential mechanisms driving glioma heterogeneity and unveiling potential therapeutic targets.

    Statistical Inference for Multi-view Data

    Multi-view data, that is, matched sets of measurements on the same subjects, have become increasingly common with technological advances in genomics, neuroscience, wearable technologies, and other fields. Despite their prevalence, traditional techniques for classification or association analysis cannot be applied to multi-view data since they do not take into account the heterogeneity between the views. In this dissertation, we focus on generalizing existing high-dimensional methods to multi-view data. First, we propose a framework for the Joint Association and Classification Analysis of multi-view data (JACA). We support the methodology with theoretical guarantees for estimation consistency in high-dimensional settings and with numerical comparisons against existing methods. In addition, our approach is capable of using partial information where class labels or subsets of views are missing. Second, we investigate the Pan-Cancer data with the goal of assessing the strength of association between different cellular composition estimations by exploring the Generalized Association Study framework. We extract the shared and individual signals from each view and evaluate their relationship with survival to identify biomarkers that are predictive of cancer prognosis. Lastly, we propose a low-rank canonical correlation analysis framework to model heterogeneous data (both Gaussian and non-Gaussian) using exponential family distributions. We exploit a decomposition-based strategy to extract shared and individual structures from the underlying natural parameter matrices. In contrast to existing methods, our approach guarantees that there is no shared information embedded in the individual structures. An alternating split orthogonal constraints algorithm is developed to estimate the model parameters, and simulation studies show the advantages of the proposed approach over other classical methods.
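    For readers unfamiliar with canonical correlation analysis, the following is a minimal sketch of classical (Gaussian, full-rank) CCA on two simulated views. It is not the low-rank exponential-family method or JACA described in the abstract; the function name classical_cca, the ridge term reg, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def classical_cca(X, Y, reg=1e-8):
    """Classical CCA for two views with matched rows (same subjects).

    Illustrative only: `reg` is a small ridge term (an assumption) that
    keeps the covariance matrices invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    A = Wx @ U      # canonical weight vectors for view X (columns)
    B = Wy @ Vt.T   # canonical weight vectors for view Y (columns)
    return s, A, B


# Simulated two-view data: one shared latent signal z plus independent noise.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n),
                     z + 0.1 * rng.normal(size=n)])

corrs, A, B = classical_cca(X, Y)
# Scores along the first canonical direction of each view.
u = (X - X.mean(axis=0)) @ A[:, 0]
v = (Y - Y.mean(axis=0)) @ B[:, 0]
print("canonical correlations:", corrs)
```

    In this toy setting the first canonical pair recovers the shared signal, while the remaining directions carry view-specific noise. The methods in the abstract go further by handling non-Gaussian data and by explicitly separating shared from individual structure.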