Unveiling Novel Glioma Biomarkers through Multi-omics Integration and Classification
Glioma is currently one of the most prevalent types of primary brain cancer. Given its high
level of heterogeneity and the complexity of its molecular markers, many efforts
have been made to accurately classify the type of glioma in each patient, which, in turn, is
critical for improving early diagnosis and increasing survival. Nonetheless, as a result of
rapid technological advances in high-throughput sequencing and an evolving molecular
understanding of glioma biology, its classification has recently undergone significant
revision. In this study, multiple glioma omics modalities (including mRNA, DNA
methylation, and miRNA) from The Cancer Genome Atlas (TCGA) are integrated, using the
revised glioma classification labels, with a supervised method based on sparse
canonical correlation analysis (DIABLO) to discriminate between glioma types. This
yielded a set of highly correlated features distinguishing glioblastoma from low-grade
gliomas (LGG); these features were mainly associated with the disruption of receptor tyrosine
kinase signaling pathways and with extracellular matrix organization and remodeling. On the
other hand, the discrimination of the LGG types was characterized primarily by features
involved in ubiquitination and DNA transcription processes. Furthermore, several novel
glioma biomarkers potentially useful for both diagnosis and prognosis were
identified, including the genes PPP1R8, GPBP1L1, KIAA1614, C14orf23, CCDC77, BVES,
EXD3, CD300A and HEPN1. Overall, this classification method discriminated the
different TCGA glioma patients with very high performance while seeking common
information across multiple data types, ultimately enabling the understanding of essential
mechanisms driving glioma heterogeneity and unveiling potential therapeutic targets.
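To make the idea of sparse canonical correlation analysis concrete, the following is a minimal Python sketch for two views only. It is not DIABLO, which is supervised and handles several data blocks at once. The function names (`soft_threshold`, `unit`, `sparse_cca_rank1`) and the relative thresholds `frac_x` and `frac_y` are hypothetical, and the within-view covariances are assumed to be identity, a common simplification in high-dimensional settings.

```python
import numpy as np


def soft_threshold(z, frac):
    """Shrink z toward zero by frac * max|z|; small entries become exactly 0."""
    lam = frac * np.max(np.abs(z))
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def unit(z):
    """Scale z to unit Euclidean norm (left unchanged if it is all zeros)."""
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z


def sparse_cca_rank1(X, Y, frac_x=0.5, frac_y=0.5, n_iter=50):
    """Return sparse weight vectors (u, v) for one pair of canonical variates.

    Hypothetical helper: alternates soft-thresholded power iterations on the
    cross-product matrix X^T Y, treating within-view covariances as identity.
    """
    Xc = X - X.mean(axis=0)  # center each feature
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc            # cross-product between the two views
    # Start from the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = unit(C @ v)
    for _ in range(n_iter):
        u = unit(soft_threshold(C @ v, frac_x))    # update view-1 weights
        v = unit(soft_threshold(C.T @ u, frac_y))  # update view-2 weights
    return u, v
```

The nonzero entries of `u` and `v` play the role of the selected features: only variables whose cross-view association is strong relative to the largest one keep a nonzero weight.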
Statistical Inference for Multi-view Data
Multi-view data, that is, matched sets of measurements on the same subjects, have become
increasingly common with technological advances in genomics, neuroscience, and wearable technologies. Despite this prevalence, traditional techniques for classification or association analysis cannot be applied to multi-view data because they do not account for the heterogeneity between views. In this dissertation, we focus on generalizing existing high-dimensional methods to multi-view data. First, we propose a framework for the Joint Association and Classification Analysis of multi-view data (JACA). We support the methodology with theoretical guarantees for estimation consistency in high-dimensional settings and with numerical comparisons against existing methods. In addition, our approach can use partial information when class labels or subsets of views are missing. Second, we investigate Pan-Cancer data with the goal of assessing the strength of association between different cellular composition estimates by exploring the Generalized Association Study framework. We extract the shared and individual signals from each view and evaluate their relationship with survival to identify biomarkers that are predictive of cancer prognosis. Lastly, we propose a low-rank canonical correlation analysis framework to model heterogeneous data (both Gaussian and non-Gaussian) using exponential family distributions. We exploit a decomposition-based strategy to extract shared and individual structures from the underlying natural parameter matrices. In contrast to existing methods, our approach guarantees that no shared information is embedded in the individual structures. An alternating split orthogonal constraints algorithm is developed to estimate the model parameters, and simulation studies show the advantages of the proposed approach over other classical methods.
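For orientation, classical two-view canonical correlation analysis, the kind of association method such frameworks build on, can be sketched in a few lines of Python. This is the textbook whitening-plus-SVD construction, not the dissertation's low-rank exponential-family method or its estimation algorithm. The names `inv_sqrt` and `cca` and the small `ridge` term are assumptions made for this illustration.

```python
import numpy as np


def inv_sqrt(S, ridge):
    """Inverse symmetric square root of a covariance matrix, lightly regularized."""
    w, V = np.linalg.eigh(S + ridge * np.eye(S.shape[0]))
    return (V / np.sqrt(w)) @ V.T


def cca(X, Y, ridge=1e-8):
    """Classical CCA: return weights A, B and canonical correlations rho.

    Hypothetical helper: whitens each view, then takes the SVD of the
    whitened cross-covariance; singular values are the canonical correlations.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)  # within-view covariance, view 1
    Syy = Yc.T @ Yc / (n - 1)  # within-view covariance, view 2
    Sxy = Xc.T @ Yc / (n - 1)  # cross-covariance between views
    Wx, Wy = inv_sqrt(Sxx, ridge), inv_sqrt(Syy, ridge)
    U, rho, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    return Wx @ U, Wy @ Vt.T, rho
```

With matched samples in the rows of `X` and `Y`, `X @ A[:, 0]` and `Y @ B[:, 0]` are the first pair of canonical variates. Large leading values in `rho` indicate signal shared across the views, while values near zero suggest the remaining variation is view-specific.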