TIME AND CAUSALITY IN GENOMICS DATA
The ability to sequence the genomic information that describes individual cell states has provided enormous insight into biological systems. However, to sequence the genomic information within a cell, the cell must be killed, preventing measurements from the future states that cell would have occupied had it been allowed to survive. Thus, sequencing measurements only provide a single snapshot in time of cellular genomic states. Often the ultimate goal of an analysis is to derive mechanistic insight into the biology of a system or process from the data. However, such mechanistic, causal inference is almost impossible without temporal information because causality in standard formulations is based on the concept of connected causes and effects through time.
This thesis engages with time in genomics data in several ways. The first contribution is a neural network-based model that attempts to predict future single-cell transcriptomic states from single-cell transcriptomics data sets. This work demonstrates that, with metabolic labeling data sets, future RNA states can be estimated within the same cell over the short term, providing a proof of principle that can be expanded as genomics data sets with a temporal dimension become more common.
The second contribution of this thesis is a simulation of molecular cell states over time, which demonstrates that single time points from cells do not allow for robust mechanistic inference. Further, the simulation conforms to observations that mRNA expression and expression of the corresponding protein are often poorly correlated, and it provides mechanistic explanations for how this occurs.
The final contribution relates to time in a different sense, analyzing the impact of human age on biomarkers used for cancer immunotherapy. We found that older individuals possessed a number of favorable biomarkers at higher levels than their younger counterparts, possibly explaining clinical observations that older individuals do no worse than younger individuals on immune checkpoint therapies despite the usual anticorrelation between patient age and effective immune responses.
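As an illustration only, the kind of effect described above can be sketched with a toy two-stage model in which protein is produced from mRNA and degrades more slowly than it. This is not the thesis's simulation: the function names (`simulate_cell`, `snapshot`, `pearson`), the rate constants, and the single transcription pulse per cell are all assumptions chosen for brevity. Each cell is read out once, at the same snapshot time, mimicking a destructive measurement.

```python
import math
import random


def simulate_cell(k_tx, t_end, dt=0.01, d_m=1.0, k_tl=2.0, d_p=0.1):
    """Euler-integrate dm/dt = k_tx(t) - d_m*m and dp/dt = k_tl*m - d_p*p.

    All rate constants are illustrative assumptions, not fitted values.
    Returns the (mRNA, protein) levels at time t_end only.
    """
    m, p = 0.0, 0.0
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        dm = k_tx(t) - d_m * m
        dp = k_tl * m - d_p * p
        m += dt * dm
        p += dt * dp
    return m, p


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def snapshot(n_cells=200, t_snap=20.0, seed=0):
    """Measure every cell once at t_snap, as a destructive assay would."""
    rng = random.Random(seed)
    ms, ps = [], []
    for _ in range(n_cells):
        # Hypothetical heterogeneity: each cell starts one transcription
        # pulse of fixed length at its own random onset time.
        onset = rng.uniform(0.0, t_snap)
        pulse = lambda t, o=onset: 5.0 if o <= t < o + 3.0 else 0.0
        m, p = simulate_cell(pulse, t_snap)
        ms.append(m)
        ps.append(p)
    return ms, ps


if __name__ == "__main__":
    ms, ps = snapshot()
    print("snapshot mRNA-protein correlation:", pearson(ms, ps))
```

Because protein in this toy model integrates mRNA over a longer window than the mRNA itself persists, the two quantities read at a single instant need not track each other, even though one deterministically drives the other within every cell.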
Chromatin structure regulates cancer-specific alternative splicing events in primary HPV-related oropharyngeal squamous cell carcinoma
Human papillomavirus-related oropharyngeal squamous cell carcinoma (HPV+ OPSCC) represents a unique disease entity within head and neck cancer with rising incidence. Previous work has shown that alternative splicing events (ASEs) are prevalent in HPV+ OPSCC, but further validation is needed to understand the regulation of this process and its role in these tumours. In this study, eleven ASEs (GIT2, CTNNB1, MKNK2, MRPL33, SIPA1L3, SNHG6, SYCP2, TPRG1, ZHX2, ZNF331, and ELOVL1) were selected for validation from 109 previously published candidate ASEs to elucidate the post-transcriptional mechanisms of oncogenesis in HPV+ disease. In vitro qRT-PCR confirmed differential expression of 9 of 11 ASE candidates, and in silico analysis within the TCGA cohort confirmed 8 of 11 candidates. Six ASEs (MRPL33, SIPA1L3, SNHG6, TPRG1, ZHX2, and ELOVL1) showed significant differential expression across both methods. Further evaluation of chromatin modification revealed that ASEs strongly correlated with cancer-specific distribution of acetylated lysine 27 of histone 3 (H3K27ac). Subsequent epigenetic treatment of HPV+ HNSCC cell lines (UM-SCC-047 and UPCI-SCC-090) with JQ1 induced not only downregulation of cancer-specific ASE isoforms but also growth inhibition in both cell lines. The UPCI-SCC-090 cell line, with greater ASE expression, also showed more significant growth inhibition after JQ1 treatment. This study confirms several novel cancer-specific ASEs in HPV+ OPSCC and provides evidence for the role of chromatin modifications in the regulation of alternative splicing in HPV+ OPSCC. This highlights the role of epigenetic changes in the oncogenesis of HPV+ OPSCC, which represents a unique, unexplored target for therapeutics that can alter the global post-transcriptional landscape.
Joint Analysis of Psychiatric Disorders Increases Accuracy of Risk Prediction for Schizophrenia, Bipolar Disorder, and Major Depressive Disorder
Genetic risk prediction has several potential applications in medical research and clinical practice and could be used, for example, to stratify a heterogeneous population of patients by their predicted genetic risk. However, for polygenic traits, such as psychiatric disorders, the accuracy of risk prediction is low. Here we use a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction for genetic risk prediction. This method exploits correlations between disorders and simultaneously evaluates individual risk for each disorder. We show that the multivariate approach significantly increases the prediction accuracy for schizophrenia, bipolar disorder, and major depressive disorder in the discovery as well as in independent validation datasets. By grouping SNPs based on genome annotation and fitting multiple random effects, we show that the prediction accuracy could be further improved. The gain in prediction accuracy of the multivariate approach is equivalent to an increase in sample size of 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorder using single-trait models. Because our approach can be readily applied to any number of GWAS datasets of correlated traits, it is a flexible and powerful tool to maximize prediction accuracy. With current sample sizes, risk predictors are not useful in a clinical setting but are already a valuable research tool, for example in experimental designs comparing cases with high and low polygenic risk.
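For orientation, the multivariate linear mixed model and multi-trait genomic BLUP named above can be written in their standard textbook form as follows. The notation is an assumption introduced here and may differ from the paper's own parameterization: $t$ traits, genomic relationship matrix $\mathbf{A}$, genetic (co)variance matrix $\boldsymbol{\Sigma}_g$ across traits, and a residual covariance $\mathbf{R}$ that is block-diagonal when the datasets share no individuals.

```latex
\begin{align}
  \mathbf{y}_i &= \mathbf{X}_i \mathbf{b}_i + \mathbf{Z}_i \mathbf{g}_i + \mathbf{e}_i,
    \qquad i = 1, \dots, t, \\
  \operatorname{Cov}(\mathbf{g}_i, \mathbf{g}_j) &= \sigma_{g_{ij}} \mathbf{A}
    \quad\Longleftrightarrow\quad
    \operatorname{Var}(\mathbf{g}) = \boldsymbol{\Sigma}_g \otimes \mathbf{A}, \\
  \mathbf{V} &= \mathbf{Z} \left( \boldsymbol{\Sigma}_g \otimes \mathbf{A} \right) \mathbf{Z}^{\top} + \mathbf{R}, \\
  \hat{\mathbf{g}} &= \left( \boldsymbol{\Sigma}_g \otimes \mathbf{A} \right) \mathbf{Z}^{\top}
    \mathbf{V}^{-1} \left( \mathbf{y} - \mathbf{X} \hat{\mathbf{b}} \right).
\end{align}
```

Here $\mathbf{y}$, $\mathbf{g}$, and $\mathbf{e}$ stack the trait-specific vectors, and $\mathbf{X}$ and $\mathbf{Z}$ are the corresponding block-diagonal design matrices. The off-diagonal entries $\sigma_{g_{ij}}$ are what allow records on one disorder to inform the predicted genetic values for another; setting them to zero recovers separate single-trait predictions.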