
Characterizing and analyzing disease-related omics data using network modeling approaches

Abstract

Systems biology explores how the components of a biological system interact to produce biological phenotypes. A number of tools for comprehensive, high-throughput measurement of DNA/RNA, proteins and metabolites have been developed. Each of these technologies helps to characterize individual components of the genome, proteome or metabolome and offers a distinct perspective on the structure of the system. My dissertation aims to characterize and analyze multiple types of omics data using existing and novel network-based approaches, in order to better understand mechanisms of disease development and to improve disease diagnosis and prognosis.

The transcriptome reflects the expression levels of mRNAs in single cells or in a population of cells. Understanding the transcriptome is an essential part of understanding organism development and disease. The first part of my thesis work focused on analyzing transcriptome data to characterize the aggressiveness and heterogeneity of human astrocytoma, the most common glioma, which has a strikingly high mortality rate. A large-scale global gene expression analysis was performed on expression profiles from hundreds of samples generated by oligonucleotide microarrays. I employed a combination of gene- and network-based approaches to investigate the genetic and biological mechanisms implicated in the observed phenotypic differences. I observed increasing dysregulation with increasing tumor grade and concluded that transcriptomic heterogeneity, observed at the population scale, is generally correlated with increasingly aggressive phenotypes. Heterogeneity in high-grade astrocytomas also manifests as differences in clinical outcomes, and significant efforts have been devoted to identifying subtypes within high-grade astrocytomas that differ substantially in prognosis. I developed an automated network screening approach that can identify networks capable of predicting subtypes with differential survival in high-grade astrocytomas.

The proteome represents the translated products of mRNAs, and proteomic measurements provide a direct estimate of protein abundance. For the second part of my Ph.D. research, I analyzed mouse brain protein measurements collected with iTRAQ technology to identify dynamically perturbed modules in progressive mouse models of glioblastoma. Changes in network behavior across the early, middle and late stages of tumor development in genetically engineered mice were tracked, and 19 genes were selected for further confirmation of their roles in glioblastoma progression. Beyond this specific application to mouse glioblastoma data, the general pipeline represents a novel effort to isolate pathway-level responses to perturbations (e.g., brain tumor formation and progression) from large-scale proteomics data, and it could be applied to proteomics data from a variety of other contexts.

The metabolome reflects biological information related to biochemical processes and metabolic networks involving metabolites. Metabolomics data can give an instantaneous snapshot of the current state of the cell and thus offer a distinct view of the effects of diet, drugs and disease on the model organism. The third part of my thesis is dedicated to building and refining genome-scale in silico metabolic models for mouse, in order to investigate how the metabolic model responds differently under different conditions (e.g., diabetic vs. normal). This project was completed in two stages. First, I examined the state-of-the-art genome-scale mouse metabolic model, identified its limitations, and improved and refined its functionality. Second, I created the first liver-specific metabolic models from the generic mouse models by pruning reactions that lack genetic evidence of presence and then adding liver-specific reactions that represent the characteristics and functions of the mouse liver. Finally, I reconstructed two liver metabolic models for mouse, one for the normal (control) strain and one for the diabetic strains. These two models were compared physiologically to infer the metabolic genes most impacted by the onset of diabetes.
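
The prune-then-add construction of a liver-specific model can be illustrated with a minimal sketch. This is not the pipeline used in the dissertation. It assumes a toy representation in which each reaction maps to a set of associated genes, any one of which counts as evidence of presence, and it assumes that reactions with no gene annotation are retained. The function name, reaction identifiers and gene identifiers are all hypothetical.

```python
from typing import Dict, Set


def build_tissue_model(
    generic: Dict[str, Set[str]],
    expressed_genes: Set[str],
    tissue_specific: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Prune a generic reaction set by gene evidence, then add tissue reactions."""
    kept: Dict[str, Set[str]] = {}
    for reaction, genes in generic.items():
        # Assumption: reactions without gene annotation (e.g. spontaneous
        # reactions) are kept; annotated reactions need at least one
        # associated gene with evidence of presence in the tissue.
        if not genes or genes & expressed_genes:
            kept[reaction] = genes
    # Add reactions that represent tissue-specific functions.
    kept.update(tissue_specific)
    return kept


# Hypothetical toy data, for illustration only.
generic = {
    "R_glycolysis_step": {"geneA", "geneB"},
    "R_muscle_only": {"geneC"},
    "R_spontaneous": set(),
}
liver_expressed = {"geneA", "geneD"}
liver_specific = {"R_bile_acid_synthesis": {"geneD"}}

liver_model = build_tissue_model(generic, liver_expressed, liver_specific)
print(sorted(liver_model))
```

In this toy case the reaction supported only by `geneC` is pruned because that gene has no evidence of presence, while the unannotated reaction and the added liver-specific reaction remain. Real gene-reaction rules typically combine AND and OR logic, which this sketch deliberately ignores.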
