INTEGRATIVE APPROACHES FOR THE STUDY OF COMPLEX HUMAN DISEASE
Opportunities to detect, treat and prevent disease often require a systems-level understanding of the cellular changes that accompany the disease. High-throughput approaches such as microarrays and RNA sequencing have enabled researchers to study the behavior of thousands of genes within a cell. Integrating these data across pathological states and multiple experiments presents many opportunities to improve our understanding of human diseases. This thesis presents two projects focused on integrating high-throughput data to identify disease-associated genes. The first project seeks to understand the changes in expression that occur during oncogenesis. By integrating gene expression data across three histological mammary tissue states (normal, adenoma, and carcinoma), we identified three distinct patterns of gene expression that emerge during tumor progression. We show that these disease-progression-associated genes represent known cancer-related pathways. The second project uses naïve Bayesian machine learning to predict novel immune functional relationships by distilling a large compendium of high-throughput gene expression data. We built an interactive web resource for exploring the relationships within the resulting networks. Furthermore, we used these networks to successfully predict genes associated with several immune diseases.
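The core idea behind naïve Bayesian integration can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes each dataset's evidence for a gene pair has already been discretized into integer bins, that a gold standard of related and unrelated pairs is available, and the names `learn_bin_likelihoods` and `posterior_related` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical type: (P(bin | related), P(bin | unrelated)) for one dataset.
LikelihoodTable = Tuple[List[float], List[float]]


def learn_bin_likelihoods(
    pos_bins: List[int], neg_bins: List[int], n_bins: int, pseudocount: float = 1.0
) -> LikelihoodTable:
    """Estimate P(bin | related) and P(bin | unrelated) from gold-standard pairs.

    A pseudocount keeps every bin probability non-zero (Laplace smoothing).
    """
    pos_counts = [pseudocount] * n_bins
    neg_counts = [pseudocount] * n_bins
    for b in pos_bins:
        pos_counts[b] += 1
    for b in neg_bins:
        neg_counts[b] += 1
    pos_total, neg_total = sum(pos_counts), sum(neg_counts)
    return [c / pos_total for c in pos_counts], [c / neg_total for c in neg_counts]


def posterior_related(
    observed_bins: Dict[str, int], tables: Dict[str, LikelihoodTable], prior: float
) -> float:
    """Naive Bayes posterior that a gene pair is functionally related.

    Datasets are treated as conditionally independent given the class, so their
    log-likelihood ratios add to the prior log-odds. Datasets with no
    observation for this pair are simply absent from `observed_bins`.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name, b in observed_bins.items():
        p_pos, p_neg = tables[name]
        log_odds += math.log(p_pos[b] / p_neg[b])
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For example, with a hypothetical dataset `tables = {"coexpr": learn_bin_likelihoods([2, 2, 1], [0, 0, 1], 3)}`, observing bin 2 for a pair raises a 0.5 prior to a posterior of 0.75, because bin 2 is three times as likely among related pairs as among unrelated ones. The independence assumption is what makes the update a simple sum, and it is also why overlapping datasets can overcount evidence.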
Tissue-Specific Functional Networks for Prioritizing Phenotype and Disease Genes
<div><p>Integrated analyses of functional genomics data have enormous potential for identifying phenotype-associated genes. Tissue specificity is an important aspect of many genetic diseases, reflecting the potentially different roles of proteins and pathways in diverse cell lineages. Accounting for tissue specificity in global integration of functional genomics data is challenging, as “functionality” and “functional relationships” are often not resolved for specific tissue types. We address this challenge by generating tissue-specific functional networks, which can effectively represent the diversity of protein function for more accurate identification of phenotype-associated genes in the laboratory mouse. Specifically, we created 107 tissue-specific functional relationship networks by integrating genomic data with knowledge of tissue-specific gene expression patterns. Cross-network comparison revealed significantly changed genes enriched for functions related to specific tissue development. We then used these tissue-specific networks to predict genes associated with different phenotypes. Our results demonstrate that prediction performance is significantly improved by using the tissue-specific networks rather than the global functional network. We used a testis-specific functional relationship network to predict genes associated with male fertility and spermatogenesis phenotypes, and experimentally confirmed one top prediction, <em>Mybl1</em>. We then focused on a less common genetic disease, ataxia, and identified candidates uniquely predicted by the cerebellum network, which are supported by both literature and experimental evidence. Our systems-level, tissue-specific scheme improves on traditional global integration and analysis and establishes a prototype for addressing the tissue-specific effects of genetic perturbations, diseases and drugs.</p></div>
Example enriched Gene Ontology terms in the tissue MA:0000016 nervous system.
Prediction and verification of infertility-related genes through male reproductive system-specific networks.
<p><b>A.</b> Local functional relationship network of the gene <i>Mybl1</i> in the male reproductive system. The top 18 genes connected to the query set with connection weights higher than 0.634 are displayed. These top functionally related proteins include well-characterized male infertility genes such as <i>Dmc1</i>, <i>Ddx4</i>, and <i>Cyct</i>. <b>B.</b> Histological cross-sections of oval seminiferous tubules show that wild-type (<i>Mybl1<sup>+/+</sup></i>) testis tubules contain many developing germ cells, whereas mutant (<i>Mybl1<sup>repro9/repro9</sup></i>) testis tubules contain far fewer germ cells and more empty space, indicative of infertility.</p>
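Selecting the genes to display in a local network like panel A amounts to ranking non-query genes by their connection to the query set and keeping those above a weight cutoff. The sketch below is hypothetical: it assumes edges are stored as undirected `frozenset` pairs, scores each gene by its mean weight to the query genes (missing edges count as 0), and uses the illustrative name `top_connected_genes`; the scoring used to build the figure may differ.

```python
from typing import Dict, FrozenSet, List, Tuple

Weights = Dict[FrozenSet[str], float]  # undirected edge -> confidence


def top_connected_genes(
    weights: Weights, query: List[str], min_weight: float, k: int
) -> List[Tuple[str, float]]:
    """Rank non-query genes by mean edge weight to the query set.

    Hypothetical scoring: a gene qualifies only if its mean weight to the
    query genes exceeds `min_weight`; missing edges count as weight 0.
    """
    query_set = set(query)
    genes = {g for edge in weights for g in edge} - query_set
    scored = []
    for g in genes:
        mean_w = sum(weights.get(frozenset((g, q)), 0.0) for q in query) / len(query)
        if mean_w > min_weight:
            scored.append((g, mean_w))
    # Highest weight first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

With the caption's settings this would be called as `top_connected_genes(weights, query, 0.634, 18)`, where `weights` and `query` come from the tissue-specific network of interest.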
Strategy for constructing tissue-specific networks and predicting phenotype-associated genes.
<p>Diverse functional genomic datasets, such as gene expression, protein-protein interaction and phenotype data, were integrated in a Bayesian framework to generate tissue-specific networks. Input datasets were probabilistically “weighted” according to how well they reflected known co-functional protein pairs in which both proteins are expressed in the given tissue. To account for overlapping information across datasets (especially the large number of gene expression microarray datasets), mutual information-based regularization was used to down-weight datasets that overlap significantly with one another. These networks were then used as input to a Support Vector Machine classifier to predict phenotype-related genes. Finally, we implemented a web interface that allows networks to be compared between tissues.</p>
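Two ingredients of this strategy can be illustrated briefly: restricting the positive gold standard to co-functional pairs whose genes are both expressed in the tissue, and measuring how much two datasets overlap using mutual information. This is a sketch under assumptions: expression calls are given as a hypothetical gene-to-tissues mapping, dataset evidence is already discretized into integer bins over the same gene pairs, and the function names are illustrative. The exact formula that turns mutual information into dataset weights is not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def tissue_gold_standard(
    positive_pairs: Iterable[Pair], expressed_in: Dict[str, Set[str]], tissue: str
) -> List[Pair]:
    """Keep only co-functional pairs whose genes are both expressed in `tissue`."""
    return [
        (a, b)
        for a, b in positive_pairs
        if tissue in expressed_in.get(a, set()) and tissue in expressed_in.get(b, set())
    ]


def mutual_information(x: List[int], y: List[int]) -> float:
    """Mutual information (in nats) between two discretized evidence vectors.

    x[i] and y[i] are the bins two datasets assign to the same gene pair i.
    Higher values mean the datasets carry more redundant information.
    """
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

Identical datasets give the maximum overlap (for two balanced bins, `log 2`), while independent datasets give 0, which is the kind of signal a down-weighting scheme can act on.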
Tissue-specific networks are more accurate than the global network in reflecting protein functional relationships.
<p><b>A.</b> The 107 tissues were grouped into major body systems according to the anatomical hierarchy maintained in GXD <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002694#pcbi.1002694-Smith1" target="_blank">[20]</a>. Using three-fold cross-validation, we compared the performance of each tissue-specific network against the global network and plotted the percentage improvement of the tissue-specific networks over the global network. All tissue-specific networks outperformed the global network in this cross-validation analysis, and the improvements were consistent across tissues in all major organ systems. Candle-stick plots (minimum, 25%, median, 75% and maximum) represent the distribution of percentage AUC improvement for all tissues in a given system. <b>B.</b> Example precision-recall curves for tissue-specific networks and the global network, generated using three-fold cross-validation. Across the entire precision-recall space, tissue-specific networks performed better than the global network. Complete precision-recall figures for all networks are included in <b><a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002694#pcbi.1002694.s002" target="_blank">Dataset S2</a></b>.</p>
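The quantities summarized in panel A can be sketched as follows. This assumes, without confirmation from the caption, that percentage AUC improvement means `100 * (AUC_tissue - AUC_global) / AUC_global`; the AUC here is the ROC AUC computed as the probability that a random positive outscores a random negative, and the five-number summary uses the standard library's inclusive quartiles. All function names are hypothetical, and the three-fold cross-validation loop itself is omitted.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC: fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pct_improvement(auc_tissue: float, auc_global: float) -> float:
    """Assumed definition: relative AUC gain of a tissue network over the global one, in percent."""
    return 100.0 * (auc_tissue - auc_global) / auc_global


def five_number_summary(values: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """(minimum, 25%, median, 75%, maximum), as drawn in each candle-stick."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]
```

Applying `five_number_summary` to the list of `pct_improvement` values for all tissues in one body system yields the five statistics plotted for that system.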
Genes most strongly connected to <i>Atcay</i> in the cerebellum-specific network reveal likely ataxia candidates.
<p>Edges with weight greater than 0.9 are shown. In the cerebellum network (<b>A</b>), <i>Grm1</i> and <i>Cacna1a</i> are the top predicted connections to <i>Atcay</i>, with confidences of 0.902 and 0.943, respectively. Both genes are closely connected to <i>Atcay</i> and its top 10 neighbors. In the global network (<b>B</b>), <i>Grm1</i> and <i>Cacna1a</i> are much more weakly connected to <i>Atcay</i> (0.763 and 0.647, respectively) and are not identified as top connectors to <i>Atcay</i>; at the 0.9 display threshold, neither gene is connected to <i>Atcay</i> or any of its top 10 neighbors in the global network.</p>
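The comparison in this figure can be phrased as a simple check: does a candidate gene have any edge above 0.9 to the query gene or to one of the query's top 10 neighbors? The sketch below assumes undirected edges stored as `frozenset` pairs with no self-loops, and the names `top_neighbors` and `linked_to_module` are hypothetical; running the same check on a tissue-specific and a global network would contrast the two panels.

```python
from typing import Dict, FrozenSet, List

Weights = Dict[FrozenSet[str], float]  # undirected edge -> confidence


def top_neighbors(weights: Weights, gene: str, k: int) -> List[str]:
    """Return the k genes with the highest edge weight to `gene` (ties broken by name)."""
    candidates = []
    for edge, w in weights.items():
        if gene in edge and len(edge) == 2:  # ignore malformed self-loop entries
            (other,) = edge - {gene}
            candidates.append((other, w))
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [g for g, _ in candidates[:k]]


def linked_to_module(
    weights: Weights, candidate: str, query: str, k: int = 10, threshold: float = 0.9
) -> bool:
    """True if `candidate` has an edge above `threshold` to `query` or one of its top-k neighbors."""
    module = ({query} | set(top_neighbors(weights, query, k))) - {candidate}
    return any(weights.get(frozenset((candidate, m)), 0.0) > threshold for m in module)
```

A gene that is only moderately linked to the query (for instance, an edge of 0.763) and has no strong edges to the query's neighbors would return `False` here, even though a weaker edge exists.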
Tissue-specific networks perform better than the global network in predicting genes related to different phenotypes.
<p>By mapping phenotypes to tissues according to their terminology and descriptions, we compared the performance of tissue-specific networks and the global network in predicting phenotype-related genes. Candle-stick plots (minimum, 25%, median, 75% and maximum) show the distribution of percentage AUC improvement when predicting phenotype-related genes. <b>A.</b> Phenotypes were grouped according to the number of annotated genes. Tissue-specific functional networks show consistent improvement across different phenotype sizes. <b>B.</b> Phenotypes were grouped according to the major organ system of their corresponding tissue. Improvements were consistent across all major systems. <b>C.</b> Example precision-recall curves for “abnormal osteogenesis” (MP:0000057), “abnormal nervous system electrophysiology” (MP:0002272), “abnormal spleen white pulp morphology” (MP:0002357), and “abnormal CNS glial cell morphology” (MP:0003634) using both the tissue-specific networks (shown in red) and the global network (shown in green). For phenotypes such as these, tissue-specific networks are necessary to make accurate predictions.</p>
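A precision-recall curve like those in panel C can be traced from a classifier's scores and the phenotype annotations. This is a generic sketch, not the authors' evaluation code: it assumes labels of 1 for genes annotated to the phenotype and 0 otherwise, at least one positive, and uses the hypothetical name `precision_recall_points`.

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """(recall, precision) at each distinct score threshold, from strictest to loosest.

    Assumes labels are 1 (annotated to the phenotype) or 0, with at least one positive.
    """
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every gene tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points
```

Grouping tied scores before emitting a point keeps the curve independent of the arbitrary order in which tied genes happen to be sorted.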
Modeling molecular development of breast cancer in canine mammary tumors
Understanding the changes in diverse molecular pathways underlying the development of breast tumors is critical for improving diagnosis, treatment, and drug development. Here, we used RNA profiling of canine mammary tumors (CMTs), coupled with a robust analysis framework, to model molecular changes in human breast cancer. Our study leveraged a key advantage of the canine model: the frequent presence of multiple naturally occurring tumors at diagnosis, which provides samples spanning normal tissue and benign and malignant tumors from each patient. We showed that human breast cancer signals, at both the expression and mutation levels, are evident in CMTs. Profiling multiple tumors per patient, which the CMT model enables, allowed us to resolve statistically robust transcription patterns and biological pathways specific to malignant tumors, as distinct from those arising in benign tumors or shared with normal tissues. We showed that multiple histological samples per patient are necessary to effectively capture these progression-related signatures, and that carcinoma-specific signatures are predictive of survival in human breast cancer patients. To catalyze and support similar analyses and use of the CMT model by other biomedical researchers, we provide FREYA, a robust data processing pipeline and statistical analysis framework.
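One reason multiple samples per patient help is that comparisons can be made within each patient, cancelling out patient-to-patient baseline differences. The sketch below illustrates this with a paired t-statistic for a single gene; it is a hypothetical illustration, not FREYA's method. It assumes log-scale expression values stored per patient and per tissue state, and the state names (`"normal"`, `"carcinoma"`) and the function name `paired_t` are illustrative.

```python
import math
from statistics import mean, stdev
from typing import Dict


def paired_t(expr: Dict[str, Dict[str, float]], a: str, b: str) -> float:
    """Paired t-statistic for state `a` vs state `b` for one gene.

    `expr` maps patient -> {tissue state -> log-scale expression}. Only patients
    with both states contribute, so each difference is taken within one patient.
    """
    diffs = [s[a] - s[b] for s in expr.values() if a in s and b in s]
    if len(diffs) < 2:
        raise ValueError("need at least two patients with both tissue states")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("within-patient differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

Patients missing one of the two states are skipped rather than imputed, which is the simplest choice for a sketch but discards data a full pipeline might model.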
Evidence for top 10 predictions for ataxia-causing genes using mouse cerebellum-specific networks.