14 research outputs found

    INTEGRATIVE APPROACHES FOR THE STUDY OF COMPLEX HUMAN DISEASE

    The opportunities that lead to the detection, treatment and prevention of diseases often require a systemic understanding of the cellular changes that accompany the disease. High-throughput approaches such as microarrays and RNA-sequencing have empowered researchers to study the behavior of thousands of genes within a cell. The integration of these data across pathological states and multiple experiments presents many opportunities to improve our understanding of human diseases. This thesis represents the work of two projects focused on integrating high-throughput data to identify genes associated with a disease. The first project seeks to understand the changes in expression that occur during oncogenesis. By integrating gene expression data across three histological mammary tissue states (normal, adenoma, and carcinoma), we have identified three distinct patterns of gene expression that emerge during the progression of a tumor. We show that these disease-progression-associated genes represent known cancer-related pathways. The second project utilizes Naïve Bayesian machine learning to predict novel immune functional relationships by distilling the data from a large compendium of high-throughput gene expression data. We built an interactive web resource for exploring the relationships within these generated networks. Furthermore, we utilized these networks to successfully predict genes associated with several immune diseases.

    Prediction and verification of infertility-related genes through male reproductive system-specific networks.

    <p><b>A.</b> Local functional relationship network of the gene <i>Mybl1</i> in the male reproductive system. The top 18 genes connected to the query set with connection weights higher than 0.634 are displayed. These top functionally related proteins include well-characterized male infertility genes such as <i>Dmc1</i>, <i>Ddx4</i>, and <i>Cyct</i>. <b>B.</b> Histological cross-sections of oval seminiferous tubules show that wild type (<i>Mybl1<sup>+/+</sup></i>) testis tubules contain many developing germ cells, while mutant (<i>Mybl1<sup>repro9/repro9</sup></i>) testis tubules contain many fewer germ cells and more empty space, indicative of infertility.</p>

    Strategy for constructing tissue-specific networks and predicting phenotype-associated genes.

    <p>Diverse functional genomic datasets such as expression, protein-protein interactions and phenotype information were integrated in a Bayesian framework to generate tissue-specific networks. Input datasets were probabilistically “weighted” based on how informative they were in reflecting known co-functional proteins that are both expressed in a given tissue. To account for overlap in information in multiple datasets (especially the large number of gene expression microarray datasets), mutual information-based regularization was used to down-weight datasets showing significant overlap with each other. These networks were then used as input into a Support Vector Machine classifier to predict phenotype-related genes. Finally, we implemented a web interface that allows network comparison between tissues.</p>
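
    A minimal sketch of the integration step described above, assuming a naive Bayes combination carried out in log-odds space. The function name `posterior_functional`, its parameters, and the per-dataset `weights` exponent (a simple stand-in for the mutual information-based down-weighting) are illustrative assumptions, not the authors' implementation.

```python
import math


def posterior_functional(prior, likelihoods, weights=None):
    """Combine per-dataset evidence for one gene pair with naive Bayes.

    prior:       P(related) before any dataset is considered (hypothetical input).
    likelihoods: list of (P(obs | related), P(obs | unrelated)), one per dataset.
    weights:     optional per-dataset exponents in [0, 1]; a value below 1
                 shrinks that dataset's log-likelihood ratio toward zero,
                 which is an assumed stand-in for redundancy down-weighting.
    """
    if weights is None:
        weights = [1.0] * len(likelihoods)
    if len(weights) != len(likelihoods):
        raise ValueError("need exactly one weight per dataset")

    # Start from the prior log-odds and add each dataset's weighted evidence.
    log_odds = math.log(prior / (1.0 - prior))
    for (p_related, p_unrelated), w in zip(likelihoods, weights):
        log_odds += w * math.log(p_related / p_unrelated)

    # Convert the final log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

    With a prior of 0.5 and two datasets that each favor a relationship 4 to 1, the combined posterior is 16/17 (about 0.94). Halving the second dataset's weight, as one might for a dataset that largely repeats the first, lowers it to 8/9 (about 0.89).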

    Tissue-specific networks are more accurate than the global network in reflecting protein functional relationships.

    <p><b>A.</b> 107 tissues were grouped into major body systems according to the anatomical hierarchical structure maintained in GXD <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002694#pcbi.1002694-Smith1" target="_blank">[20]</a>. Through three-fold cross-validation, the performance of tissue-specific networks was compared against the global network and the percentage improvement of tissue-specific networks over the global network was plotted. All tissue-specific networks outperformed the global network in this cross-validation analysis. Improvements were consistent across tissues belonging to all major organ systems. Candle-stick plots (minimum, 25%, median, 75% and maximum) represent the distribution of percentage AUC improvement for all tissues in a specific system. <b>B.</b> Example precision-recall curves for tissue-specific networks and the global network, generated using three-fold cross-validation. Across the entire precision-recall space, tissue-specific networks performed better than the global network. Complete precision-recall figures for all networks are included in <b><a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002694#pcbi.1002694.s002" target="_blank">Dataset S2</a></b>.</p>
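
    The "percentage AUC improvement" plotted in panel A can be illustrated with a short sketch. The caption does not say which AUC is meant, so the code below assumes the area under the ROC curve, computed as the fraction of positive/negative pairs ranked correctly. The function names `roc_auc` and `percent_improvement` are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outscores a negative.

    scores: predicted edge weights for held-out gene pairs.
    labels: truthy for known functional pairs, falsy otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def percent_improvement(auc_specific, auc_global):
    """Relative AUC gain of a tissue-specific network over the global one, in percent."""
    return 100.0 * (auc_specific - auc_global) / auc_global
```

    In a cross-validation setting, `roc_auc` would be evaluated on each held-out fold for both networks and the resulting values compared with `percent_improvement`.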

    Top connected genes to <i>Atcay</i> in the cerebellum-specific network reveal likely ataxia candidates.

    <p>Edges with weight greater than 0.9 are shown. In the cerebellum network (<b>A</b>), <i>Grm1</i> and <i>Cacn1a</i> are the top predicted connections to <i>Atcay</i>, with confidences of 0.902 and 0.943, respectively. Both genes are closely connected to <i>Atcay</i> and its top 10 neighbors. In the global network (<b>B</b>), <i>Grm1</i> and <i>Cacn1a</i> are much more weakly connected to <i>Atcay</i> (0.763 and 0.647, respectively), and are not identified as top connectors to <i>Atcay</i>. At this threshold, <i>Grm1</i> and <i>Cacn1a</i> are not connected to <i>Atcay</i> or any of its top 10 neighbors in the global network.</p>
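
    The thresholding described in this caption can be sketched as a simple neighbor query. The edge weights below are the four values quoted in the caption. The dictionary layout and the function name `top_neighbors` are assumptions made for illustration.

```python
def top_neighbors(edges, gene, min_weight=0.9, k=10):
    """Return up to k neighbors of `gene` whose edge weight exceeds min_weight.

    edges: dict mapping an unordered gene pair (a, b) to its weight.
    The result is a list of (neighbor, weight), strongest first.
    """
    hits = []
    for (a, b), weight in edges.items():
        if weight <= min_weight:
            continue  # edge is below the display threshold
        if a == gene:
            hits.append((b, weight))
        elif b == gene:
            hits.append((a, weight))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:k]


# Edge weights quoted in the caption for each network.
cerebellum = {("Atcay", "Grm1"): 0.902, ("Atcay", "Cacn1a"): 0.943}
global_net = {("Atcay", "Grm1"): 0.763, ("Atcay", "Cacn1a"): 0.647}

print(top_neighbors(cerebellum, "Atcay"))  # both genes pass the 0.9 cutoff
print(top_neighbors(global_net, "Atcay"))  # neither gene passes the cutoff
```

    The same two genes are therefore drawn as neighbors of <i>Atcay</i> in the cerebellum panel and are absent from the global panel.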

    Tissue-specific networks perform better than the global network in predicting genes related to different phenotypes.

    <p>By mapping phenotypes to different tissues according to their terminology and description, we are able to compare the performance of tissue-specific networks and the global network in predicting phenotype-related genes. Candle-stick plots (minimum, 25%, median, 75% and maximum) show the distribution of percentage AUC improvement when predicting phenotype-related genes. <b>A.</b> Phenotypes were grouped according to the number of annotated genes. Tissue-specific functional networks show consistent improvement across different phenotype sizes. <b>B.</b> Phenotypes were grouped according to major organ systems of their corresponding tissue. Improvements were consistent across all major systems. <b>C.</b> Example precision-recall curves for “abnormal osteogenesis” (MP:0000057), “abnormal nervous system electrophysiology” (MP:0002272), “abnormal spleen white pulp morphology” (MP:0002357), and “abnormal CNS glial cell morphology” (MP:0003634) using both tissue-specific networks (shown in red) and global networks (shown in green). For phenotypes such as these, tissue-specific networks are necessary to make accurate predictions.</p>
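
    The five values drawn in each candle-stick (minimum, 25%, median, 75% and maximum) can be computed as in the sketch below. The function name `candlestick_summary` is hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the caption does not state how the quartiles were calculated.

```python
import statistics


def candlestick_summary(values):
    """Return (minimum, 25%, median, 75%, maximum) for a group of values.

    values: percentage AUC improvements for the phenotypes (or tissues)
            in one group; at least two values are required.
    """
    # Quartiles by linear interpolation, treating the data as the full population.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))
```

    Applying the function to each group of phenotypes (by size or by organ system) gives one tuple per candle-stick.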