    Pathologic gene network rewiring implicates PPP1R3A as a central regulator in pressure overload heart failure

    Heart failure is a leading cause of mortality, yet our understanding of the genetic interactions underlying this disease remains incomplete. Here, we harvest 1352 healthy and failing human hearts directly from transplant center operating rooms, and obtain genome-wide genotyping and gene expression measurements for a subset of 313. We build failing and non-failing cardiac regulatory gene networks, revealing important regulators and cardiac expression quantitative trait loci (eQTLs). PPP1R3A emerges as a regulator whose network connectivity changes significantly between health and disease. RNA sequencing after PPP1R3A knockdown validates network-based predictions and highlights metabolic pathway regulation associated with increased cardiomyocyte size and perturbed respiratory metabolism. Mice lacking PPP1R3A are protected against pressure-overload heart failure. We present a global gene interaction map of the human heart failure transition, identify previously unreported cardiac eQTLs, and demonstrate the discovery potential of disease-specific networks through the description of PPP1R3A as a central regulator in heart failure.
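The abstract does not say how the change in network connectivity was quantified, so the following is only a hypothetical illustration of the general idea: build one co-expression network per condition and compare each gene's degree between them. The Pearson-correlation edges, the 0.9 threshold and the function names (`pearson`, `degrees`, `connectivity_change`) are assumptions for illustration, not the study's network-construction method.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def degrees(expr, threshold):
    """Degree of each gene in a co-expression network with an edge where |r| >= threshold.

    expr maps gene name -> expression values across the samples of one condition.
    """
    deg = {g: 0 for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            deg[g1] += 1
            deg[g2] += 1
    return deg


def connectivity_change(healthy, failing, threshold=0.9):
    """Per-gene degree in the failing network minus degree in the healthy network."""
    d_healthy = degrees(healthy, threshold)
    d_failing = degrees(failing, threshold)
    return {g: d_failing[g] - d_healthy[g] for g in healthy}


if __name__ == "__main__":
    # Toy data (made up): three genes, four samples per condition.
    healthy = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}
    failing = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [1, 2, 3, 5]}
    print(connectivity_change(healthy, failing))  # → {'A': 1, 'B': 1, 'C': 2}
```

A real analysis would also need a significance test for the degree difference (for example, by permuting condition labels); the sketch only shows the comparison itself.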

    Many Specialists for Suppressing Cortical Excitation

    Cortical computations are critically dependent on GABA-releasing neurons for dynamically balancing excitation with inhibition that is proportional to the overall level of activity. Although it is widely accepted that there are multiple types of interneurons, defining their identities based on qualitative descriptions of morphological, molecular and physiological features has failed to produce a universally accepted ‘parts list’, which is needed to understand the roles that interneurons play in cortical processing. The Petilla Interneuron Nomenclature Group has published a list of features, which represents an important step toward an unbiased classification of interneurons. Toward this end, some essential features have recently been studied quantitatively, and their associations were examined using multidimensional cluster analyses. These studies revealed at least 3 distinct electrophysiological, 6 morphological and 15 molecular phenotypes. This is a conservative estimate of the number of interneuron types, which will almost certainly be revised as more quantitative studies are performed and similarities are defined objectively. It is clear that interneurons are organized with physiological attributes representing the most general level, molecular characteristics the most detailed level and morphological features occupying the middle ground. No single one of these features is sufficient by itself to define classes of interneurons. The challenge will be to determine which features belong together and how cell type-specific feature combinations are genetically specified.

    Statistical approaches to harness high throughput sequencing data in diverse biological systems

    The development of novel statistical approaches to questions specific to biological systems of interest is becoming more valuable as we tackle increasingly complex problems. This thesis explores three distinct biological systems in which high-throughput sequencing data are used. The systems vary in research area, organism, the number of sequencing platforms and datasets integrated, and structure (such as matched samples), showcasing the variety of study designs and thus the need for tailored statistical approaches. First, we characterise allelic imbalance from RNA-Seq data using stringent filtering criteria and a count-based likelihood ratio test. This work identified genes of particular importance in livestock genomics, such as those related to energy use. Second, we outline a novel methodology to identify highly expressed genes and cells in single-cell RNA-Seq data. We derive a gamma-normal mixture model to identify lowly and highly expressed components, and use it to identify novel markers of olfactory sensory neuron (OSN) maturity across publicly available mouse neuron datasets. In addition, we estimate single-cell networks and find that mature OSN single-cell networks are more centralised than immature OSN single-cell networks. Third, we develop two novel frameworks for relating information from whole-exome DNA-Seq and RNA-Seq data when (i) samples are matched and (ii) samples are not necessarily matched between platforms. In the latter case, we relate functional somatic mutation driver gene scores to transcriptional network correlation disturbance using a permutation testing framework, identifying potential candidate genes for targeted therapies. In the former case, we estimate directed mutation-expression networks for each cancer using linear models, providing a useful exploratory tool for identifying novel relationships among genes.
This thesis demonstrates the importance of tailored statistical approaches to further understanding across many biological systems.
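The count-based likelihood ratio test for allelic imbalance can be sketched as follows, under the assumption that it compares a binomial model with a free reference-allele fraction against the balanced expectation of 0.5 at a heterozygous site. The function name and the chi-squared (1 degree of freedom) reference distribution are illustrative assumptions; the thesis's stringent filtering criteria are not reproduced here.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def allelic_imbalance_lrt(ref_count, alt_count, p0=0.5):
    """Binomial likelihood ratio test of H0: P(read carries reference allele) = p0.

    Returns (statistic, p_value); the statistic is referred to a
    chi-squared distribution with 1 degree of freedom.
    """
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("need at least one read")
    p_hat = ref_count / n  # maximum-likelihood estimate under H1
    # The binomial coefficient is identical under H0 and H1, so it cancels.
    ll_alt = _xlogy(ref_count, p_hat) + _xlogy(alt_count, 1 - p_hat)
    ll_null = _xlogy(ref_count, p0) + _xlogy(alt_count, 1 - p0)
    stat = 2.0 * (ll_alt - ll_null)
    # Survival function of chi-squared(1): P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


if __name__ == "__main__":
    print(allelic_imbalance_lrt(50, 50))  # balanced site: statistic 0, p-value 1
    print(allelic_imbalance_lrt(90, 10))  # strongly imbalanced site
```

In practice, p-values from many sites would then be corrected for multiple testing.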

    Computational modeling of drug response with applications to neuroscience

    The development of novel high-throughput technologies has opened up the opportunity to characterize patient tissues in depth at various molecular levels and has given rise to a paradigm shift in medicine towards personalized therapies. Computational analysis plays a pivotal role in integrating the various genome data and understanding the cellular response to a drug. Based on these data, molecular models can be constructed that incorporate the known downstream effects of drug-targeted receptor molecules and that predict optimal therapy decisions. In this article, we describe the different steps in the conceptual framework of computational modeling. We review resources that hold information on the molecular pathways that form the basis for constructing the model interaction maps, highlight network analysis concepts that have been helpful in identifying predictive disease patterns, and introduce the basic concepts of kinetic modeling. Finally, we illustrate this framework with selected studies related to the modeling of important target pathways affected by drugs.
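As a minimal illustration of the kinetic-modeling concepts the article introduces, the sketch below integrates a single hypothetical mass-action reaction, a drug L binding a receptor R to form a complex C, and compares the result with the analytical equilibrium. The parameter values are arbitrary, the drug concentration is assumed to stay constant, and forward Euler is used only for transparency; a real model would more likely use an adaptive ODE solver such as SciPy's `scipy.integrate.solve_ivp`.

```python
def simulate_binding(ligand, r_total, k_on, k_off, t_end, dt=1e-3):
    """Forward-Euler integration of mass-action drug-receptor binding.

    dC/dt = k_on * L * (R_total - C) - k_off * C, with L held constant
    and C(0) = 0. Returns the complex concentration C at time t_end.
    """
    c = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        free_receptor = r_total - c
        dc_dt = k_on * ligand * free_receptor - k_off * c
        c += dt * dc_dt
    return c


def steady_state_complex(ligand, r_total, k_on, k_off):
    """Analytical equilibrium C* = R_total * L / (L + K_d), where K_d = k_off / k_on."""
    kd = k_off / k_on
    return r_total * ligand / (ligand + kd)


if __name__ == "__main__":
    # Arbitrary, unit-free parameters: K_d = 1 and L = 1, so half the receptors end up bound.
    params = dict(ligand=1.0, r_total=1.0, k_on=1.0, k_off=1.0)
    print(simulate_binding(t_end=20.0, **params))
    print(steady_state_complex(**params))
```

With these values the simulated complex concentration approaches 0.5, the equilibrium predicted by the dissociation constant.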

    Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

    Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among, and activities of, cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternative methods in recovering clusters implanted in networks with high false positive rates. In a systematic evaluation of JointCluster and some earlier approaches for the combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovered clusters that were more consistently enriched for various reference classes capturing different aspects of yeast biology, or that yielded better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with the known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
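JointCluster's own algorithm and its quality guarantees are not reproduced here. As a hedged sketch of what "clustering well in multiple networks" can mean, the code scores a candidate gene set by its conductance (edges leaving the set divided by the smaller of the two edge volumes) in each network and reports the worst case; the scoring rule, the function names and the toy networks are assumptions for illustration.

```python
def conductance(edges, cluster):
    """Conductance of a node set in an undirected, unweighted graph.

    edges: iterable of (u, v) pairs; cluster: set of nodes.
    Lower values mean the set is better separated from the rest of the graph.
    """
    cut = 0
    vol_in = 0   # sum of degrees of nodes inside the cluster
    vol_out = 0  # sum of degrees of nodes outside the cluster
    for u, v in edges:
        in_u, in_v = u in cluster, v in cluster
        # Each edge adds one to the degree of both endpoints.
        vol_in += int(in_u) + int(in_v)
        vol_out += int(not in_u) + int(not in_v)
        if in_u != in_v:
            cut += 1
    denom = min(vol_in, vol_out)
    # Treat degenerate sets (no edges on one side) as poorly separated.
    return cut / denom if denom else 1.0


def joint_score(networks, cluster):
    """Worst-case (maximum) conductance of one cluster across several networks."""
    return max(conductance(edges, cluster) for edges in networks)


if __name__ == "__main__":
    coexpression = [("a", "b"), ("b", "c"), ("a", "c"),
                    ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
    physical = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
    print(joint_score([coexpression, physical], {"a", "b", "c"}))
```

Taking the maximum means a gene set only scores well if it is well separated in every network, which mirrors the requirement that clusters be supported by all datasets at once.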

    Mining metabolic pathways through gene expression

    Motivation: An observed metabolic response is the result of the coordinated activation of, and interaction between, multiple genetic pathways. However, because of the complex structure of metabolism, which pathways are required to produce an observed metabolic response is not fully understood. In this article, we propose an approach that can identify the genetic pathways that dictate the response of a metabolic network to specific experimental conditions.

    Identifying prognostic gene-signatures using a network-based approach

    The main objective of this study is to develop a novel network-based methodology to identify prognostic gene signatures that can predict recurrence in cancer. Feature selection algorithms have been widely used to identify gene signatures in genome-wide association studies, but most of them do not discover causal relationships between the features and must trade off accuracy against complexity. Network-based techniques take the molecular interactions between pairs of genes into account and are thus a more efficient means of finding gene signatures; they also achieve better classification accuracy without compromising on complexity. Nevertheless, each of the network-based techniques currently in use has limitations. Correlation-based coexpression networks do not provide predictive structure or causal relations among genes. Bayesian networks cannot model feedback loops. Boolean networks can model small-scale molecular networks, but not genome-scale ones. Prediction-logic-induced implication networks were therefore chosen to generate genome-wide coexpression networks, as they integrate formal logic and statistics and overcome the limitations of the other network-based techniques.
The first part of the study consists of building an implication network and identifying a set of genes that could form a prognostic signature. The data consisted of 442 samples from 4 different sources, split into a training set, UM/HLM (n=256), and two test sets, DFCI (n=82) and MSK (n=104). The training set was used to generate the implication network and, ultimately, to identify the prognostic signature; the test sets were used to validate the obtained signature. The implication networks were built from gene expression data associated with two disease states (metastasis or non-metastasis), defined by the period and status of post-operative survival.
The gene interactions that differentiated the two disease states, the differential components, were identified. The major cancer hallmarks (E2F, EGF, EGFR, KRAS, MET, RB1, and TP53) were considered, and the genes that interacted with all of the major hallmarks were identified from the differential components to form a 31-gene prognostic signature. An R software package with embedded C code was created to automate this process. Next, the signature was fitted with a Cox proportional hazards model, and the point on the ROC curve nearest to perfect classification was identified as the best scheme for patient stratification in Kaplan-Meier analyses of the training set (log-rank p-value=1.97e-08) and the two test sets, DFCI (log-rank p-value=2.13e-05) and MSK (log-rank p-value=1.24e-04).
Prognostic validation was carried out on the test sets using methods such as the Concordance Probability Estimate (CPE) and Gene Set Enrichment Analysis (GSEA). The accuracy of the signature was evaluated with CPE, which was 0.71 on the DFCI test set (log-rank p-value=5.3e-08) and 0.70 on the MSK test set (log-rank p-value=2.1e-07). The hazard ratio of this 31-gene prognostic signature is 2.68 (95% CI: [1.88, 3.82]) on the DFCI dataset and 3.31 (95% CI: [2.11, 5.2]) on the MSK set. These results demonstrate that our 31-gene signature was significantly more accurate than previously published signatures on the same datasets. The false discovery rate (FDR) of this 31-gene signature is 0.21 as computed with GSEA, showing that our 31-gene signature was comparable to other lung cancer prognostic signatures on the same datasets.
Topological validation was performed on the test sets to validate the computationally derived molecular interactions of the identified signature. The interactions from the implication networks were compared with those from Bayesian networks implemented in Tetrad IV.
Various curated databases and bioinformatics tools were used in the topological evaluation, including PRODISTIN, KEGG, PubMed, NCI-Nature pathways, MATISSE, STRING 8, Ingenuity Pathway Analysis, and Pathway Studio 6. The results showed that the implication networks generated all of the curated interactions from the various tools and databases, whereas the Bayesian networks contained only a few of them. It can thus be concluded that implication networks are capable of generating many more gene or protein interactions than currently used network techniques such as Bayesian networks.
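The stratification step, choosing the point on the ROC curve nearest to perfect classification, can be sketched as follows, assuming per-patient risk scores (for example, linear predictors from a Cox model fitted with R's `survival::coxph`) and binary recurrence labels are already available. The function name and the convention that a score at or above the cutoff means high risk are assumptions; this is not the study's R package.

```python
import math


def closest_to_perfect_threshold(scores, labels):
    """Pick the risk-score cutoff whose ROC point is nearest to (FPR=0, TPR=1).

    scores: risk scores (higher = higher predicted risk).
    labels: 1 for an event (e.g. recurrence), 0 otherwise.
    A patient is called high risk when their score is >= the cutoff.
    Returns (cutoff, Euclidean distance to the perfect-classification corner).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events and non-events")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        tpr = tp / n_pos
        fpr = fp / n_neg
        dist = math.hypot(fpr, 1.0 - tpr)
        if best is None or dist < best[1]:
            best = (cutoff, dist)
    return best


if __name__ == "__main__":
    # Hypothetical risk scores and recurrence labels for seven patients.
    scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
    labels = [0, 0, 1, 1, 0, 1, 1]
    print(closest_to_perfect_threshold(scores, labels))
```

Patients above the selected cutoff would form the high-risk group compared against the low-risk group in a Kaplan-Meier/log-rank analysis.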

    Communication between levels of transcriptional control improves robustness and adaptivity

    Regulation of eukaryotic gene expression depends on groups of related proteins acting at the levels of chromatin organization, transcriptional initiation, RNA processing, and nuclear transport. However, a unified understanding of how these different levels of transcriptional control interact has been lacking. Here, we combine genome-wide protein–DNA binding data from multiple sources to infer the connections between functional groups of regulators in Saccharomyces cerevisiae. The resulting transcriptional network uncovers novel biological relationships; supporting experiments confirm new associations between actively transcribed genes and Sir2 and Esc1, two proteins normally linked to silencing chromatin. Analysis of the regulatory network also reveals an elegant architecture for transcriptional control. Using communication theory, we show that most protein regulators prefer to form modules within their functional class, whereas essential proteins maintain the sparse connections between different classes. Moreover, we provide evidence that communication between different regulatory groups improves the robustness and adaptivity of the cell.
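The paper's communication-theory analysis is not described here in enough detail to reproduce. As a much simpler, hypothetical proxy for its two structural claims, the sketch counts the fraction of edges that stay within a functional class and the fraction of between-class edges that touch an essential protein. The function name, input format and toy network are assumptions for illustration.

```python
def class_mixing(edges, classes, essential=frozenset()):
    """Summarize within- vs between-class connections in a regulator network.

    edges: iterable of (u, v) pairs; classes: node -> functional class label;
    essential: set of nodes flagged as essential.
    Returns (fraction of edges within a class,
             fraction of between-class edges touching an essential node).
    """
    within = 0
    between = 0
    between_essential = 0
    for u, v in edges:
        if classes[u] == classes[v]:
            within += 1
        else:
            between += 1
            if u in essential or v in essential:
                between_essential += 1
    total = within + between
    frac_within = within / total if total else 0.0
    frac_essential = between_essential / between if between else 0.0
    return frac_within, frac_essential


if __name__ == "__main__":
    # Toy network: two functional classes joined by a single link through an essential node.
    classes = {"A1": "chromatin", "A2": "chromatin", "A3": "chromatin",
               "B1": "initiation", "B2": "initiation", "B3": "initiation"}
    edges = [("A1", "A2"), ("A2", "A3"), ("A1", "A3"),
             ("B1", "B2"), ("B2", "B3"), ("A3", "B1")]
    print(class_mixing(edges, classes, essential={"A3"}))
```

A high within-class fraction together with a high essential share of the between-class edges would be consistent with the modular architecture described, though comparing against randomized networks would be needed before drawing any conclusion.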