
    UNCLES: Method for the identification of genes differentially consistently co-expressed in a specific subset of datasets

    Background: Collective analysis of the growing number of gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Results: Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method can identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and can also identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods and, of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets, where it produces focused clusters that conform well to known biological facts. 
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. Conclusions: The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies. Funding: The National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number RP-PG-0310-1004).

    An Integrated Approach to Identifying Cis-Regulatory Modules in the Human Genome

    In eukaryotic genomes, it is challenging to accurately determine the target sites of transcription factors (TFs) using sequence information alone. Previous efforts tackled this task by exploiting the facts that TF binding sites tend to be more conserved than other functional sites and that the binding sites of several TFs are often clustered. Recently, ChIP-chip and ChIP-sequencing data have accumulated that identify TF binding sites and survey chromatin modification patterns at regulatory elements such as promoters and enhancers. We propose here a hidden Markov model (HMM) that incorporates sequence motif information, TF-DNA interaction data and chromatin modification patterns to precisely identify cis-regulatory modules (CRMs). We conducted ChIP-chip experiments on four TFs (CREB, E2F1, MAX, and YY1) in 1% of the human genome. We then trained the HMM to identify the labels of the CRMs by incorporating the sequence motifs recognized by these TFs and the ChIP-chip ratio. Chromatin modification data were used to predict the functional sites and to further remove false positives. Cross-validation showed that our integrated HMM outperformed existing methods in predicting CRMs. Incorporating histone signature information successfully penalized false predictions and improved overall performance. The dataset we used and the software are available at http://nash.ucsd.edu/CIS/

    A Buoyancy-Based Screen of Drosophila Larvae for Fat-Storage Mutants Reveals a Role for Sir2 in Coupling Fat Storage to Nutrient Availability

    Obesity has a strong genetic component, but few of the genes that predispose to obesity are known. Genetic screens in invertebrates have the potential to identify genes and pathways that regulate the levels of stored fat, many of which are likely to be conserved in humans. To facilitate such screens, we have developed a simple buoyancy-based screening method for identifying mutant Drosophila larvae with increased levels of stored fat. Using this approach, we have identified 66 genes that when mutated increase organismal fat levels. Among these was a sirtuin family member, Sir2. Sirtuins regulate the storage and metabolism of carbohydrates and lipids by deacetylating key regulatory proteins. However, since mammalian sirtuins function in many tissues in different ways, it has been difficult to define their role in energy homeostasis accurately under normal feeding conditions. We show that knockdown of Sir2 in the larval fat body results in increased fat levels. Moreover, using genetic mosaics, we demonstrate that Sir2 restricts fat accumulation in individual cells of the fat body in a cell-autonomous manner. Consistent with this function, changes in the expression of metabolic enzymes in Sir2 mutants point to a shift away from catabolism. Surprisingly, although Sir2 is typically upregulated under conditions of starvation, Sir2 mutant larvae survive better than wild type under conditions of amino-acid starvation as long as sugars are provided. Our findings point to a Sir2-mediated pathway that activates a catabolic response to amino-acid starvation irrespective of the sugar content of the diet

    Analysis of Combinatorial Regulation: Scaling of Partnerships between Regulators with the Number of Governed Targets

    Through combinatorial regulation, regulators partner with each other to control common targets, which allows a small number of regulators to govern many targets. An interesting question is how, given this combinatorial regulation, the number of regulators scales with the number of targets. Here, we address this question by building and analyzing co-regulation (co-transcription and co-phosphorylation) networks that describe partnerships between regulators controlling common genes. We carry out analyses across five diverse species, from Escherichia coli to human. These reveal many properties of partnership networks, such as the absence of a classical power-law degree distribution despite the existence of nodes with many partners. We also find that the number of co-regulatory partnerships follows an exponential saturation curve in relation to the number of targets. (For E. coli and Bacillus subtilis, only the beginning linear part of this curve is evident due to the arrangement of genes into operons.) To gain intuition into the saturation process, we relate the biological regulation to more commonplace social contexts where a small number of individuals can form an intricate web of connections on the internet. Indeed, we find that the size of partnership networks saturates even as the complexity of their output increases. We also present a variety of models to account for the saturation phenomenon. In particular, we develop a simple analytical model to show how new partnerships are acquired with an increasing number of target genes; with certain assumptions, it reproduces the observed saturation. Then, we build a more general simulation of network growth and find agreement with a wide range of real networks. Finally, we perform various down-sampling calculations on the observed data to illustrate the robustness of our conclusions.
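
    The abstract above names an exponential saturation curve but does not give its functional form. A minimal sketch, assuming the generic form P(T) = p_max * (1 - exp(-T / tau)), may help illustrate why small genomes would show only the near-linear start of the curve. The function name and the parameters `p_max` and `tau` are hypothetical and are not taken from the paper.

```python
import math


def partnerships(targets, p_max, tau):
    """Assumed saturation curve: P(T) = p_max * (1 - exp(-T / tau)).

    targets: number of target genes T
    p_max:   hypothetical plateau (maximum number of partnerships)
    tau:     hypothetical scale at which saturation sets in
    """
    return p_max * (1.0 - math.exp(-targets / tau))


def linear_approximation(targets, p_max, tau):
    """First-order expansion for T << tau: P(T) is roughly p_max * T / tau."""
    return p_max * targets / tau
```

    For T much smaller than `tau`, `partnerships` and `linear_approximation` nearly coincide, which is the regime the abstract describes for E. coli and Bacillus subtilis; for T much larger than `tau`, the curve flattens towards `p_max`.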

    Structural constraints revealed in consistent nucleosome positions in the genome of S. cerevisiae

    Background: Recent advances in the field of high-throughput genomics have rendered possible the performance of genome-scale studies to define the nucleosomal landscapes of eukaryote genomes. Such analyses are aimed towards providing a better understanding of the process of nucleosome positioning, for which several models have been suggested. Nevertheless, questions regarding the sequence constraints of nucleosomal DNA and how they may have been shaped through evolution remain open. In this paper, we analyze in detail different experimental nucleosome datasets with the aim of providing a hypothesis for the emergence of nucleosome-forming sequences. Results: We compared the complete sets of nucleosome positions for the budding yeast (Saccharomyces cerevisiae) as defined in the output of two independent experiments with the use of two different experimental techniques. We found that < 10% of the experimentally defined nucleosome positions were consistently positioned in both datasets. This subset of well-positioned nucleosomes, when compared with the bulk, was shown to have particular properties at both sequence and structural levels. Consistently positioned nucleosomes were also shown to occur preferentially in pairs of dinucleosomes, and to be surprisingly less conserved compared with their adjacent nucleosome-free linkers. Conclusion: Our findings may be combined into a hypothesis for the emergence of a weak nucleosome-positioning code. According to this hypothesis, consistent nucleosomes may be partly guided by nearby nucleosome-free regions through statistical positioning. Once established, a set of well-positioned consistent nucleosomes may impose secondary constraints that further shape the structure of the underlying DNA. We were able to capture these constraints through the application of a recently introduced structural property that is related to the symmetry of DNA curvature. Furthermore, we found that both consistently positioned nucleosomes and their adjacent nucleosome-free regions show an increased tendency towards conservation of this structural feature.

    Inferring the role of transcription factors in regulatory networks

    Background: Expression profiles obtained from multiple perturbation experiments are increasingly used to reconstruct transcriptional regulatory networks, from well studied, simple organisms up to higher eukaryotes. Admittedly, a key ingredient in developing a reconstruction method is its ability to integrate heterogeneous sources of information, as well as to comply with practical observability issues: measurements can be scarce or noisy. In this work, we show how to combine a network of genetic regulations with a set of expression profiles, in order to infer the functional effect of the regulations, as inducer or repressor. Our approach is based on a consistency rule between a network and the signs of variation given by expression arrays. Results: We evaluate our approach in several settings of increasing complexity. First, we generate artificial expression data on a transcriptional network of E. coli extracted from the literature (1529 nodes and 3802 edges), and we estimate that 30% of the regulations can be annotated with about 30 profiles. We additionally prove that at most 40.8% of the network can be inferred using our approach. Second, we use this network in order to validate the predictions obtained with a compendium of real expression profiles. We describe a filtering algorithm that generates particularly reliable predictions. Finally, we apply our inference approach to the S. cerevisiae transcriptional network (2419 nodes and 4344 interactions), by combining ChIP-chip data and 15 expression profiles. We are able to detect and isolate inconsistencies between the expression profiles and a significant portion of the model (15% of all the interactions). In addition, we report predictions for 14.5% of all interactions. Conclusion: Our approach requires neither accurate expression levels nor time series. Nevertheless, we show on both real and artificial data that a relatively small number of perturbation experiments is enough to determine a significant portion of regulatory effects. This is a key practical asset compared to statistical methods for network reconstruction. We demonstrate that our approach is able to provide accurate predictions, even when the network is incomplete and the data is noisy.
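
    The abstract above does not spell out its consistency rule. As a rough illustration only, the sketch below assumes a simplified rule: when a target has exactly one regulator in the network, the sign of that edge must equal the product of the observed variation signs of regulator and target. The function name, the data layout, and the labels are hypothetical; the paper's actual rule and filtering algorithm may differ.

```python
from collections import defaultdict


def infer_edge_signs(edges, profiles):
    """Label single-regulator edges from sign-of-variation profiles.

    edges:    list of (regulator, target) pairs
    profiles: list of dicts mapping gene -> +1 (up), -1 (down) or 0 (no change)
    Returns a dict mapping (regulator, target) to 'activator', 'repressor'
    or 'inconsistent'. Assumption: targets with several regulators are
    skipped, because this simplified rule alone cannot assign their signs.
    """
    regulators = defaultdict(list)
    for regulator, target in edges:
        regulators[target].append(regulator)

    inferred = {}
    for target, regs in regulators.items():
        if len(regs) != 1:
            continue
        regulator = regs[0]
        observed = set()
        for profile in profiles:
            s_reg = profile.get(regulator, 0)
            s_tgt = profile.get(target, 0)
            if s_reg != 0 and s_tgt != 0:
                # Same direction suggests activation, opposite suggests repression.
                observed.add(s_reg * s_tgt)
        if len(observed) == 1:
            inferred[(regulator, target)] = (
                "activator" if observed.pop() > 0 else "repressor"
            )
        elif len(observed) == 2:
            # Profiles disagree: the network and the data conflict on this edge.
            inferred[(regulator, target)] = "inconsistent"
    return inferred
```

    With toy edges A→B, C→D, A→E and C→E and two profiles in which A and B always move in opposite directions while C and D agree once and disagree once, the sketch labels A→B a repressor, flags C→D as inconsistent, and leaves the edges into E unlabelled.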

    Glycosaminoglycans and Sialylated Glycans Sequentially Facilitate Merkel Cell Polyomavirus Infectious Entry

    Merkel cell polyomavirus (MCV or MCPyV) appears to be a causal factor in the development of Merkel cell carcinoma, a rare but highly lethal form of skin cancer. Although recent reports indicate that MCV virions are commonly shed from apparently healthy human skin, the precise cellular tropism of the virus in healthy subjects remains unclear. To begin to explore this question, we set out to identify the cellular receptors or co-receptors required for the infectious entry of MCV. Although several previously studied polyomavirus species have been shown to bind to cell surface sialic acid residues associated with glycolipids or glycoproteins, we found that sialylated glycans are not required for initial attachment of MCV virions to cultured human cell lines. Instead, glycosaminoglycans (GAGs), such as heparan sulfate (HS) and chondroitin sulfate (CS), serve as initial attachment receptors during the MCV infectious entry process. Using cell lines deficient in GAG biosynthesis, we found that N-sulfated and/or 6-O-sulfated forms of HS mediate infectious entry of MCV reporter vectors, while CS appears to be dispensable. Intriguingly, although cell lines deficient in sialylated glycans readily bind MCV capsids, the cells are highly resistant to MCV reporter vector-mediated gene transduction. This suggests that sialylated glycans play a post-attachment role in the infectious entry process. Results observed using MCV reporter vectors were confirmed using a novel system for infectious propagation of native MCV virions. Taken together, the findings suggest a model in which MCV infectious entry occurs via initial cell binding mediated primarily by HS, followed by secondary interactions with a sialylated entry co-factor. The study should facilitate the development of inhibitors of MCV infection and help shed light on the infectious entry pathways and cellular tropism of the virus

    Uncovering a Macrophage Transcriptional Program by Integrating Evidence from Motif Scanning and Expression Dynamics

    Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation
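
    The time-lagged correlation mentioned above can be sketched as the Pearson correlation between a regulator's expression at time t and a target's expression at time t + lag. This is a generic illustration with hypothetical function names; it assumes equally spaced time points and does not reproduce the novel significance test described in the abstract.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(regulator, target, lag):
    """Correlate regulator[t] with target[t + lag] for a non-negative lag."""
    if len(regulator) != len(target):
        raise ValueError("time series must have equal length")
    if lag < 0 or lag >= len(regulator) - 1:
        raise ValueError("lag must leave at least two paired time points")
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Return the lag in 0..max_lag with the highest lagged correlation."""
    return max(
        range(max_lag + 1),
        key=lambda k: lagged_correlation(regulator, target, k),
    )
```

    A target series that simply repeats the regulator one time step later gives a lagged correlation of 1 at lag 1, so `best_lag` returns 1 for such a pair. A high lagged correlation is only a hint of a possible causal influence, which is why the abstract pairs it with a significance test and ChIP-on-chip validation.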

    Linking Proteomic and Transcriptional Data through the Interactome and Epigenome Reveals a Map of Oncogene-induced Signaling

    Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118–310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested its role in tumorigenesis. 
We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets. Funding: National Science Foundation (U.S.) (DB1-0821391); National Institutes of Health (U.S.) (Grant U54-CA112967); National Institutes of Health (U.S.) (Grant R01-GM089903); National Institutes of Health (U.S.) (P30-ES002109)