
    COUGER: co-factors associated with uniquely-bound genomic regions

    Most transcription factors (TFs) belong to protein families that share a common DNA binding domain and have very similar DNA binding preferences. However, many paralogous TFs (i.e., members of the same TF family) perform different regulatory functions and interact with different genomic regions in the cell. A potential mechanism for achieving this differential in vivo specificity is through interactions with protein co-factors. Computational tools for studying the genomic binding profiles of paralogous TFs and identifying their putative co-factors are currently lacking. Here, we present an interactive web implementation of COUGER, a classification-based framework for identifying protein co-factors that might provide specificity to paralogous TFs. COUGER takes as input two sets of genomic regions bound by paralogous TFs, and it identifies a small set of putative co-factors that best distinguish the two sets of sequences. To achieve this task, COUGER uses a classification approach, with features that reflect the DNA-binding specificities of the putative co-factors. The identified co-factors are presented in a user-friendly output page, together with information that allows the user to understand and explore the contributions of individual co-factor features. COUGER can be run as a stand-alone tool or through a web interface at http://couger.oit.duke.edu.
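    A minimal sketch of the classification idea, assuming each putative co-factor is summarized by a handful of high-affinity k-mers (a hypothetical input format) and using plain L2-regularized logistic regression as the classifier. This is not COUGER's implementation; its actual features and classifiers may differ. Co-factors are ranked by the magnitude of their learned weights, and the sign indicates which input set a co-factor points to.

```python
import math

# Hypothetical input: each putative co-factor is represented by a few
# high-affinity k-mers (e.g. derived from PBM data). COUGER's real features differ.

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def cofactor_features(seq, cofactor_kmers):
    """One binary feature per co-factor: 1.0 if any of its k-mers occurs on either strand."""
    rc = reverse_complement(seq)
    return [
        1.0 if any(k in seq or k in rc for k in kmers) else 0.0
        for kmers in cofactor_kmers.values()
    ]


def train_logistic_regression(X, y, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on the L2-regularized logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_cofactors(set1, set2, cofactor_kmers):
    """Rank co-factors by |weight|; positive weights point to set1, negative to set2."""
    X = [cofactor_features(s, cofactor_kmers) for s in set1 + set2]
    y = [1] * len(set1) + [0] * len(set2)
    w, _ = train_logistic_regression(X, y)
    return sorted(zip(cofactor_kmers, w), key=lambda pair: abs(pair[1]), reverse=True)
```

    For example, if every region in the first set carries a hypothetical co-factor k-mer GATAA and every region in the second set carries CACGTG, the first co-factor receives a positive weight, the second a negative one, and a co-factor whose k-mers occur in neither set keeps a weight of zero.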

    Predicting In Vivo Transcription Factor Occupancy from In Vitro Binding

    The spatial pattern of transcription factor (TF) binding and the level of TF occupancy at individual sites across the genome determine how a TF regulates its targets. Consequently, predicting the location and level of TF binding genome-wide is of great importance and has recently received much attention. Protein-binding microarray (PBM) technology has become the gold standard for studying TF-DNA interactions in vitro, while chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) is the standard method for inferring TF binding in vivo. However, directly interpreting in vitro results in an in vivo context is challenging, and such interpretations remain scarce to date. In this study, we focus on the E2F family of paralogous TFs, whose mode of binding to DNA has been controversial. Previous studies have shown that E2F factors bind to the TTTSSCGCG motif, where S denotes C or G. Still, only a small fraction of in vivo targets are reported to contain this motif, hinting at indirect recruitment of the protein. We observed that the genomic occupancy of E2F factors directly correlates with their in vitro binding affinities. Using data from universal PBM experiments, we show that E2F factors likely bind to DNA through direct sequence recognition rather than through cofactor interactions. Furthermore, we developed a kinetic binding model based on the PBM data to describe competition between different members of the E2F family, and we successfully distinguished between their unique targets. Overall, these results demonstrate how simple in vitro PBM experiments can be used to infer the complex in vivo landscape of TF binding, and they elucidate the mechanism of E2F-DNA interaction.
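    The abstract does not give the form of the kinetic model. As an illustration only, the sketch below assumes that paralogs compete for a single site with first-order binding and unbinding (rates k_on·c and k_off, hypothetical parameters), integrates those kinetics with simple Euler steps, and compares the result with the closed-form steady state θ_i = K_i c_i / (1 + Σ_j K_j c_j), where K_i = k_on,i / k_off,i. The TF labels and numbers are illustrative, not measured values.

```python
def steady_state_occupancy(conc, k_on, k_off):
    """Closed form: theta_i = K_i c_i / (1 + sum_j K_j c_j), with K_i = k_on_i / k_off_i."""
    weights = {tf: conc[tf] * k_on[tf] / k_off[tf] for tf in conc}
    z = 1.0 + sum(weights.values())  # 1 accounts for the unbound site
    return {tf: w / z for tf, w in weights.items()}


def simulate_binding(conc, k_on, k_off, dt=1e-3, steps=20000):
    """Euler integration of d(theta_i)/dt = k_on_i * c_i * free - k_off_i * theta_i."""
    theta = {tf: 0.0 for tf in conc}
    for _ in range(steps):
        free = 1.0 - sum(theta.values())  # probability the site is unoccupied
        theta = {
            tf: theta[tf] + dt * (k_on[tf] * conc[tf] * free - k_off[tf] * theta[tf])
            for tf in conc
        }
    return theta


def preferred_binder(conc, k_on, k_off):
    """Paralog with the highest steady-state occupancy at one site (site-specific rates)."""
    occ = steady_state_occupancy(conc, k_on, k_off)
    return max(occ, key=occ.get)
```

    Because binding is mutually exclusive, the paralogs share the single normalizing term, so raising one factor's affinity at a site lowers the occupancy of the others there; applying preferred_binder site by site is one hypothetical way to call paralog-specific targets.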

    Always read the introduction: integrating regulatory and coding sequence evolution in yeast

    We analyze duplicate genes in the yeast Saccharomyces cerevisiae with the aim of determining each gene's history and observing that gene in its genomic context. In Chapter 2 we show that the fate of a duplicate gene pair is partly determined by its genome location. Moreover, we show that for two classes of duplicate genes, resulting from either small-scale duplication or whole-genome duplication, this fate can often be assessed by measuring the patterns of asymmetry in the sequence divergence of the genes in question. In Chapter 3 we study duplicate genes in the context of their local environments by comparing the patterns of evolution in the coding sequences of duplicate genes for ribosomal proteins with those in their upstream non-coding sequences. We found that while the coding sequences show strong evidence of recent gene conversion events, similar patterns are not seen in the non-coding regulatory elements. These duplicated ribosomal proteins are not functionally redundant despite their very high degree of protein sequence identity. This analysis confirms that the duplicated proteins have diverged considerably in expression despite their similar protein sequences. In Chapter 4 we analyze the structure of the transcriptional regulatory network and characterize the molecular evolution of both its transcriptional regulators and their regulated genes. We found that both subfunctionalization and neofunctionalization of transcription factor binding play a role in divergence.
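    The abstract does not specify how asymmetry in sequence divergence is measured. One standard option, shown here as an assumption rather than the thesis's method, is Tajima's relative rate test: in an alignment of two paralogs and an outgroup, count the sites where only the first paralog has changed (m1) and where only the second has (m2); under equal rates, (m1 - m2)² / (m1 + m2) is approximately χ² with one degree of freedom.

```python
def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's 1D relative rate test for two aligned paralogs and an outgroup.

    Returns (m1, m2, chi2): m1 counts sites where only seq1 differs, m2 sites
    where only seq2 differs; chi2 = (m1 - m2)^2 / (m1 + m2), ~chi-square, 1 df.
    """
    if not len(seq1) == len(seq2) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # ignore alignment columns containing gaps
        if a != b and b == o:
            m1 += 1  # change unique to the seq1 lineage
        elif a != b and a == o:
            m2 += 1  # change unique to the seq2 lineage
    chi2 = (m1 - m2) ** 2 / (m1 + m2) if m1 + m2 else 0.0
    return m1, m2, chi2
```

    A χ² value above 3.84 would reject equal rates at the 5% level, pointing to asymmetric divergence of the pair; sites where all three sequences differ are uninformative for this test and are not counted.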

    The limits of subfunctionalization

    Background: The duplication-degeneration-complementation (DDC) model has been proposed as an explanation for the unexpectedly high retention of duplicate genes. The hypothesis proposes that, following gene duplication, the two gene copies degenerate to perform complementary functions that jointly match that of the single ancestral gene, a process also known as subfunctionalization. We distinguish between subfunctionalization at the regulatory level and at the product level (e.g., within temporal or spatial expression domains).
    Results: In contrast to what is expected under the DDC model, we use in silico modeling to show that regulatory subfunctionalization is expected to peak and then decrease significantly. At the same time, neofunctionalization (recruitment of novel interactions) increases monotonically, eventually affecting the regulatory elements of the majority of genes. Furthermore, since this process occurs under conditions of stabilizing selection, there is no need to invoke positive selection. At the product level, the frequency of subfunctionalization is no higher than would be expected by chance, a finding that was corroborated using yeast microarray time-course data. We also find that product subfunctionalization is not necessarily caused by regulatory subfunctionalization.
    Conclusion: Our results suggest a more complex picture of post-duplication evolution in which subfunctionalization plays only a partial role, in conjunction with redundancy and neofunctionalization. We argue that this behavior is a consequence of the high evolutionary plasticity of gene networks.
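    To make the distinction between post-duplication outcomes concrete, the sketch below classifies a duplicate pair from the sets of regulatory elements carried by the ancestral gene and by each copy. The labels and decision rules are a hypothetical simplification of the definitions in the abstract (complementary partitioning of ancestral elements as subfunctionalization, gain of novel elements as neofunctionalization), not the authors' simulation model. A pair can carry more than one label, reflecting that these processes can act together.

```python
def classify_duplicate_pair(ancestral, copy1, copy2):
    """Label a duplicate pair from the regulatory elements each gene carries.

    Hypothetical rules, loosely following the DDC definitions:
    - neofunctionalization: a copy gained an element absent from the ancestor
    - redundancy: both copies kept every ancestral element
    - subfunctionalization: each copy lost some ancestral elements, but together
      they still cover the full ancestral set (complementary degeneration)
    - one_copy_redundant: one copy kept everything, the other lost some elements
    - ancestral_loss: some ancestral element survives in neither copy
    """
    anc, c1, c2 = set(ancestral), set(copy1), set(copy2)
    labels = set()
    if (c1 | c2) - anc:
        labels.add("neofunctionalization")
    kept1, kept2 = c1 & anc, c2 & anc  # ancestral elements retained by each copy
    if (kept1 | kept2) != anc:
        labels.add("ancestral_loss")
    elif kept1 == anc and kept2 == anc:
        labels.add("redundancy")
    elif kept1 != anc and kept2 != anc:
        labels.add("subfunctionalization")
    else:
        labels.add("one_copy_redundant")
    return labels
```

    Tracking how often each label appears over simulated time is one hypothetical way to reproduce the qualitative pattern described above, in which subfunctionalization rises and then falls while neofunctionalization keeps accumulating.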

    Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human

    Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog assembly. Species-specific duplicate genes, or “in-paralogues,” are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates in the dog and human lineages possess similar biological functions. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms.
    PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
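    PhyOP's phylogenetic pipeline is not reproduced here. As a much simpler illustration of working with synonymous distances, the sketch applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), to an observed proportion p of differing synonymous sites, and then pairs genes that are each other's lowest-distance match. Reciprocal best hits are a swapped-in heuristic, not PhyOP's method, and the gene identifiers in the usage below are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance from an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def reciprocal_lowest_distance(ds):
    """Return sorted pairs (a, b) where b is a's closest gene and a is b's closest gene.

    ds maps (gene_in_species_a, gene_in_species_b) -> synonymous distance.
    """
    best_a, best_b = {}, {}
    for (a, b), dist in ds.items():
        if a not in best_a or dist < ds[(a, best_a[a])]:
            best_a[a] = b
        if b not in best_b or dist < ds[(best_b[b], b)]:
            best_b[b] = a
    return sorted((a, b) for a, b in best_a.items() if best_b[b] == a)
```

    For instance, with hypothetical distances {("dogA", "hsA"): 0.3, ("dogA", "hsB"): 0.9, ("dogB", "hsA"): 0.5, ("dogB", "hsB"): 0.4}, the reciprocal pairs are ("dogA", "hsA") and ("dogB", "hsB"). A real pipeline would also need to handle in-paralogues, which a strict one-to-one rule like this one discards.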

    Efficient algorithms for gene cluster detection in prokaryotic genomes

    Schmidt T. Efficient algorithms for gene cluster detection in prokaryotic genomes. Bielefeld (Germany): Bielefeld University; 2005.
    Research in genomics has grown rapidly in the last few years, and the number of completely sequenced genomes continues to increase due to the use of semi-automatic sequencing machines. Moreover, many of these sequences, mostly prokaryotic ones, are well annotated, which means that the positions of their genes and parts of their regulatory or metabolic pathways are known. A new task in bioinformatics is to gain gene or protein information by comparing genomes at a higher level. In comparative genomics, researchers attempt to locate groups, or clusters, of orthologous genes that may have the same function in multiple genomes. This work often rests on the simple but biologically verified observation that, across different species, functionally related proteins are usually encoded by genes located in close genomic neighborhood. From an algorithmic and combinatorial point of view, the first descriptions of the concept of "closely placed genes" were only fragmentary and sometimes confusing. The proposed algorithms often lacked the grounds needed to prove their correctness or assess their complexity. In the first formal models of a conserved genomic neighborhood, genomes are often represented as permutations of their genes, and common intervals, i.e., intervals containing the same set of genes, are interpreted as gene clusters. The major disadvantage of representing genomes as permutations, however, is that paralogous copies of the same gene within one genome cannot be modelled. Because large genomes in particular contain numerous paralogous genes, this model is insufficient for real genomic data.
    In this work, we consider a modified model of gene clusters that allows paralogs, simply by representing genomes as sequences rather than permutations of genes. We define common intervals based on this model, and we present a simple algorithm that finds all common intervals of two sequences in Θ(n²) time using Θ(n²) space. Another, more complicated algorithm runs in O(n²) time and uses only linear space. We also show how to extend these algorithms to more than two genomes and present the implementation of the algorithms, as well as the visualization of the located clusters, in the tool Gecko. Since creating the string representation of a set of genomes is a non-trivial task, we also present the data preparation tool GhostFam, which groups all genes from the given set of genomes into families of homologs. In an evaluation on a set of 20 bacterial genomes, we show that with the presented approach it is possible to correctly locate gene clusters known from the literature and to successfully predict new groups of functionally related genes.
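    The definition of common intervals on sequences can be illustrated with a naive sketch: enumerate the gene set of every contiguous interval in each genome and intersect the two collections. Paralogs simply repeat a symbol, so they are handled naturally. This brute-force version builds a frozenset per interval, so it is slower than the Θ(n²) algorithm described in the abstract and is not the algorithm implemented in Gecko; the min_size parameter is a hypothetical convenience for ignoring single-gene intervals.

```python
def interval_gene_sets(genome):
    """All distinct gene sets of contiguous intervals of a genome (a sequence of gene families)."""
    sets = set()
    for i in range(len(genome)):
        current = set()
        for j in range(i, len(genome)):
            current.add(genome[j])  # extend the interval [i, j] by one gene
            sets.add(frozenset(current))
    return sets


def common_intervals(genome_a, genome_b, min_size=2):
    """Gene sets that occur as the content of some interval in both genomes."""
    shared = interval_gene_sets(genome_a) & interval_gene_sets(genome_b)
    return {s for s in shared if len(s) >= min_size}
```

    For example, genomes [1, 2, 3, 4] and [3, 1, 2, 4] share the common intervals {1, 2}, {1, 2, 3} and {1, 2, 3, 4}; with a paralog, as in [1, 2, 1, 3] versus [3, 1, 2], the repeated gene 1 is simply absorbed into the interval's set.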

    Protein interactions across and between eukaryotic kingdoms: networks, inference strategies, integration of functional data and evolutionary dynamics

    Thesis (Ph.D.), Boston University.
    How cellular elements coordinate their function is a fundamental question in biology. A crucial step towards understanding cellular systems is mapping the physical interactions between proteins, DNA, RNA and other macromolecules or metabolites. Genome-scale technologies have yielded protein-protein interaction networks for several eukaryotic species and have provided insight into biological processes and evolution, but many of the currently available networks are biased. Working towards a true human protein-protein interaction network, we examined literature-based aggregations of low-throughput experiments, high-throughput experimental networks validated using different strategies, and predicted interaction networks to infer how the underlying interactome may differ from current maps. Using systematically mapped interactome networks, which appear to be the least biased, we explored the functional organization of Arabidopsis thaliana and characterized the asymmetric divergence of duplicated paralogous proteins through their interaction profiles. To further dissect the relationship between interactions and function enforced by evolution, we investigated a first-of-its-kind systematic cross-species human-yeast hybrid interactome network. Although the cross-species network is topologically similar to conventional intra-species networks, we found signatures of dynamic changes in interaction propensities due to countervailing evolutionary forces. Collectively, these analyses of human, plant and yeast interactome networks bridge separate experiments to characterize bias, function and evolution across eukaryotic kingdoms.
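    One simple way to compare the interaction profiles of two paralogs, each given as a set of interaction partners, is sketched below: the Jaccard index measures how much the profiles overlap, and a hypothetical asymmetry index |u_a - u_b| / (u_a + u_b), over the partners unique to each copy, indicates whether one paralog has diverged more than the other. The thesis's own measures are not stated in the abstract, so this is an assumption-laden illustration.

```python
def interaction_divergence(partners_a, partners_b):
    """Compare two paralogs' interaction profiles (sets of partner proteins)."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    # Two empty profiles are treated as identical.
    jaccard = len(a & b) / len(union) if union else 1.0
    unique_a, unique_b = len(a - b), len(b - a)
    total = unique_a + unique_b
    # Hypothetical asymmetry index: 0 when both copies have the same number of
    # unique partners, 1 when all non-shared partners belong to one copy.
    asymmetry = abs(unique_a - unique_b) / total if total else 0.0
    return {"jaccard": jaccard, "unique_a": unique_a, "unique_b": unique_b, "asymmetry": asymmetry}
```

    For example, a paralog with hypothetical partners {P1, P2, P3, P4, P5} and a sibling with {P1, P6} share one of six partners (Jaccard 1/6), and their unique partners split 4 to 1, giving an asymmetry of 0.6.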

    An Assessment of Combinatorial Transcription Factor Activity at p53 Enhancer Elements

    Certain non-coding DNA sequences in the eukaryotic genome regulate gene expression. These non-coding regulatory regions, including promoters and enhancers, are controlled by the binding of multiple transcription factors that act together to regulate gene transcription. The number of potential transcription factor combinations regulating any gene presents a massive experimental challenge. One well-known transcription factor, p53, activates multiple transcription pathways involved in tumor suppression, primarily through engagement with enhancers. p53 is one member of a paralogous transcription factor family, which includes the factor p63. Whereas p53 is involved in tumor suppression, p63 is a transcription factor responsible for maintaining epithelial cell populations through its ability to bind to and regulate enhancers. p63 and p53 are often bound to the same enhancers in the genome, suggesting more complex regulation than predicted by their canonical functions. We therefore aimed to better understand how genomic binding sequences and other factors regulate p53 and p63 activity at enhancers. Luciferase reporter gene assays were used to measure the transcriptional output of various p63 and p53 enhancers after genetically altering flanking DNA sequence motifs. We found that changing these flanking regions revealed core regulatory sequences that drive p53 and p63 transcriptional activity. We also determined that p63-bound enhancers, but not those bound by p53, had context-dependent activity: depending on the cell type, these enhancers are active or inactive, with basal expression of p63 determining their activity. Our data provide new insight into the regulation of p53 family enhancers, and further work will lead to a better understanding of transcription factor activity and function at enhancers.
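    The abstract does not describe how reporter output was normalized. A common design, assumed here, is the dual-luciferase assay, in which the firefly signal driven by the enhancer is divided by a co-transfected Renilla control in each replicate. The sketch computes those ratios and the fold change of a mutated enhancer relative to its wild-type counterpart; the function names and values are hypothetical.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Per-replicate firefly / Renilla ratio (corrects for transfection efficiency)."""
    if len(firefly) != len(renilla):
        raise ValueError("need one Renilla reading per firefly reading")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(mut_firefly, mut_renilla, wt_firefly, wt_renilla):
    """Mean normalized activity of a mutated enhancer relative to the wild-type enhancer."""
    mutant = mean(normalized_ratios(mut_firefly, mut_renilla))
    wild_type = mean(normalized_ratios(wt_firefly, wt_renilla))
    return mutant / wild_type
```

    With hypothetical readings, a wild-type enhancer giving firefly [1000, 1200] over Renilla [100, 100] (mean ratio 11) and a flank-mutated version giving [220, 330] over [100, 150] (mean ratio 2.2) yields a fold change of about 0.2, i.e. the mutation removes roughly 80% of the enhancer's activity.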