3,675 research outputs found

    Computational Characterization of Genome-wide DNA-binding Profiles

    The work and data presented in this thesis are part of a collaborative project funded by the Berlin Center for Regenerative Therapies. A number of people have contributed to this work, and for clarity I will now mention the individual contributions. Stefan Mundlos, Peter N. Robinson and Jochen Hecht designed this project with the purpose of studying bone development using ChIP-seq in a chicken model. Jochen Hecht and Asita Stiege established the ChIP-seq protocol and, together with Daniel Ibrahim, Hendrikje Hein, and Catrin Janetzky, carried out the immunoprecipitations and sequencing. Peter Krawitz was responsible for the data processing, which involved base calling and basic quality control. Daniel Ibrahim contributed to the analysis of the Hox proteins, identifying the Q317K mutant as related to Pitx1 and Obox family members. Sebastian Kohler and Sebastian Bauer carried out the computation of the Gene Ontology similarity data and random walk distances that I used for the target gene assignments in chapter 5. The EMSA experiments shown in chapter 3 were carried out by Asita Stiege. The work on target gene assignment presented in chapter 5 has been published in Nucleic Acids Research [1]. All remaining methods, data and experimental results will be partially included in future publications by Ibrahim et al. and Hein et al.

    cis-Regulation in the Mammalian Rod Photoreceptor

    Transcription factors regulate the expression level of target genes by binding to cis-regulatory elements (CREs) present in gene promoters. The goal of my thesis research is to define the sequence components of CREs that determine transcriptional output. In order to accomplish this goal, I developed a method to measure the regulatory activity of thousands of CREs in a single experiment. In this method I insert unique barcodes in the 3' UTR of a reporter gene and multiplex expression measurements with RNA sequencing. Using this technique in explanted retinas, I determined the impact of single nucleotide variants in a mammalian promoter by measuring expression controlled by all single nucleotide variants of the Rhodopsin proximal promoter. I found that nearly all (86%) sequence variants drive significantly different activity than the wild-type promoter and that the mechanism of most variants can be interpreted as altered transcription factor binding. In addition, we found that the largest changes in expression resulted from variants located in characterized transcription factor binding site sequences. Next, I explored how combinations of binding sites drive particular levels of gene expression by utilizing a synthetic biology approach. I generated synthetic CREs composed of various combinations of binding sites found in the Rhodopsin promoter and measured the expression driven by these sequences. In this study I found that synthetic CREs containing binding sites for transcriptional activators yielded diverse expression outputs, including both activation and repression of a minimal promoter. Together, these experiments demonstrate that interactions between binding sites and dual regulation of a single binding site can produce diverse gene expression patterns.
I conclude that simple cis-regulatory elements can produce complex expression outputs due to interactions between transcriptional activators, and that detailed quantitative models will be necessary to predict expression from these sequences.
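
The barcode-based measurement described in this abstract can be illustrated with a minimal sketch. It assumes that each CRE is tagged by several barcodes and that activity is summarized as the mean log2 ratio of RNA barcode counts to DNA (input library) barcode counts. That normalization, the pseudocount, and all names below (`cre_activity`, `barcode_to_cre`) are illustrative assumptions and are not taken from the thesis.

```python
from collections import defaultdict
from math import log2


def cre_activity(rna_counts, dna_counts, barcode_to_cre, pseudocount=1.0):
    """Summarize reporter activity per CRE from barcode counts.

    rna_counts / dna_counts: dict mapping barcode -> read count.
    barcode_to_cre: dict mapping barcode -> CRE identifier.
    The pseudocount (an assumption) avoids division by zero and log(0).
    Returns a dict mapping CRE -> mean log2(RNA / DNA) over its barcodes.
    """
    per_cre = defaultdict(list)
    for barcode, cre in barcode_to_cre.items():
        rna = rna_counts.get(barcode, 0) + pseudocount
        dna = dna_counts.get(barcode, 0) + pseudocount
        per_cre[cre].append(log2(rna / dna))
    # Average over all barcodes that tag the same CRE.
    return {cre: sum(vals) / len(vals) for cre, vals in per_cre.items()}


if __name__ == "__main__":
    # Hypothetical counts for a wild-type promoter and one variant.
    barcode_to_cre = {"AAA": "wt", "CCC": "wt", "GGG": "var1"}
    rna = {"AAA": 7, "CCC": 7, "GGG": 1}
    dna = {"AAA": 3, "CCC": 3, "GGG": 3}
    print(cre_activity(rna, dna, barcode_to_cre))
```

Averaging over several barcodes per CRE is one simple way to dampen barcode-specific effects; a real analysis would also need replicate handling and a statistical test against the wild-type promoter.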

    Computational mapping of regulatory domains of human genes

    The human genome contains millions of regulatory elements - enhancers - that quantitatively regulate gene expression. Multiple experimental and computational approaches have been developed to associate enhancers with their gene targets. Despite the tremendous progress in understanding how enhancers tune gene expression, the field still lacks an approach that is systematic, integrative and accessible for discovering and documenting cis-regulatory relationships across the genome. We developed a novel computational approach - reg2gene - that models and integrates gene expression ~ enhancer activity. reg2gene consists of three main steps, gathered in the reg2gene R package: 1) data quantification, 2) data modelling and significance assessment, and 3) data integration. As a result we identified two sets of enhancer-gene associations (EGAs): the flexible set of ~230K EGAs (flexibleC), and the stringent set of ~60K EGAs (stringentC). We identified major differences across previously published computational models of enhancer-gene associations, mostly in the location, number and properties of defined enhancer regions and EGAs. We performed detailed benchmarking of seven sets of computationally modelled EGAs, but showed that none of the currently available benchmark datasets could be used as a “gold-standard” benchmark dataset. To account for that observation, we defined an additional benchmark set of positive and negative EGAs, with which we showed that the stringentC model had the highest positive predictive value (PPV) across all analyzed computational models. We reviewed the influence of EGA sets on the functional analysis of risk SNPs and demonstrated the potential of EGAs to identify gene targets of non-coding SNP-gene associations.
Lastly, we performed a functional analysis to detect novel gene targets, enhancer pleiotropy, and mechanisms of enhancer activity. Altogether, this work advances our understanding of enhancer-mediated gene expression regulation in health and disease.
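
The positive predictive value used in this benchmarking can be sketched as follows, using the standard definition PPV = TP / (TP + FP). The sketch assumes EGAs are represented as (enhancer, gene) pairs and that predictions absent from both benchmark sets are ignored. The function name, the data layout, and the example pairs are hypothetical and do not come from the reg2gene package.

```python
def positive_predictive_value(predicted, positives, negatives):
    """Compute PPV = TP / (TP + FP) for a set of predicted EGAs.

    predicted: iterable of (enhancer, gene) pairs called by a model.
    positives / negatives: benchmark pairs labeled true / false.
    Predictions found in neither benchmark set are not scored
    (an assumption of this sketch).
    Returns None when no prediction overlaps the benchmark.
    """
    predicted = set(predicted)
    true_pos = len(predicted & set(positives))
    false_pos = len(predicted & set(negatives))
    if true_pos + false_pos == 0:
        return None
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical enhancer-gene pairs for illustration only.
    predicted = {("e1", "g1"), ("e2", "g2"), ("e3", "g3"), ("e4", "g4")}
    positives = {("e1", "g1"), ("e2", "g2"), ("e3", "g3")}
    negatives = {("e4", "g4"), ("e5", "g5")}
    print(positive_predictive_value(predicted, positives, negatives))
```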

    The Reasonable Effectiveness of Randomness in Scalable and Integrative Gene Regulatory Network Inference and Beyond

    Gene regulation is orchestrated by a vast number of molecules, including transcription factors and co-factors, chromatin regulators, as well as epigenetic mechanisms, and it has been shown that transcriptional misregulation, e.g., caused by mutations in regulatory sequences, is responsible for a plethora of diseases, including cancer, developmental or neurological disorders. As a consequence, decoding the architecture of gene regulatory networks has become one of the most important tasks in modern (computational) biology. However, to advance our understanding of the mechanisms involved in the transcriptional apparatus, we need scalable approaches that can deal with the increasing number of large-scale, high-resolution, biological datasets. In particular, such approaches need to be capable of efficiently integrating and exploiting the biological and technological heterogeneity of such datasets in order to best infer the underlying, highly dynamic regulatory networks, often in the absence of sufficient ground truth data for model training or testing. With respect to scalability, randomized approaches have proven to be a promising alternative to deterministic methods in computational biology. As an example, one of the top performing algorithms in a community challenge on gene regulatory network inference from transcriptomic data is based on a random forest regression model. In this concise survey, we aim to highlight how randomized methods may serve as a highly valuable tool, in particular, with increasing amounts of large-scale, biological experiments and datasets being collected. Given the complexity and interdisciplinary nature of the gene regulatory network inference problem, we hope our survey may be helpful to both computational and biological scientists.
It is our aim to provide a starting point for a dialogue about the concepts, benefits, and caveats of the toolbox of randomized methods, since unravelling the intricate web of highly dynamic, regulatory events will be one fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases.

    Evolutionary processes from the perspective of flowering time diversity.

    Although it is well appreciated that genetic studies of flowering time regulation have led to fundamental advances in the fields of molecular and developmental biology, the ways in which genetic studies of flowering time diversity have enriched the field of evolutionary biology have received less attention despite often being equally profound. Because flowering time is a complex, environmentally responsive trait that has critical impacts on plant fitness, crop yield, and reproductive isolation, research into the genetic architecture and molecular basis of its evolution continues to yield novel insights into our understanding of domestication, adaptation, and speciation. For instance, recent studies of flowering time variation have reconstructed how, when, and where polygenic evolution of phenotypic plasticity proceeded from standing variation and de novo mutations; shown how antagonistic pleiotropy and temporally varying selection maintain polymorphisms in natural populations; and provided important case studies of how assortative mating can evolve and facilitate speciation with gene flow. In addition, functional studies have built detailed regulatory networks for this trait in diverse taxa, leading to new knowledge about how and why developmental pathways are rewired and elaborated through evolutionary time.

    Unraveling the transcriptional Cis-regulatory code

    It is now accepted that eukaryotic complexity is not dictated by the number of protein-coding genes in the genome, but rather achieved through the combinatorics of gene expression programs. Distinct aspects of the expression pattern of a gene are mediated by discrete regulatory sequences, known as cis-regulatory elements. The work described in this thesis was aimed at developing computational and statistical methods to guide the search for and characterization of novel cis-regulatory elements.