Computational Characterization of Genome-wide DNA-binding Profiles
The work and data presented in this thesis are part of a collaborative project funded by the Berlin Center for Regenerative Therapies. A number of people have contributed to this work, and for clarity I will now mention the individual contributions. Stefan Mundlos, Peter N. Robinson and Jochen Hecht designed this project with the purpose of studying bone development using ChIP-seq in a chicken model. Jochen Hecht and Asita Stiege established the ChIP-seq protocol and, together with Daniel Ibrahim, Hendrikje Hein, and Catrin Janetzky, carried out the immunoprecipitations and sequencing. Peter Krawitz was responsible for the data processing, which involved base calling and basic quality control. Daniel Ibrahim contributed to the analysis of the Hox proteins, identifying the Q317K mutant as related to Pitx1 and Obox family members. Sebastian Kohler and Sebastian Bauer carried out the computation of the Gene Ontology similarity data and random walk distances that I used for the target gene assignments in chapter 5. The EMSA experiments shown in chapter 3 were carried out by Asita Stiege. The work on target gene assignment presented in chapter 5 has been published in Nucleic Acids Research [1]. All the remaining methods, data and experimental results will be partially included in future publications by Ibrahim et al. and Hein et al.
cis-Regulation in the Mammalian Rod Photoreceptor
Transcription factors regulate the expression level of target genes by binding to cis-regulatory elements (CREs) present in gene promoters. The goal of my thesis research is to define the sequence components of CREs that determine transcriptional output. In order to accomplish this goal, I developed a method to measure the regulatory activity of thousands of CREs in a single experiment. In this method I insert unique barcodes in the 3' UTR of a reporter gene and multiplex expression measurements with RNA sequencing. Using this technique in explanted retinas, I determined the impact of single nucleotide variants in a mammalian promoter by measuring expression controlled by all single nucleotide variants of the Rhodopsin proximal promoter. I found that nearly all (86%) sequence variants drive significantly different activity than the wild-type promoter and that the mechanism of most variants can be interpreted as altered transcription factor binding. In addition, we found that the largest changes in expression resulted from variants located in characterized transcription factor binding site sequences. Next, I explored how combinations of binding sites drive particular levels of gene expression by utilizing a synthetic biology approach. I generated synthetic CREs composed of various combinations of binding sites found in the Rhodopsin promoter and measured the expression driven by these sequences. In this study I found that synthetic CREs containing binding sites for transcriptional activators yielded diverse expression outputs, including both activation and repression of a minimal promoter. Together, these experiments demonstrate that interactions between binding sites and dual regulation of a single binding site can produce diverse gene expression patterns.
I conclude that simple cis-regulatory elements can produce complex expression outputs due to interactions between transcriptional activators, and that detailed quantitative models will be necessary to predict expression from these sequences.
Deciphering Regulatory Networks in the Mouse Genome
Despite major achievements in the field of genomics and in-depth studies of protein-coding genes, our knowledge about non-coding regions and their contribution to disease remains incomplete. Large-scale projects such as ENCODE have produced a wealth of sequencing data which can be utilised to study epigenetic features associated with gene regulation. These studies have comprehensively identified regulatory elements such as enhancers in the human genome, but numerous questions remain about their effect on gene function and disease causation.
The aim of this thesis is to identify enhancer regulatory networks in the mouse genome and investigate their effect on mouse models of human diseases. In order to study enhancer regulation, I have taken two approaches. First, I have produced a catalogue of multiple well-defined enhancer types in a diverse range of mouse tissues and cell types. By systematically comparing different enhancer types, I found that super- and typical-enhancers have different effects on gene expression, but both are preferentially associated with relevant tissue-type phenotypes. Also, genes associated with super- and typical-enhancers exhibit no difference in phenotype effect size or pleiotropy. Second, by utilising publicly available regulatory annotations, my enhancer catalogue and omics data, I have investigated regulatory mechanisms associated with metabolic and circadian mouse models. Here I identified novel regulatory networks, enhancers, or transcription factor binding sites pertaining to the mutant mice.
In conclusion, my research has shown the usefulness of integrating enhancer annotations with an array of molecular data and has for the first time shown how different enhancer architectures influence gene function in the mouse genome. This study provides a valuable dataset to further characterise the mechanisms of gene regulation by enhancers in the mouse genome.
Computational mapping of regulatory domains of human genes
The human genome contains millions of regulatory elements, enhancers, that quantitatively regulate gene expression. Multiple experimental and computational approaches have been developed to associate enhancers with their gene targets. Despite the tremendous progress in understanding how enhancers tune gene expression, the field still lacks an approach that is systematic, integrative and accessible for discovering and documenting cis-regulatory relationships across the genome.
We developed a novel computational approach, reg2gene, that models and integrates gene expression ~ enhancer activity. reg2gene consists of three main steps: 1) data quantification, 2) data modelling and significance assessment, and 3) data integration, gathered in the reg2gene R package. As a result we identified two sets of enhancer-gene associations (EGAs): the flexible set of ~230K EGAs (flexibleC), and the stringent set of ~60K EGAs (stringentC). We identified major differences across previously published computational models of enhancer-gene associations, mostly in the location, number and properties of defined enhancer regions and EGAs.
We performed detailed benchmarking of seven sets of computationally modelled EGAs, but showed that none of the currently available benchmark datasets could be used as a "gold-standard" benchmark dataset. To account for that observation, we defined an additional benchmark set of positive and negative EGAs with which we showed that the stringentC model had the highest positive predictive value (PPV) across all analyzed computational models. We reviewed the influence of EGA sets on the functional analysis of risk SNPs and demonstrated the potential of EGAs to identify gene targets of non-coding SNP-gene associations. Lastly, we performed a functional analysis to detect novel gene targets, enhancer pleiotropy, and mechanisms of enhancer activity. Altogether, this work advances our understanding of enhancer-mediated gene expression regulation in health and disease.
Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types.
Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications for understanding the genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.
The Reasonable Effectiveness of Randomness in Scalable and Integrative Gene Regulatory Network Inference and Beyond
Gene regulation is orchestrated by a vast number of molecules, including transcription factors and co-factors, chromatin regulators, as well as epigenetic mechanisms. It has been shown that transcriptional misregulation, e.g., caused by mutations in regulatory sequences, is responsible for a plethora of diseases, including cancer and developmental or neurological disorders. As a consequence, decoding the architecture of gene regulatory networks has become one of the most important tasks in modern (computational) biology. However, to advance our understanding of the mechanisms involved in the transcriptional apparatus, we need scalable approaches that can deal with the increasing number of large-scale, high-resolution, biological datasets. In particular, such approaches need to be capable of efficiently integrating and exploiting the biological and technological heterogeneity of such datasets in order to best infer the underlying, highly dynamic regulatory networks, often in the absence of sufficient ground truth data for model training or testing. With respect to scalability, randomized approaches have proven to be a promising alternative to deterministic methods in computational biology. As an example, one of the top performing algorithms in a community challenge on gene regulatory network inference from transcriptomic data is based on a random forest regression model. In this concise survey, we aim to highlight how randomized methods may serve as a highly valuable tool, in particular, with increasing amounts of large-scale, biological experiments and datasets being collected. Given the complexity and interdisciplinary nature of the gene regulatory network inference problem, we hope our survey may be helpful to both computational and biological scientists.
It is our aim to provide a starting point for a dialogue about the concepts, benefits, and caveats of the toolbox of randomized methods, since unravelling the intricate web of highly dynamic, regulatory events will be one fundamental step in understanding the mechanisms of life and eventually developing efficient therapies to treat and cure diseases.
Evolutionary processes from the perspective of flowering time diversity.
Although it is well appreciated that genetic studies of flowering time regulation have led to fundamental advances in the fields of molecular and developmental biology, the ways in which genetic studies of flowering time diversity have enriched the field of evolutionary biology have received less attention despite often being equally profound. Because flowering time is a complex, environmentally responsive trait that has critical impacts on plant fitness, crop yield, and reproductive isolation, research into the genetic architecture and molecular basis of its evolution continues to yield novel insights into our understanding of domestication, adaptation, and speciation. For instance, recent studies of flowering time variation have reconstructed how, when, and where polygenic evolution of phenotypic plasticity proceeded from standing variation and de novo mutations; shown how antagonistic pleiotropy and temporally varying selection maintain polymorphisms in natural populations; and provided important case studies of how assortative mating can evolve and facilitate speciation with gene flow. In addition, functional studies have built detailed regulatory networks for this trait in diverse taxa, leading to new knowledge about how and why developmental pathways are rewired and elaborated through evolutionary time.
Unraveling the transcriptional Cis-regulatory code
It is now accepted that eukaryotic complexity is not dictated by the number of protein-coding genes in the genome, but rather achieved through the combinatorics of gene expression programs. Distinct aspects of the expression pattern of a gene are mediated by discrete regulatory sequences, known as cis-regulatory elements. The work described in this thesis was aimed at developing computational and statistical methods to guide the search for and characterization of novel cis-regulatory elements.