215 research outputs found

    Associative Pattern Recognition for Biological Regulation Data

    In the last decade, bioinformatics data has accumulated at an unprecedented rate, thanks to advances in sequencing technologies. Such rapid development brings both challenges and promising research topics. In this dissertation, we propose a series of associative pattern recognition algorithms for biological regulation studies. In particular, we emphasize efficiently recognizing associative patterns between genes, transcription factors, histone modifications and functional labels using heterogeneous data sources (numeric, sequence, time series and textual label data). In protein-DNA associative pattern recognition, we introduce an efficient algorithm for affinity testing that searches for over-represented DNA sequences using a hash function and modulo addition. This substantially improves the efficiency of next generation sequencing data analysis. In gene regulatory network inference, we propose a framework for refining weak networks based on transcription factor binding sites, improving the precision of predicted edges by up to 52%. In histone modification code analysis, we propose an approach to genome-wide combinatorial pattern recognition that associates histone codes with function, achieving an improvement of up to 38.1%. We also propose a novel shape-based modification pattern analysis approach and use it to successfully predict sub-classes of genes in the flowering-time category. We further propose combination-to-combination associative pattern recognition, which achieved better performance than multi-label classification and bidirectional associative memory methods. Our proposed approaches recognize associative patterns from different types of data efficiently and provide a useful toolbox for biological regulation analysis. This dissertation presents a road map to associative pattern recognition at the genome-wide level.
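
    The abstract mentions searching for over-represented DNA sequences with a hash function and modulo arithmetic but gives no details. The following is only a minimal sketch of one common way to do this, assuming a rolling base-4 hash over k-mers and a simple pseudocounted frequency ratio against a background sequence. The function names (`kmer_counts`, `decode`, `over_represented`) and the scoring rule are illustrative assumptions, not the dissertation's actual algorithm.

```python
from collections import Counter

# Assumed 2-bit encoding of nucleotides; not taken from the dissertation.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_counts(seq, k):
    """Count k-mers using a rolling base-4 hash reduced modulo 4**k."""
    mod = 4 ** k
    counts = Counter()
    h = 0
    valid = 0  # consecutive valid bases ending at the current position
    for ch in seq.upper():
        code = CODE.get(ch)
        if code is None:  # ambiguous base such as N: restart the window
            h, valid = 0, 0
            continue
        # Shift in the new base; the modulo drops the oldest base.
        h = (h * 4 + code) % mod
        valid += 1
        if valid >= k:
            counts[h] += 1
    return counts


def decode(h, k):
    """Turn a hash value back into its k-mer string."""
    letters = "ACGT"
    out = []
    for _ in range(k):
        out.append(letters[h % 4])
        h //= 4
    return "".join(reversed(out))


def over_represented(foreground, background, k, pseudo=1.0):
    """Rank foreground k-mers by pseudocounted frequency ratio (illustrative score)."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values()) + pseudo * 4 ** k
    bg_total = sum(bg.values()) + pseudo * 4 ** k
    ratios = {
        decode(h, k): ((c + pseudo) / fg_total) / ((bg.get(h, 0) + pseudo) / bg_total)
        for h, c in fg.items()
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```

    Because the hash is taken modulo 4**k, every k-mer maps to a unique integer, so counting needs only one multiplication, one addition and one modulo per base. A real affinity test would replace the ratio with a proper statistical test.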

    Integrating Epigenetic Priors For Improving Computational Identification of Transcription Factor Binding Sites

    Transcription factors and histone modifications play critical roles in tissue-specific gene expression. Identifying binding sites is key to understanding the regulatory interactions that govern gene expression. Naive computational approaches use only DNA sequence data to construct models known as Position Weight Matrices (PWMs). However, the various assumptions these models make and their lack of background genomic information lead to a high false positive rate. In an attempt to improve the predictive performance of a PWM, we use a Hidden Markov Model (HMM) to incorporate chromatin structure, in particular histone modifications (HMs). The HMM captures physical interactions between distinct HMs. Indeed, integrating sequence-based PWM models with chromatin modifications improves the predictive ability of the integrative model.
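
    For readers unfamiliar with Position Weight Matrices, here is a minimal sketch of the sequence-only baseline that the abstract sets out to improve. It assumes a uniform background, a pseudocount of 1 and log-odds scoring. The helper names (`build_pwm`, `scan`, `best_hit`) are hypothetical, and the HMM integration of histone modifications described in the abstract is not shown.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudo=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudo
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        # log2 of (pseudocounted base frequency / uniform background)
        pwm.append(
            {b: math.log2(((column.count(b) + pseudo) / total) / background)
             for b in BASES}
        )
    return pwm


def scan(seq, pwm):
    """Score every window of the sequence; returns (position, score) pairs."""
    w = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(w)))
        for i in range(len(seq) - w + 1)
    ]


def best_hit(seq, pwm):
    """Return the highest-scoring (position, score) pair."""
    return max(scan(seq, pwm), key=lambda t: t[1])
```

    Each position is scored independently and without chromatin context, which is one source of the false positives the abstract describes.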

    Computational micromodel for epigenetic mechanisms

    Efforts to define and characterize the role of epigenetic mechanisms have gained immense momentum since the completion of the Human Genome Project. The human epigenetic layer, made up of DNA methylation and multiple histone protein modifications (the key elements of epigenetic mechanisms), is known to act as a switchboard that regulates the occurrence of most cellular events. In multicellular organisms such as humans, all cells have identical genomic content but vary in DNA methylation (DM) profile, with the result that different cell types perform a spectrum of functions. DM within the genome is associated with tight control of gene expression, parental imprinting, X-chromosome inactivation, long-term silencing of repetitive elements and chromatin condensation. Recently, considerable evidence has been put forward that environmental stress implicitly alters normal interactions among key epigenetic elements inside the genome. Aberrations in the spread of DM, especially hypo- and hypermethylation, supported by an abnormal landscape of histone modifications, have been strongly associated with cancer initiation and development. While new findings on the impact of these key elements are reported regularly, precise information on how DM is controlled and how it relates to networks of histone modifications is lacking. This has motivated the modelling of DNA methylation and histone modifications and their interdependence. We describe initial computational methods used to investigate these key elements of epigenetic change and to assess related information contained in DNA sequence patterns. We then describe attempts to develop a phenomenological epigenetic "micromodel" based on Markov chain Monte Carlo principles. This theoretical micromodel ("EpiGMP") aims to explore the effect of histone modifications and gene expression for defined levels of DNA methylation. 
We apply this micromodel (i) to test networks of genes in colon cancer (extracted from an in-house database, StatEpigen), and (ii) to help define an agent-based modelling framework for exploring chromatin remodelling (the dynamics of physical rearrangements) inside the human genome. Parallelization techniques have also been adopted to address issues of scale when applying this micromodel. A generic tool of this kind can potentially be applied to predict molecular events that affect the expression state of any gene during the onset or progression of cancer. Ultimately, the goal is to provide additional information on the ways in which these low-level molecular changes determine physical traits in normal and disease conditions in an organism.

    Interplay of genetic, epigenetic and transcription factors in the regulation of transcriptional variation in Plasmodium falciparum

    Doctoral Programme in Biomedicine / Thesis carried out at the Barcelona Institute for Global Health (ISGlobal). The most severe form of malaria, caused by Plasmodium falciparum parasites, still kills over half a million people every year, most of them children under the age of five. Despite huge research efforts, reduction in the global burden of disease has stalled in recent years. P. falciparum has a very complex life cycle including, among other steps, sexual reproduction in female Anopheles mosquitoes and an asexual intraerythrocytic development cycle (IDC) inside the human host, which causes the disease. During the IDC, the parasite needs to continuously adapt to changes in its environment, including fluctuations in blood temperature, concentration of nutrients and other metabolites, presence of drugs, and a constant fight against the host’s immune system. In this thesis, we have studied the adaptation mechanisms of P. falciparum to this plethora of challenges, with a special focus on clonally variant genes (CVGs). In P. falciparum, CVGs are a set of genes participating in host-parasite interactions that can be found either in a transcriptionally active state, characterized by euchromatin, or in a transcriptionally silenced state, characterized by heterochromatin. The state of CVGs is inherited by the progeny of a parasite, with stochastic switches occurring at a low frequency. Parasites with optimal patterns of CVG expression are continuously selected as the environment changes, leading to adaptation and survival of the infecting population. In the first paper of this thesis, we have analyzed subcloned parasite populations to characterize, in unprecedented detail, the heterochromatin distribution associated with the active and silenced states of CVGs. 
This has allowed us to define different kinds of heterochromatin transitions between the active and silenced states of CVGs and has given us new insights on the regulation of var genes (one of the main virulence factors for malaria) and into the regulation of sexual conversion, a process crucial for malaria transmission. Continuing with CVG regulation, in the second paper of the thesis, we have analyzed how patterns of CVG expression are established at the onset of human infections, after passage through transmission stages. Our results suggest a loss of the epigenetic memory during transmission stages and a reset of the heterochromatin patterns that drive CVG expression. Similar patterns of CVG expression arose in different infected individuals, suggesting that the activation probability of a given CVG is an intrinsic property of the gene. In the third paper of the thesis, we have further studied the sexual conversion phenomenon. We have generated a conditional over-expression system for pfap2-g, the CVG that acts as master regulator of sexual conversion, achieving sexual conversion rates of ~90% after induction. Our results have provided new insights on how heterochromatin at different positions affects expression of pfap2-g and have allowed us to characterize the transcriptional profile of the initial stages of sexual commitment with unprecedented sensitivity. Finally, in the fourth paper of this thesis, we have studied the adaptation of the parasite to heat-shock, which happens in natural infections due to fever episodes. We expected CVGs to participate in this phenomenon, but instead we have identified pfap2-hs, a non-clonally variant transcription factor (TF), as the main driver of the heat-shock response in P. falciparum. AP2-HS acts as the functional homolog of HSF1 (a TF that drives the heat-shock response from yeast to mammals, but is absent in P. 
falciparum), driving a very tight transcriptional response to heat-shock, characterized by the up-regulation of hsp70 and hsp90. Although the presence of directed responses had previously been demonstrated for other cues, this is the first time that the transcription factor driving such a response has been identified in P. falciparum. Taken together, the results of this thesis have broadened our knowledge of the regulation of adaptive mechanisms in P. falciparum. Learning about this deadly parasite’s defense mechanisms will be instrumental in designing better strategies to fight it in the future.

    Die gegenseitige Beeinflussung synthetischer Expressionskassetten in definierten chromosomalen Abschnitten [The mutual influence of synthetic expression cassettes in defined chromosomal regions]

    Heterogeneity in transgene expression is frequently observed upon genetic modification of cells in basic research and biotechnology. The variable transgene expression is considered to result from crosstalk between the incoming promoter cassette and cis-acting elements associated with the chromosomal site of transgene integration (position effect). Targeted integration of the transgene into open chromatin reduces variability due to chromosomal position effects and favors more predictable transgene expression. The objective of this study was to gain a mechanistic understanding of the interactions that occur between transgenic/synthetic cassettes and the defined chromosomal sites into which they are integrated. To this end, CMV-driven transgene expression was investigated in more than 100 independent cell clones in two different cell lines (CHO and HEK293T). Not only was transgene expression highly variable among clones, but a high degree of heterogeneity in expression also existed within clones with a metastable phenotype. This was correlated with differential and dynamic chromatin conformation related to differential histone modifications. In addition, a CMV-based tetracycline-inducible synthetic promoter (BiTet) was evaluated in the well-characterized Rosa26 locus. The epigenetic status of the promoter cassette was evaluated in mouse ES cells and transgenic mice to investigate the mechanisms mediating the interaction between the transgene and the chromosomal locus that results in variation of transgene expression. Contrary to expectation, even upon targeting the ubiquitously active Rosa26 locus, the expression of the synthetic cassette driven by the tetracycline-inducible synthetic promoter (BiTet) was highly heterogeneous. However, in this analysis the endogenous Rosa26 locus remained largely free of methylation. 
While DNA methylation was the major player in the silencing of the tetracycline-based promoter systems in the Rosa26 site, the heterogeneity associated with the hCMV-driven constructs in random CHO and HEK293T clones was entirely associated with distinct histone modifications causing variable transgene expression and transgene silencing. While the heterogeneous expression was found to be associated with different chromatin states conferred by various epigenetic markings, the stability and extent of the variation in transgene expression may largely depend on the nature of the crosstalk between the chromosomal integration site and the synthetic construct. Genetically manipulated cells are often used in biotechnology and basic research. In these cells, undesired heterogeneous transgene expression frequently occurs, often as a side effect of so-called position effects. These position effects arise from the interaction of the promoter cassettes with cis-acting elements encoded in the genome, which leads to differing transgene expression. Targeted integration of a transgene into an open chromatin region reduces the position effect and leads to more predictable expression. The aim of this work is to investigate transgenic/synthetic cassettes at different chromosomal integration sites in order to characterize position effects and chromosomal crosstalk more precisely. From two different cell lines (CHO and HEK293T), 100 clones were analyzed. These clones were examined for expression of the integrated transgene cassette, in which the transgene was expressed from a CMV promoter. On the one hand, the analysis showed that individual clones carrying the same transgene cassette at different integration sites displayed different expression patterns (interclonal expression differences). 
On the other hand, and interestingly, the expression patterns of the analyzed clones changed after repeated passaging (intraclonal expression differences). That is, cells carrying the same transgene cassette at the same integration site change their expression pattern. These intraclonal differences were termed metastable. The differentially expressing cells within the metastable clones were characterized in more detail. The different expression patterns correlated with different histone modifications and thus with chromatin conformations. In addition, a CMV-based synthetic, tetracycline-dependent promoter (BiTet) was analyzed in a well-characterized locus, the R26 locus. Although the BiTet promoter was integrated into this ubiquitously active locus, expression analysis surprisingly showed heterogeneous expression both in mouse stem cells and in transgenic mice. Epigenetic analyses showed that the BiTet promoter becomes methylated, whereas the endogenous R26 promoter remains largely free of methylation. While silencing of the BiTet cassette in the R26 locus, and thus the heterogeneous transgene expression, is mainly attributable to DNA methylation, silencing in the analyzed CHO and HEK293T clones, in which the cassette integrated randomly into the genome, is a result of histone modifications. Nevertheless, while the heterogeneity of transgene expression depends on different chromatin states established by particular epigenetic marks, the stability and strength of the heterogeneous expression depend on the interaction between the integrated synthetic construct and the chromosomal integration site.

    Epigenomic And Nuclear Architectural Insights Into Rett Syndrome

    The importance of DNA methylation in neuronal function is highlighted by mutations in the neuronally enriched “reader” of DNA methylation, methyl-CpG-binding protein 2 (MECP2), causing Rett Syndrome (RTT), a severe neurodevelopmental disorder. Although MeCP2 displays broad genomic binding, gene expression changes in Mecp2 mutant mice are very subtle, and brain region-specific, making it difficult to determine how MeCP2 regulates gene expression. Therefore, we developed an approach to assess cell type-specific effects of Mecp2 mutations on the transcriptome, epigenome, and chromatin architecture to determine whether epigenomic features can explain gene misregulation in RTT. Differentially expressed genes (DEGs) in R106W Mecp2 mutants (R106W) are enriched for MeCP2 binding in the WT setting and are preferentially demethylated in R106W, suggesting that the loss of MeCP2 binding results in the exposure of unbound cytosines to demethylation, thus contributing to gene dysregulation. Given that DEGs are enriched for MeCP2 binding, we next determined unique features of DEGs to gain an understanding of why MeCP2 preferentially targets DEGs. We find that DEGs are cell type-specific, lowly expressed, and intragenically associated with heterochromatin, active enhancer, and CTCF chromatin states, suggesting that MeCP2 is essential for the regulation of lowly expressed genes. Upregulated and downregulated DEGs are differentially enriched for particular chromatin states, providing an insight into the directionality of gene dysregulation. Given the enrichment of DEGs for active enhancer and CTCF chromatin states, we next investigated transcription factor (TF) footprints and found thousands of altered TF footprints in R106W, with the CTCF motif being the most significantly associated. In WT, these sites are enriched for MeCP2 binding, and in R106W, these sites, which are associated with downregulated DEGs, become demethylated, enabling CTCF binding. 
This therefore suggests that MeCP2 can affect CTCF recruitment to chromatin. Given CTCF’s known role in chromatin organization, we employed Oligopaint and found large-scale condensation of euchromatin and heterochromatin, as well as decondensation of long genes. Together, this work provides insight into why DEGs are differentially susceptible to dysregulation in RTT and posits MeCP2 as a key player in global maintenance of the methylome and chromatin architecture for the preservation of neuronal gene expression

    A highly condensed genome without heterochromatin: orchestration of gene expression and epigenomics in Paramecium tetraurelia

    Epigenetic regulation in unicellular ciliates can be as complex as in metazoans and is well described with regard to small RNA (sRNA)-mediated effects. The ciliate Paramecium harbors several copies of sRNA biogenesis-related proteins involved in genome rearrangements that result in chromatin alterations. The global chromatin organization, however, is poorly understood, and unusual characteristics of the somatic nucleus, such as high polyploidy, high genome coding density, and absence of heterochromatin, suggest that complex regulation is needed to orchestrate gene expression. The present study characterized the nucleosomal organization required for gene regulation and proper Polymerase II activity. Histone marks reveal broad domains in gene bodies, whereas intergenic regions are nucleosome-free. Low occupancy in silent genes suggests that gene inactivation does not involve nucleosome recruitment. Thus, Paramecium gene regulation contradicts the current understanding of chromatin biology. Apart from global nucleosome studies, two sRNA-binding proteins (Ptiwis) classically associated with transposon silencing were investigated in the background of transgene-induced silencing. Surprisingly, both Ptiwis also load sRNAs from endogenous loci during vegetative growth, revealing a broad diversity of Ptiwi functions. Together, the studies shed light on epigenetic mechanisms that regulate gene expression in a condensed genome, with Ptiwis contributing to transcriptome and chromatin dynamics. Epigenetic regulation in unicellular ciliates can be as complex as in multicellular organisms and has been studied extensively with respect to small RNA (sRNA)-mediated effects. The ciliate Paramecium possesses several copies of sRNA biogenesis-associated proteins that are involved in genome processing and the resulting chromatin changes. 
The global organization of the chromatin is poorly understood, and obscure properties of the somatic nucleus, such as high polyploidy, coding density and the absence of heterochromatin, should require complex regulation to control gene expression. The present study characterizes the chromatin organization necessary for gene regulation and Polymerase II activity. Histone modifications show broad distributions within genes, whereas intergenic regions are nucleosome-free. Gene silencing appears to occur without the recruitment of nucleosomes, so that gene regulation in Paramecium contradicts the current understanding of chromatin biology. Besides nucleosome studies, two sRNA-binding proteins (Ptiwis) classically associated with transposon silencing were investigated in the background of transgene-induced silencing. Surprisingly, Ptiwis load sRNAs from endogenous loci during vegetative growth, which reveals diverse Ptiwi functions. The studies show epigenetic mechanisms of gene regulation in a compact genome, with Ptiwis contributing to transcriptome and chromatin dynamics.