
    Advancing the analysis of bisulfite sequencing data in its application to ecological plant epigenetics

    The aim of this thesis is to bridge the gap between the state-of-the-art bioinformatic tools and resources currently at the forefront of epigenetic analysis and their emerging applications to non-model species in the context of plant ecology. New, high-resolution research tools are presented: first in a specific sense, by providing new genomic resources for a selected non-model plant species, and second in a broader sense, by developing new software pipelines that streamline the analysis of bisulfite sequencing data in a manner applicable to a wide range of non-model plant species. The selected species is the annual field pennycress, Thlaspi arvense, which belongs to the same lineage of the Brassicaceae as the closely related model species Arabidopsis thaliana, yet does not benefit from comparably extensive genomic resources. It is one of three key species in a Europe-wide initiative to understand how epigenetic mechanisms contribute to natural variation, stress responses and long-term adaptation in plants. To this end, this thesis provides a high-quality, chromosome-level assembly for T. arvense, alongside a rich complement of feature annotations of particular relevance to the study of epigenetics. The genome assembly follows a hybrid approach, combining PacBio continuous long reads and circular consensus sequences with Hi-C sequencing, PCR-free Illumina sequencing and genetic maps. The result is a significant improvement in contiguity over the draft assembly from earlier studies. Much of the basis for understanding epigenetic mechanisms in non-model species centres on the study of DNA methylation, and in particular on the analysis of bisulfite sequencing data, which brings methylation patterns to nucleotide-level resolution. To maintain a broad level of comparison between T. arvense and the other species selected under the same initiative, a suite of software pipelines has also been developed, covering read mapping, the quantification of methylation values, differential methylation between groups, and epigenome-wide association studies. Furthermore, a novel algorithm is presented that enables accurate variant calling from bisulfite sequencing data with conventional approaches such as FreeBayes or the Genome Analysis ToolKit (GATK); until now, this was feasible only with specifically adapted software. This enables researchers to obtain high-quality genetic variants, which are often essential for contextualising the results of epigenetic experiments, without the need for additional sequencing libraries. Each of these components is thoroughly benchmarked, integrated into a robust workflow management system, and adheres to the FAIR principles (Findability, Accessibility, Interoperability and Reusability). Finally, further consideration is given to the unique difficulties presented by population-scale data, and a number of concepts are explored to improve the feasibility of such analyses. In summary, this thesis introduces new high-resolution tools to facilitate the analysis of epigenetic mechanisms, specifically DNA methylation, in non-model plant species. In addition, thorough benchmarking standards are applied, showcasing the range of technical considerations that are of principal importance when developing new pipelines and tools for the analysis of bisulfite sequencing data.
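As a minimal illustration of what the "quantification of methylation values" step typically involves, the sketch below computes a per-cytosine methylation level as the fraction of reads that still report a C (protected from bisulfite conversion) among all C and T reads at that cytosine, and assigns the plant sequence context (CG, CHG or CHH, where H is A, C or T). It is not the EpiDiverse implementation; the function names and the `min_coverage` threshold of 5 are hypothetical, and only forward-strand cytosines are handled (bottom-strand cytosines appear as Gs on the forward reference and would need the reverse complement).

```python
from typing import Optional


def sequence_context(ref: str, pos: int) -> str:
    """Classify a forward-strand cytosine at ref[pos] as CG, CHG or CHH (H = A, C or T)."""
    ref = ref.upper()
    if ref[pos] != "C":
        raise ValueError(f"position {pos} is not a forward-strand cytosine")
    downstream = ref[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return "unknown"  # too close to the sequence end, or ambiguous bases
    if downstream[1] == "G":
        return "CHG"
    return "CHH"


def methylation_level(methylated: int, unmethylated: int, min_coverage: int = 5) -> Optional[float]:
    """Fraction of reads reporting C (methylated) among C + T reads at one cytosine."""
    coverage = methylated + unmethylated
    if coverage < min_coverage:
        return None  # too few reads for a reliable estimate
    return methylated / coverage


if __name__ == "__main__":
    reference = "ACGTCAGTCCAT"
    for pos in (1, 4, 8):
        print(pos, sequence_context(reference, pos))  # CG, CHG, CHH
    print(methylation_level(8, 2))  # 0.8
```

Returning `None` below the coverage threshold, rather than 0.0, keeps poorly covered sites distinguishable from genuinely unmethylated ones.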
The complete “Epidiverse Toolkit” is available at https://github.com/EpiDiverse and will continue to be updated and improved in the future.
    Contents:
    ABSTRACT
    ACKNOWLEDGEMENTS
    1 INTRODUCTION
      1.1 ABOUT THIS WORK
      1.2 BIOLOGICAL BACKGROUND
        1.2.1 Epigenetics in plant ecology
        1.2.2 DNA methylation
        1.2.3 Maintenance of 5mC patterns in plants
        1.2.4 Distribution of 5mC patterns in plants
      1.3 TECHNICAL BACKGROUND
        1.3.1 DNA sequencing
        1.3.2 The case for a high-quality genome assembly
        1.3.3 Sequence alignment for NGS
        1.3.4 Variant calling approaches
    2 BUILDING A SUITABLE REFERENCE GENOME
      2.1 INTRODUCTION
      2.2 MATERIALS AND METHODS
        2.2.1 Seeds for the reference genome development
        2.2.2 Sample collection, library preparation, and DNA sequencing
        2.2.3 Contig assembly and initial scaffolding
        2.2.4 Re-scaffolding
        2.2.5 Comparative genomics
      2.3 RESULTS
        2.3.1 An improved reference genome sequence
        2.3.2 Comparative genomics
      2.4 DISCUSSION
    3 FEATURE ANNOTATION FOR EPIGENOMICS
      3.1 INTRODUCTION
      3.2 MATERIALS AND METHODS
        3.2.1 Tissue preparation for RNA sequencing
        3.2.2 RNA extraction and sequencing
        3.2.3 Transcriptome assembly
        3.2.4 Genome annotation
        3.2.5 Transposable element annotations
        3.2.6 Small RNA annotations
        3.2.7 Expression atlas
        3.2.8 DNA methylation
      3.3 RESULTS
        3.3.1 Transcriptome assembly
        3.3.2 Protein-coding genes
        3.3.3 Non-coding loci
        3.3.4 Transposable elements
        3.3.5 Small RNA
        3.3.6 Pseudogenes
        3.3.7 Gene expression atlas
        3.3.8 DNA Methylation
      3.4 DISCUSSION
    4 BISULFITE SEQUENCING METHODS
      4.1 INTRODUCTION
      4.2 PRINCIPLES OF BISULFITE SEQUENCING
      4.3 EXPERIMENTAL DESIGN
      4.4 LIBRARY PREPARATION
        4.4.1 Whole Genome Bisulfite Sequencing (WGBS)
        4.4.2 Reduced Representation Bisulfite Sequencing (RRBS)
        4.4.3 Target capture bisulfite sequencing
      4.5 BIOINFORMATIC ANALYSIS OF BISULFITE DATA
        4.5.1 Quality Control
        4.5.2 Read Alignment
        4.5.3 Methylation Calling
      4.6 ALTERNATIVE METHODS
    5 FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS
      5.1 INTRODUCTION
      5.2 MATERIALS AND METHODS
        5.2.1 Reference species
        5.2.2 Natural accessions
        5.2.3 Read simulation
        5.2.4 Read alignment
        5.2.5 Mapping rates
        5.2.6 Precision-recall
        5.2.7 Coverage deviation
        5.2.8 DNA methylation analysis
      5.3 RESULTS
      5.4 DISCUSSION
      5.5 A PIPELINE FOR WGBS ANALYSIS
    6 THERE AND BACK AGAIN: INFERRING GENOMIC INFORMATION
      6.1 INTRODUCTION
        6.1.1 Implementing a new approach
      6.2 MATERIALS AND METHODS
        6.2.1 Validation datasets
        6.2.2 Read processing and alignment
        6.2.3 Variant calling
        6.2.4 Benchmarking
      6.3 RESULTS
      6.4 DISCUSSION
      6.5 A PIPELINE FOR SNP VARIANT ANALYSIS
    7 POPULATION-LEVEL EPIGENOMICS
      7.1 INTRODUCTION
      7.2 CHALLENGES IN POPULATION-LEVEL EPIGENOMICS
      7.3 DIFFERENTIAL METHYLATION
        7.3.1 A pipeline for case/control DMRs
        7.3.2 A pipeline for population-level DMRs
      7.4 EPIGENOME-WIDE ASSOCIATION STUDIES (EWAS)
        7.4.1 A pipeline for EWAS analysis
      7.5 GENOTYPING-BY-SEQUENCING (EPIGBS)
        7.5.1 Extending the epiGBS pipeline
      7.6 POPULATION-LEVEL HAPLOTYPES
        7.6.1 Extending the EpiDiverse/SNP pipeline
    8 CONCLUSION
    APPENDICES
      A. SUPPLEMENT: BUILDING A SUITABLE REFERENCE GENOME
      B. SUPPLEMENT: FEATURE ANNOTATION FOR EPIGENOMICS
      C. SUPPLEMENT: FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS
      D. SUPPLEMENT: INFERRING GENOMIC INFORMATION
    BIBLIOGRAPHY
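The abstract notes that variant calling from bisulfite data with conventional callers such as FreeBayes or GATK previously required specifically adapted software. The core difficulty can be shown with a small sketch: in a directional library, a T in a read from the original top strand (OT) at a reference C may be either bisulfite conversion or a genuine C>T SNP, and likewise an A at a reference G in a read from the original bottom strand (OB). One generic workaround, shown here and not claimed to be the novel algorithm of this thesis, is to mask only those ambiguous base calls (here by replacing them with N) before passing alignments to a standard caller. A C>T SNP then remains visible in OB reads, which report the unconverted G or A of the bottom strand as C or T in forward coordinates. The function name and the gap-free, pre-aligned input are simplifying assumptions.

```python
def mask_bisulfite_ambiguity(read: str, ref: str, strand: str, mask: str = "N") -> str:
    """Mask read bases that cannot be distinguished from bisulfite conversion.

    read, ref: equal-length, gap-free sequences in forward-strand coordinates.
    strand: "OT" (original top) or "OB" (original bottom) of a directional library.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference segments must have equal length")
    if strand == "OT":
        ambiguous = ("C", "T")  # (reference base, read base): C->T may be conversion
    elif strand == "OB":
        ambiguous = ("G", "A")  # bottom-strand C->T appears as G->A on the forward strand
    else:
        raise ValueError("strand must be 'OT' or 'OB'")
    return "".join(
        mask if (r, b) == ambiguous else b
        for r, b in zip(ref.upper(), read.upper())
    )


if __name__ == "__main__":
    print(mask_bisulfite_ambiguity("ATGTTG", "ACGTCG", "OT"))  # ANGTNG
    print(mask_bisulfite_ambiguity("ACATCA", "ACGTCG", "OB"))  # ACNTCN
```

Masking discards some information at every ambiguous position, which is the price of letting an unmodified caller treat the remaining bases as ordinary genomic evidence.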

    Evaluation of transgenic cassava expressing mismatch and non-mismatch hpRNA constructs derived from African cassava mosaic virus and South African cassava mosaic virus open reading frames

    A thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy in the School of Molecular and Cell Biology. Johannesburg, 2015. With rising global food prices, growing populations, climate change and future demand for tuber crops as feed and as a potential energy source, cassava is well positioned to meet the needs of many countries in the SADC region, including South Africa. However, a major constraint to cassava cultivation is cassava-infecting begomoviruses (CBVs), including African cassava mosaic virus (ACMV) and South African cassava mosaic virus (SACMV). ACMV and SACMV belong to the family Geminiviridae, whose members have circular, single-stranded DNA genomes; both are bipartite begomoviruses. Symptoms associated with CBV infection include yellow and/or green mosaic, leaf deformation, leaf curling and stunted plant growth. Since no chemical control of plant virus diseases is possible, one approach to developing virus resistance is biotechnological: genetic engineering (GE) of cassava to express hairpin RNA (hpRNA) silencing constructs against CBVs. However, cassava is recalcitrant: it is difficult to transform and regenerate. The aim of this study was to produce hpRNA/inverted repeat (IR) constructs targeting the ACMV AC1/4:AC2/3 open reading frames (ORFs) and an hpRNA construct targeting the SACMV BC1 ORF, in order to engineer hpRNA-expressing transgenic cassava resistant to ACMV and SACMV. Furthermore, two contiguous, overlapping ACMV reading frames (AC1/4 and AC2/3) were stacked in an attempt to improve resistance to CBVs. However, IR sequences are prone to forming unfavourable, tight secondary structures known as cruciform structures.
To circumvent this, one set of constructs (mutated sense-arm: mismatch constructs) was designed to contain sodium bisulfite deamination-induced mutations in the hairpin sense arm, making it less complementary to the antisense arm and therefore enhancing IR stability by reducing cruciform formation. MM2hp (mismatch construct targeting ACMV AC1/4:AC2/3) and MM4hp (mismatch construct targeting SACMV BC1) were generated. The second construct set (non-mismatch: Gateway) was based on the commonly used Gateway construct system, in which an intron is positioned between the IR fragments. MM6hp (non-mismatch construct targeting ACMV AC1/4:AC2/3) and MM8hp (non-mismatch construct targeting SACMV BC1) were generated. Like the deamination-induced mutations, the intron contributed to IR stability. The ACMV- and SACMV-derived hpRNA constructs were transformed into the model cassava cultivar cv.60444. Additionally, since few farmer-preferred cultivars or landraces have been transformed for resistance, the South African high-starch landrace T200 was also transformed with the hpRNA constructs. Agrobacterium-mediated transformation of friable embryogenic callus (FEC) was used, and several transgenic cv.60444 and T200 lines were regenerated. Although cassava landraces are generally less amenable to transformation, we report transformation efficiencies of 79% and 76% for the model cv.60444 and the landrace T200, respectively. The T200 transformation efficiency reported in this study is 43% higher than previously reported. This is also the first report of transformation of the South African cassava landrace T200 with ACMV- and SACMV-derived hpRNA constructs. Transgenic lines were selected and infected with ACMV and SACMV infectious clones. Lines were then monitored at 12, 32 and 67 days post infection (dpi) for symptom development, plant growth and SACMV and ACMV viral load.
At 67 dpi, the difference between transgenic lines and untransformed infected cv.60444 was more significant than at earlier time points. At 67 dpi, 69% and 75% of the ACMV AC1/4:AC2/3 and SACMV BC1 transgenic lines, respectively, showed milder symptoms and reduced viral load compared with the susceptible wild-type cv.60444 control, comparable to the virus-challenged non-transgenic tolerant landrace control TME3. Notably, symptoms and viral load were not always correlated. Plant-to-plant variation was observed between individual transgenic lines generated from each construct (MM2hp, MM4hp, MM6hp and MM8hp) and transformation event (A-MM2, A-MM4, C-MM6 and C-MM8). Overall, however, a positive correlation between symptoms and viral load was observed in virus challenge trials of transgenic lines from the A-MM4, C-MM6 and C-MM8 transformation events, at all three time points (12, 32 and 67 dpi). A number of ACMV- and SACMV-tolerant lines were obtained among both mismatch and non-mismatch hpRNA-expressing transgenic lines, in which virus replication persisted but symptoms at 67 dpi were milder than in non-transgenic plants. CBV tolerance levels in transgenic lines expressing mismatch hpRNA were not significantly different from those in lines expressing non-mismatch hpRNA. Expression of ACMV- and SACMV-derived constructs thus generated tolerant cassava lines, where tolerance is defined as virus replication with mild or no symptoms. In addition, a recovery phenotype was observed at 365 dpi in five transgenic lines expressing MM2hp (ACMV AC1/4:AC2/3)-derived hpRNA, where recovery is defined as no to mild symptoms after an initial symptomatic period, together with reduced or no viral load. In five transgenic lines expressing MM4hp (SACMV BC1)-derived hpRNA, complete recovery (no symptoms and no detectable virus) was observed at 365 dpi.
From this study, we propose that expression of CBV-derived hpRNA targeting ACMV AC1/4:AC2/3 and SACMV BC1 in the CBV-susceptible cv.60444 enhances its tolerance to ACMV and SACMV. Mismatch (mutated sense-arm) construct technology offered tolerance levels comparable to the more conventional and more expensive non-mismatch (Gateway) technology. We therefore also propose mismatch hpRNA technology as an alternative approach for producing transgenic crops through cassava genetic engineering. Promising transgenic lines showing moderate SACMV and ACMV resistance were identified; these will be used in further trials, as they could be considered favourable to farmers.

    Differential DNA methylation in aging: an in silico exploration using high-throughput data

    The emergence of high-throughput methodologies after the completion of the Human Genome Project brought genome- and epigenome-wide studies to the forefront of biological and biomedical research. Genetic mutations are no longer seen as the sole primary cause of certain disorders, since it has been demonstrated that epigenetic mechanisms are involved in cellular programming and gene regulation, providing adaptive variation of a given gene in response to a changing environment and being associated with cellular differentiation. Research in the DNA methylation field has already revealed essential facts, such as the existence of methylation in CpG islands and in alternative contexts that influence gene expression in a tissue-specific manner. The influence of lifestyle choices on aging processes has also been linked to methylome variation. In cancer, the interplay of epigenetic and genetic information is essential for understanding cancer progression, including the silencing of key regulatory genes: overall hypomethylation of the cancer genome leads to oncogene activation, whereas hypermethylation of specific regions is associated with silencing of tumour suppressor genes. For these reasons, the search for new therapeutic approaches to cancer and aging is a current focus of the scientific community working in epigenomics. To contribute to the study of mammalian epigenomes across the lifespan, this research used datasets from public databases to investigate DNA methylation in individuals of different ages, in order to extract tissue-specific markers related to healthy aging. Results were validated using samples from healthy individuals with good or poor cognitive performance, available at iBiMED. In both cases, the genes ELOVL2 (cg16867657) and FHL2 (cg06639320) were identified as good markers of age.
Master's in Biotechnology

    Conditional RNA interference, altered nuclear transfer and genome-wide DNA methylation analysis

    The goal of the studies described here was to establish a set of methods aimed ultimately at enhancing the efficiency of epigenetic reprogramming. The use of RNA interference (RNAi) for studying and influencing gene function has become an essential part of biology. In this work, we describe two lentivirus-based vectors for conditional, Cre-lox-regulated RNAi in cells and in mice. One vector triggers Cre-dependent activation (pSico) and the other Cre-dependent termination (pSicoR) of shRNA expression. These vectors were used to conditionally and reversibly knock down p53, Npm and Dnmt1 expression in ES cells and in MEFs. As a proof of principle, pSico was used to generate conditional and tissue-specific knock-down mice. As outlined in Chapter 1, conditional depletion of various gene products by RNAi will provide a better understanding of the factors involved in epigenetic reprogramming, which should ultimately help to enhance the efficiency of successful reprogramming. In later experiments, the pSicoR system was applied to temporarily suppress Cdx2 function in donor nuclei prior to nuclear transfer, a modification of the current procedure termed altered nuclear transfer (ANT). Finally, deciphering the epigenome of different cell types is critical for understanding the regulation of both normal development and disease states. In the last part of this work, we devised a new strategy that permits high-resolution comparative DNA methylation analysis. The system was tested in ES cells depleted of DNA methylation by elimination of the DNA methyltransferases Dnmt1, Dnmt3a and Dnmt3b: Dnmt1 was knocked down in Dnmt3a/Dnmt3b double-knockout ES cells using a pSicoR-Dnmt1 vector.
The transfer of a differentiated cell nucleus into an enucleated oocyte (nuclear transfer) is one of several experimental approaches for reprogramming differentiated cells. Reprogramming is generally understood as the expansion of the developmental potential of a differentiated cell. One of the main goals of nuclear transfer experiments is to generate undifferentiated stem or progenitor cells that can be used for cell replacement therapies. The main advantage of human nuclear transfer stem cells is that they are patient-specific and would therefore not be recognised as foreign by the immune system. One focus of our current research is to gain a better understanding of the mechanisms and factors involved in nuclear reprogramming. The fact that fully differentiated cells can be reprogrammed by nuclear transfer shows that no genetic information is lost during development; that is, differentiated nuclei contain all the information needed to generate a complete organism. These results suggest that differentiation is regulated by epigenetic mechanisms. Epigenetic modifications are stable changes to the DNA or chromatin that do not alter the primary DNA sequence. The aim of my studies is to develop methods, and a better understanding of the factors involved, in order to improve the reprogramming process. In the first part of my work, I describe a new system for Cre-lox-regulatable gene inhibition by RNA interference. The efficacy and functionality of the system were demonstrated for several genes in vitro and in vivo. Among many other useful applications, such as the conditional regulation of essential genes in vivo (demonstrated here for the tumour suppressor gene p53), the system allows transient blocking of factors that regulate epigenetic modifications, such as DNA methyltransferase 1 (Dnmt1). It has already been shown that reducing genomic DNA methylation has a positive effect on the efficiency of reprogramming by nuclear transfer. However, the resulting DNA hypomethylation also has negative consequences, such as an increased incidence of tumours. These negative effects can be reduced by temporally limited inhibition of the enzyme, since the endogenous gene becomes active again once the RNAi system is removed. In the second part, I describe a modification of the standard nuclear transfer technique, in which a gene essential for differentiation into trophectoderm (which later forms the placenta) is blocked using the RNAi system described above. As a result, no functional embryo is produced, but stem cells can nevertheless be derived. These experiments provide a scientific basis for the debate on the derivation of stem cells and also allow further analysis of essential factors required for the extraembryonic tissues. This is important because many essential genes are not correctly reactivated in the cloned embryo itself or in its extraembryonic tissues. Although stem cells and differentiated cells have identical DNA sequences, they differ epigenetically. To study these differences across the whole genome, we developed a method that makes it possible to analyse large parts of the epigenome and to compare them between different cell types.

    Engineering virus resistant transgenic cassava: the design of long hairpin RNA constructs against South African cassava mosaic virus

    Cassava is currently the second most important source of carbohydrates on the African continent. In the last two decades, cassava crops have been severely affected by outbreaks of cassava mosaic disease (CMD). South African cassava mosaic virus (SACMV) has been associated with CMD outbreaks in Mpumalanga province. Advances in post-transcriptional gene silencing (PTGS) technology have provided promising new strategies for engineering virus resistance in plants. Inverted repeat (IR) constructs are currently the most potent inducers of PTGS; however, these constructs are inherently unstable. The purpose of this study was to develop IR constructs with improved stability for the efficient induction of PTGS in plants. Two mismatched inverted repeat constructs were successfully created, one targeting the SACMV BC1 open reading frame and the other targeting the Maize streak virus (MSV) AC1 open reading frame. Sodium bisulfite was used to deaminate cytosine residues on the sense arm of the constructs. The resulting number of G-T mismatches was apparently sufficient to stabilize the linear conformation of the IR constructs, as they were efficiently propagated by E. coli DH5α and subsequently behaved like linear DNA molecules. Furthermore, the proportion of mismatches in the BC1 construct (17.5%) was found to be ideal, as the stability of the predicted RNA hairpin was not affected. Owing to the higher proportion of mismatches in the AC1 construct (23.5%), the loop region of its RNA hairpin was found to be marginally destabilized. Despite this, long stretches of stable dsRNA were still produced from the AC1 IR construct, which is therefore likely to induce PTGS. Interestingly, the mismatched IR constructs, although still replicated in E. coli, were marginally destabilized in Agrobacterium.
Therefore, it was deduced that the stability of a mismatched IR construct may be influenced by the particular intracellular environment of an organism. Owing to the recalcitrance of cassava to transformation, a model plant system, Nicotiana benthamiana, was used to screen constructs for toxicity, stability and efficiency of PTGS induction. Agrobacterium-mediated transformation and regeneration of N. benthamiana were optimized, and 86% transformation efficiency was achieved using leaf disk explants. The addition of an ethylene scrubber, potassium permanganate, substantially increased the rate of regeneration by reducing the frequency of hyperhydric plants. Transgene integration was confirmed by PCR amplification of the hptII gene in the T-DNA region. Transgene expression was confirmed by screening for the GUS and GFP reporter genes. No toxic responses to the transgene have been observed thus far. Studies are currently underway to confirm the stability of the mismatched IR constructs in N. benthamiana. PAGE northern blotting is being performed, since the detection of siRNAs derived from the transgene will confirm that the constructs are functional. In addition, infectivity assays are underway to determine the efficacy of BC1 knockdown by a stably integrated construct. Owing to their enhanced stability, mismatched IR constructs may be an appealing alternative to the currently available intron-spliced or exact-match hairpin systems.
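The mismatch percentages reported for the BC1 (17.5%) and AC1 (23.5%) constructs describe how many positions of the bisulfite-treated sense arm no longer pair with the antisense arm. The sketch below shows that bookkeeping under simplifying assumptions: bisulfite deamination is modelled as converting each C on the sense arm to T with a fixed probability (the `rate` parameter is hypothetical), and each converted position is counted as a mismatch against the unmodified antisense arm, where the new T sits opposite a G (the G-T mismatch described above). The sequences are toy examples, not the actual construct arms.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def deaminate(sense: str, rate: float, rng: random.Random) -> str:
    """Model partial bisulfite treatment: each C becomes T with probability `rate`."""
    return "".join("T" if b == "C" and rng.random() < rate else b for b in sense.upper())


def hairpin_mismatches(sense: str, antisense: str) -> int:
    """Count sense-arm positions that do not pair with the antisense arm.

    The arms pair antiparallel, so a perfectly matched sense arm equals the
    reverse complement of the antisense arm.
    """
    partner = reverse_complement(antisense)
    if len(partner) != len(sense):
        raise ValueError("arms must have equal length")
    return sum(a != b for a, b in zip(sense.upper(), partner))


if __name__ == "__main__":
    original_sense = "ACGCATGC"  # toy sequence
    antisense = reverse_complement(original_sense)
    mutated = deaminate(original_sense, rate=1.0, rng=random.Random(0))
    n = hairpin_mismatches(mutated, antisense)
    print(mutated, n, f"{100 * n / len(mutated):.1f}% mismatched")  # ATGTATGT 3 37.5% mismatched
```

Because only Cs can be converted, the achievable mismatch percentage is bounded by the C content of the sense arm, and a lower `rate` yields a proportionally smaller fraction.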

    A High-Throughput Approach to Uncover Novel Roles of APOBEC2, a Functional Orphan of the AID/APOBEC Family

    APOBEC2 is a member of the AID/APOBEC cytidine deaminase family of proteins. Unlike most AID/APOBEC members, however, APOBEC2’s function remains elusive. Previous research has implicated APOBEC2 in diverse organisms and cellular processes, such as muscle biology (in Mus musculus), regeneration (in Danio rerio) and development (in Xenopus laevis). APOBEC2 has also been implicated in cancer. However, the enzymatic activity, substrate and physiological target(s) of APOBEC2 are unknown. For this thesis, I combined next-generation sequencing (NGS) techniques with state-of-the-art molecular biology to determine the physiological targets of APOBEC2. Using a cell culture muscle differentiation system and RNA sequencing (RNA-Seq) by polyA capture, I demonstrated that, unlike the AID/APOBEC family member APOBEC1, APOBEC2 is not an RNA editor. Using the same system combined with enhanced Reduced Representation Bisulfite Sequencing (eRRBS), I showed that, unlike the AID/APOBEC family member AID, APOBEC2 does not act as a 5-methylcytosine deaminase. Finally, using a combination of biochemical, chromatin immunoprecipitation sequencing (ChIP-Seq) and polyA RNA-Seq analyses, I showed that APOBEC2 is a negative regulator of gene expression (at least in muscle cells) that binds chromatin directly to inhibit transcription of genes involved in muscle cell differentiation. While the precise mechanism behind this activity is still under investigation, this role of APOBEC2 in inhibiting genes involved in cell cycle exit might have implications for its role in cancer.
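To make concrete what testing whether APOBEC2 is an RNA editor involves at the data level: APOBEC1-type C-to-U editing appears in RNA-Seq alignments as a T in reads at positions where the genome has a C (for transcripts on the forward strand). The sketch below flags such candidate sites from a simplified pileup. It is a hypothetical illustration, not the analysis performed in the thesis; the pileup format, the coverage and fraction thresholds and the function name are assumptions, and a real screen would also exclude genomic SNPs (for example, using matched DNA sequencing) and account for library strandedness.

```python
from collections import Counter


def candidate_c_to_u_sites(
    pileup: dict[int, tuple[str, Counter]],
    min_coverage: int = 10,
    min_fraction: float = 0.1,
) -> list[tuple[int, float]]:
    """Return (position, edited fraction) for reference-C sites with enough T reads.

    pileup maps a forward-strand position to (reference base, counts of read bases).
    """
    sites = []
    for pos, (ref_base, counts) in pileup.items():
        if ref_base != "C":
            continue  # C-to-U editing can only occur at reference cytosines
        coverage = sum(counts.values())
        if coverage < min_coverage:
            continue  # too few reads to estimate an editing level
        edited = counts["T"] / coverage  # Counter returns 0 for absent bases
        if edited >= min_fraction:
            sites.append((pos, edited))
    return sorted(sites)


if __name__ == "__main__":
    toy = {
        100: ("C", Counter(C=17, T=3)),  # 15% T: flagged
        200: ("C", Counter(C=5, T=3)),   # only 8 reads: skipped
        300: ("A", Counter(G=10)),       # A-to-G is not C-to-U: ignored
        400: ("C", Counter(C=30)),       # no T reads
    }
    print(candidate_c_to_u_sites(toy))
```

A finding that no sites pass such a filter beyond the background expected from sequencing errors is the kind of evidence consistent with a protein not acting as an RNA editor.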

    The impact of paternal metabolic health on sperm DNA methylation and fetal growth

    Low birth weight is associated with cardiovascular disease and type 2 diabetes mellitus (T2DM) in later life. Paternal obesity and T2DM have been associated with an increased risk of fathering low-birth-weight offspring. Obesity is associated with epigenetic changes in blood, but few studies have replicated the DNA methylation differences found in obese subjects. Animal studies have shown that obesity and insulin resistance are associated with DNA methylation changes in sperm, which in turn could mediate intergenerational effects. Such findings are lacking in humans. My PhD explored the association between paternal metabolic traits and the birth weight of offspring. I then investigated whether DNA methylation signatures in the spermatozoa of obese fathers could underlie any observed association with offspring birth weight. First, I performed a prospective cohort study of 500 mother-father-offspring trios to identify paternal metabolic traits associated with an increased risk of fathering low-birth-weight offspring. Among 390 trios, including 64 obese men and 48 growth-restricted offspring, I did not find any paternal metabolic traits significantly associated with fathering low-birth-weight offspring. However, I found that a father's own birth weight is associated with the birth weight of his offspring. This suggests that paternal genetic factors are more influential in determining offspring growth in utero than factors acquired during the father's lifetime. Second, I performed a systematic review of studies that had investigated DNA methylation in human sperm, summarising current knowledge and generating recommendations for future research. I then performed the largest characterisation to date of matched human sperm and blood samples, using the most comprehensive DNA methylation profiling array, the MethylationEPIC array. The results showed that the DNA methylomes of sperm and blood are highly discordant and in effect completely uncorrelated.
Future studies of intergenerational effects will therefore have to study germ cells rather than blood. Lastly, I attempted to validate previously identified DNA methylation signatures associated with male obesity. Despite comparing 96 well-characterised obese men with 96 lean men, I was unable to replicate in blood any of the previously identified obesity-associated differentially methylated CpG sites. In a linear regression model, I identified two CpG sites, cg07037944 and cg26651978, as suggestive of an association with BMI. These results will contribute to a larger cohort study of 1000 obese and 1000 lean men that aims to identify a robust and reproducible DNA methylation profile associated with obesity. In conclusion, this thesis did not support my predefined hypotheses. However, it does present findings that advance our understanding of the intriguing possibility that an acquired parental metabolic phenotype may influence offspring birth weight through intergenerational inheritance of epigenetic marks.

    Applied Bioinformatics for ncRNA Characterization - Case Studies Combining Next Generation Sequencing & Genomics

    Non-coding RNAs (ncRNAs) constitute a diverse class of functional molecules inherent in virtually all forms of cellular life. Compared with the canonical protein-encoding mRNAs, the role of these abundant transcripts was overlooked for decades. Defined by their highly conserved structures, ncRNAs are resistant to degradation and perform various regulatory functions. Despite poor sequence conservation, comparative genomics can be employed to identify homologous ncRNAs in related species based on their structure. Thanks to next-generation sequencing techniques, a rich corpus of datasets is available that offers a detailed look into cellular processes. Combining genomic and transcriptomic data allows a detailed understanding of molecular mechanisms as well as the characterization of individual gene functions and their evolution. However, analytical processing of modern high-throughput data is only made viable through optimized bioinformatic algorithms and reproducible automation pipelines. This thesis consists of four major parts highlighting the diverse roles of ncRNAs in the transcription process, viewed from different vantage points. The first part concerns an unusually long untranslated region in Rhodobacter that harbors an ncRNA regulating the expression of the downstream division and cell wall (dcw) gene cluster. Second, the degradation of 6S RNA in Bacillus subtilis is experimentally reconstructed to shed light on this final part of the RNA life cycle. This ncRNA is ubiquitous among bacteria and is itself known to be a global transcription regulator. Next, the focus moves to eukaryotes and RNase P, an ancient ribozyme involved in tRNA maturation. Because its composition varies, with an optional RNA subunit and multiple protein subunits, its phylogenetic distribution and deviant characteristics throughout the eukaryotic lineage are examined in order to trace its evolution.
Finally, circRNAs are a diverse subgroup of non-translated RNAs that have recently received increased attention due to their abundance in neural tissue. Resulting from post-transcriptional back-splicing events, circRNAs compete with their host gene for expression. In a zoological study of social insects, circRNAs were identified in honeybees for the first time. The goal was to find task-related differences in circRNA expression between nurse bees and foragers and thus pinpoint potential functions of these elusive ncRNAs. Combining genomic methods and transcriptomic data makes in-depth functional analysis of ncRNAs possible and enables us to understand molecular mechanisms on multiple levels. Structural predictions revealed a riboswitch-like transcriptional control of UpsM that is unique to Rhodobacteraceae. Transcriptomic analysis showed that 6S RNA is primarily processed by RNase J1 for maturation and degraded at internal loops by RNase Y. Evolutionary comparison of organellar RNase P revealed that the RNA subunit is potentially less conserved than previously thought, while organellar protein-only variants are widespread, potentially due to horizontal gene transfer. In the case of circRNAs, an entire group of ncRNAs was characterized in the honeybee as a social model organism, and evidence was found for at least one gene whose circRNA levels are significantly reduced during the nurse-to-forager transition. Moreover, an unexpected link between elevated DNA methylation and RNA circularization was discovered. The bioinformatic findings in all of these cases provide a foundation for further experimental research and illustrate that scientific endeavors cannot be fully automated but require rigorous investigation with customized tools.

    Studies on sequencing analyses of genetic and epigenetics features in melanoma and breast cancer

    The dissertation comprises three projects, each applying a different approach to sequencing and bioinformatic analysis to gain a better understanding of the molecular characteristics of breast cancer and melanoma. In the first project (paper I), we applied whole-exome sequencing to samples from patients with metastatic melanoma. We assessed intra-patient heterogeneity and identified several general patterns of tumor evolution in this malignancy. In the second project (paper II), we used promoter methylation-specific sequencing to analyse the variation in promoter methylation of tumor suppressors in healthy individuals. In doing so, we also established a cost-effective method for studying promoter methylation as a potential modulator of cancer risk. In the third project (paper III), we used microRNA sequencing and identified novel miRNAs that were overexpressed in breast cancer patients. Two of these were selected for further investigation of their potential biological roles in breast cancer. Doctoral thesis

    A Genome-Wide Characterization of Differentially Expressed Genes Encoding mRNAs and miRNAs and Methylation Analysis of Phytochrome Genes in a Cotton Phytochrome A1 RNAi line

    Silencing the phytochrome A1 gene (PHYA1) by RNA interference in upland cotton (Gossypium hirsutum L. cv. Coker 312) generated PHYA1 RNAi lines with increased fiber length and strength and lower micronaire (finer fiber). To identify and characterize mRNAs and miRNAs that are differentially expressed in the RNAi plants, transcriptome and miRNAome analyses were performed via high-throughput RNA sequencing. Total RNA isolated from 10-DPA (days post anthesis) fibers and small RNAs isolated from 5-, 10-, and 15-DPA fibers of the RNAi and Coker 312 lines were used to construct 6 RNA libraries and 18 small RNA libraries, respectively, which were sequenced on the Illumina HiSeq system. A total of 142 differentially expressed genes (DEGs) were identified in PHYA1 RNAi compared with Coker 312. GO analysis showed that these DEGs were mainly involved in metabolic pathways, binding and regulating enzymes (hydrolase, transferase, and oxidoreductase activities), and cell structures, functions that have been reported to play important roles in fiber development. Twenty-eight KEGG pathways were mapped for the 142 DEGs; pathways related to glycolysis/gluconeogenesis and pyruvate metabolism were the most abundant, followed by cytochrome P450-related pathways. Sixty-one conserved miRNA families and thirty-five novel miRNAs were identified in upland cotton. The targets of 6 conserved miRNAs that were differentially expressed in the RNAi line have been reported to participate in primary cell wall synthesis and phytohormone signaling pathways. The 35 novel miRNAs were identified in cotton for the first time, and their target genes were predicted. Nine novel miRNAs were found to target cytochrome P450 TBP. Together, the results imply that miRNAs involved in fine-tuning gene regulation might contribute to the improved fiber quality of the RNAi line.
In addition to the mRNA and miRNA characterization, bisulfite genomic sequencing was used to determine the CpG site methylation status within the coding regions of phytochrome genes in leaves and 10-DPA fibers of the RNAi line. In leaves, PHYA1, PHYC and PHYE had higher methylation levels in the RNAi line than in Coker 312, whereas PHYB had lower methylation levels. In fibers, the methylation level of PHYB was also lower in RNAi plants. However, the methylation of the other phytochrome genes showed no significant changes.