13 research outputs found
Pattern of DNA methylation in Daphnia: evolutionary perspective
DNA methylation is an evolutionarily ancient epigenetic modification that is phylogenetically widespread. Comparative studies of the methylome across a diverse range of non-conventional and conventional model organisms are expected to help reveal how the landscape of DNA methylation and its functions have evolved. Here, we explore the DNA methylation profile of two species of the crustacean Daphnia using whole genome bisulfite sequencing. We then compare our data with the methylomes of two insects and two mammals to achieve a better understanding of the function of DNA methylation in Daphnia. Using RNA-sequencing data for all six species, we investigate the correlation between DNA methylation and gene expression. DNA methylation in Daphnia is mainly enriched within the coding regions of genes, with the highest methylation levels observed at exons 2-4. In contrast, vertebrate genomes are globally methylated: methylation increases towards its highest levels at exon 2 and is maintained across the rest of the gene body. Although DNA methylation patterns differ among all species, their methylation profiles share a bimodal distribution across the genomes. Genes with low levels of CpG methylation and gene expression are mainly enriched for species-specific genes. In contrast, genes associated with highly methylated CpG sites are highly transcribed and evolutionarily conserved across all species. Finally, the positive correlation between methylation of internal exons and gene expression potentially points to an evolutionarily conserved mechanism, whereas the negative regulation of gene expression via methylation of promoters and exon 1 is potentially a secondary mechanism that has evolved in vertebrates.
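The correlation between gene-body methylation and expression described above can be illustrated with a rank correlation. The following is a minimal sketch only: the abstract does not say which correlation statistic was used, so Spearman's rho is an assumption, and the per-gene values are hypothetical numbers invented for illustration.

```python
def ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-gene values (not taken from the study):
# mean CpG methylation over internal exons, and expression level.
methylation = [0.05, 0.10, 0.40, 0.75, 0.90]
expression = [1.2, 0.8, 5.0, 9.5, 20.1]

rho = spearman(methylation, expression)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic is used here because expression values tend to be heavily skewed; a positive rho would correspond to the pattern reported for internal exons.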
Early transcriptional response pathways in Daphnia magna are coordinated in networks of crustacean-specific genes
Natural habitats are exposed to an increasing number of environmental stressors that cause important ecological consequences. However, the multifarious nature of environmental change and the strength and relative timing of each stressor largely limit our understanding of biological responses to environmental change. In particular, the early response to unpredictable environmental change, critical to survival and fitness in later life stages, is largely uncharacterized. Here, we characterize the early transcriptional response of the keystone species Daphnia magna to twelve environmental perturbations, including biotic and abiotic stressors. We first perform a differential expression analysis aimed at identifying differential regulation of individual genes in response to stress. This preliminary analysis revealed that only a few individual genes were responsive to environmental perturbations and that they were modulated in a stressor- and genotype-specific manner. Given the limited number of differentially regulated genes, we were unable to identify pathways involved in stress response. Hence, to gain a better understanding of the genetic and functional foundation of tolerance to multiple environmental stressors, we leveraged the correlative nature of networks and performed a weighted gene co-expression network analysis. We discovered that approximately one-third of the Daphnia genes, enriched for metabolism, cell signalling and general stress response, drive the early transcriptional response to environmental stress and that this response is shared among genetic backgrounds. This initial response is followed by a genotype- and/or condition-specific transcriptional response with a strong genotype-by-environment interaction. Intriguingly, the genotype- and condition-specific transcriptional response is found in genes not conserved beyond crustaceans, suggesting niche-specific adaptation.
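The core step of a weighted gene co-expression network analysis is turning pairwise expression correlations into soft-thresholded edge weights. The sketch below shows the standard unsigned form, adjacency = |correlation|^beta. The gene names, expression profiles, and the power beta = 6 are hypothetical choices for illustration, not values reported in the study.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency: a[g, h] = |cor(g, h)| ** beta.

    `profiles` maps a gene name to its expression values across samples.
    `beta` is the soft-thresholding power (an assumed value here).
    """
    adjacency = {}
    for g in profiles:
        for h in profiles:
            if g == h:
                adjacency[(g, h)] = 1.0
            else:
                r = pearson(profiles[g], profiles[h])
                adjacency[(g, h)] = abs(r) ** beta
    return adjacency


# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with geneA
    "geneD": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with geneA
}

adjacency = soft_threshold_adjacency(profiles)
for pair in [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD")]:
    print(pair, round(adjacency[pair], 4))
```

Raising the correlation to a power suppresses weak correlations while keeping every gene pair in the network, which is why modules can be detected even when few genes pass a per-gene differential expression test.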
Large-Scale Interpretable Multi-View Learning for Very High-Dimensional Problems with Application to Multi-Omic Data
We discuss the sparse Canonical Correlation Analysis (CCA) problem in the context of high-dimensional multi-view problems, where we aim to discover interpretable association structures among multiple random vectors via their respective views, with an emphasis on settings where the number of observations is small compared to the number of covariates. Throughout this text, we use the term view, defined as the observations of a random vector on an ordered set of subjects, which is the same for the observations of all other random vectors involved in the analysis. We denote each view by X_i ∈ R^(n×p_i), i = 1, ..., m, where m is the number of random vectors, or equivalently the number of views. In the first two chapters we consider linear association structures shared among multiple views, where the objective is to learn sparse linear combinations of multiple sets of covariates such that they are maximally correlated. In the first chapter we introduce a new approach to sparse CCA, where we learn the sparsity pattern of the canonical directions in the first stage by casting this problem as two successively shrinking concave minimization programs, which are solved via a first-order algorithm, and in the second stage we solve a small CCA problem using the sparsity patterns estimated in the first stage. We demonstrate via simulations that, in comparison to other available methods, our approach has superior convergence properties, a better capability to recover the underlying sparsity patterns and the magnitudes of the non-zero elements of the canonical directions, and a significantly lower computational cost.
We then apply our method to a multi-omic environmental genetics study on fruit flies, where we hypothesise about the mechanism of adaptation of this model organism to environmental pesticides. In the second chapter we tackle a shared shortcoming of sparse PCA and sparse CCA methods: when multiple components or canonical directions are estimated for each view, these directions are not orthogonal to each other, which diminishes interpretability. While all other approaches estimate canonical directions one by one via the contraction scheme, we offer a block scheme where we estimate the first d canonical directions simultaneously. In this setting, we can more easily impose orthogonality and also encourage disjoint sets of non-zero elements within multiple directions, resulting in more interpretable models. We also extend our model to what we call sparse Directed CCA, where we use an accessory variable, defined in the text, to try to capture variations related to a certain hypothesis, rather than the dominant variations, which might prove irrelevant to the main hypothesis. As a validating example, we apply our method to the lung cancer multi-omics data available on The Cancer Genome Atlas, using survival data as our accessory variable. While regular sparse CCA exclusively identified correlation structures dominated by, and communities separated by, gender, our directed sparse CCA correctly identified two underlying communities which were significantly separated by survival. In the final chapter, we generalize our framework to discover non-linear association structures by proposing a two-stage sparse kernel CCA algorithm. We learn maximally aligned kernels in the first stage via sparse Multiple Kernel Learning (MKL), and then solve a KCCA problem in the second stage using the learned kernels. We perform sparse MKL by forming an alignment matrix whose elements are the sample Hilbert-Schmidt Independence Criterion of base kernels of pairs of views.
These base kernels are functions of small sets of covariates of each view; therefore our sparse MKL approach provides interpretable solutions, as sparse convex linear combinations of base kernels. We finally provide an Apache Spark implementation of the methods introduced throughout the dissertation, which enables users to run our methods on very high-dimensional datasets, e.g. observations on millions of Single Nucleotide Polymorphism loci, using distributed computing. We call this package SparKLe. R versions of our algorithms are also available: MuLe, BLOCCS, and SparKLe-R implement our methods presented in Chapters 1, 2, and 3, respectively.
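The sample Hilbert-Schmidt Independence Criterion used to fill the alignment matrix can be sketched as follows. This is the standard biased estimator, HSIC = tr(K H L H) / (n - 1)^2 with H the centering matrix, and is not necessarily the exact form used in the dissertation. The Gaussian base kernel, its bandwidth, and the toy views are assumptions made for illustration.

```python
import math


def gaussian_kernel_matrix(rows, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel over the rows (subjects) of one view."""
    n = len(rows)
    return [
        [
            math.exp(
                -sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
                / (2 * bandwidth ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(K, L):
    """Biased sample HSIC of two Gram matrices built on the same n subjects."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    # tr(K H L H) equals the elementwise sum of the two centered symmetric matrices.
    trace = sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Toy views on the same four subjects (hypothetical data).
view_x = [[0.0], [1.0], [2.0], [3.0]]
view_y = [[0.0], [2.0], [4.0], [6.0]]      # depends on view_x
view_const = [[5.0], [5.0], [5.0], [5.0]]  # carries no information

K_x = gaussian_kernel_matrix(view_x)
K_y = gaussian_kernel_matrix(view_y)
K_const = gaussian_kernel_matrix(view_const)

print("HSIC(x, y)     =", hsic(K_x, K_y))
print("HSIC(x, const) =", hsic(K_x, K_const))
```

In an alignment matrix of the kind described, each entry would hold this quantity for one pair of base kernels taken from two different views, so that strongly dependent kernel pairs receive larger weights.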
Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy.
Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation-driving enhancers (SDEs). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs using whole-mount embryonic imaging of stably integrated reporter constructs, chosen throughout our prediction rank-list, showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
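The precision and recall figures quoted above are ratios of validation counts. The sketch below shows the arithmetic under the usual definitions; the counts are hypothetical and are not the study's data. It also assumes that a "false-positive rate" among predicted enhancers means the fraction of predictions that fail validation, which is one minus precision.

```python
def precision(true_pos, false_pos):
    """Fraction of predicted enhancers that validate as functional."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Fraction of functional enhancers that the predictor recovers."""
    return true_pos / (true_pos + false_neg)


def failed_prediction_fraction(true_pos, false_pos):
    """Fraction of predictions that do not validate (1 - precision)."""
    return false_pos / (true_pos + false_pos)


# Hypothetical validation counts, invented for illustration only.
true_pos, false_pos, false_neg = 45, 5, 3

print("precision:", precision(true_pos, false_pos))
print("recall:", recall(true_pos, false_neg))
print("failed predictions:", failed_prediction_fraction(true_pos, false_pos))
```

Precision can be measured directly by testing predicted elements with reporter constructs, whereas recall depends on the number of missed elements, which generally has to be estimated.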