14 research outputs found
Enhancer evolution in the Drosophila montium subgroup
Enhancers drive spatiotemporal patterns of gene expression, and play critical roles in development, disease, and evolution. Decades of research have yielded key insights, but many questions remain unanswered. A hallmark of enhancer evolution is functional conservation in the presence of extensive sequence divergence. However, identifying important mutational events between divergent sequences has been challenging. To overcome this challenge, I adopted a comparative genomic approach: sequence and assemble dozens of closely related species, and study enhancer evolution at the earliest stages of divergence. Such a data set provides an unprecedented opportunity to identify key changes and events (along with their context) before they are obscured by additional mutations. I started by sequencing and assembling 23 genomes from the Drosophila montium subgroup, a large group of closely related species. I also aligned each montium assembly to the extensively annotated D. melanogaster genome. The average scaffold NG50 is 76 kb, but varies widely (19-400 kb) depending on repeat content and heterozygosity levels. Despite large differences in contiguity, all montium assemblies contain high percentages of known genes and enhancers, demonstrating their suitability for this comparative genomic approach. To support my subsequent analyses, I also reconstructed the montium subgroup phylogeny using 20 Bicoid-dependent enhancers. Next, I leveraged this new genomic resource to study enhancer evolution across 24 montium species and D. melanogaster. I started with the extensively characterized eve stripe 2 enhancer, and showed how patterns of (apparent) conservation and variation could be used to direct targeted mutagenesis experiments, and to inform models of enhancer grammar. To study binding site turnover on a large scale, I investigated hundreds of ChIP peaks for the transcription factors Bicoid, Krüppel, and Zelda.
I treated groups of orthologous binding site scores as continuous traits, reconstructed ancestral scores at each node of the species tree, and then calculated score changes along each branch of the tree. For all three factors, binding sites were more likely to be gained along branches of the tree that also lost a binding site. This was true for both conserved and non-conserved sites, and most differences were statistically significant. However, I observed similar patterns when I repeated the analyses using shuffled matrices, leaving me unable to conclude these were meaningful changes in transcription factor binding. Future analyses will focus on mitigating the effects of several confounding factors, including non-functional montium sequences, the forced gradualism of the Brownian motion model, and ancestral character estimation with a single species tree in the presence of widespread incomplete lineage sorting and/or introgression. Finally, in collaboration with Carolyn Elya and Michael Eisen, I worked on assembling the genome of the Drosophila-manipulating fungus Entomophthora muscae ‘Berkeley’. This is an excellent system with which to study the mechanistic basis of parasite-induced manipulations. Infected flies exhibit a suite of behavioral changes, including summit disease, proboscis extension / attachment, and raised / spread wings. Compared to most previously sequenced fungi, the genome is extremely large and repetitive. The total scaffold length is 1.24 Gb, but the haploid genome size might be around 650 Mb. Polyploidy appears to be common among related entomopathogenic fungi, so estimating the haploid genome size in the absence of additional experimental data is challenging. At least 85% of the genome is repeats. In fact, the genome is so repeat-rich that aligning any pair of scaffolds produces characteristic X-alignments, where the forward strand of the first scaffold also aligns to the reverse complement of the second scaffold.
The assembly appears to be missing many known fungal genes, but the significance of this is unclear. For genes that are present, the genome often appears to contain two distinct haplotypes. In many cases these haplotypes were assembled independently on different scaffolds, but many were also collapsed into single sequences. The alignment of PacBio long reads to the assembly suggests that it contains numerous mis-assemblies. This was probably unavoidable given the genome’s dense repeat structure. Future efforts will focus on improving the assembly. Going forward, the E. muscae ‘Berkeley’ genome will support our efforts to understand the molecular basis of fungal-induced behavioral manipulations in D. melanogaster.
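The abstract above mentions scoring binding sites and repeating the analysis with shuffled matrices as a control. The following is a minimal sketch of that general idea, not the author's pipeline: the count matrix is a toy example, the column-shuffling scheme is an assumption (the abstract does not say how the matrices were shuffled), and all function names are hypothetical.

```python
import math
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds scoring matrix.

    `counts` is a list of columns, each a dict mapping A/C/G/T to a count.
    A uniform background is assumed here purely for illustration.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background)
             for b in BASES}
        )
    return matrix


def best_site_score(seq, matrix):
    """Return the highest-scoring window on either strand (ACGT input only)."""
    width = len(matrix)
    best = None
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            score = sum(matrix[j][strand[start + j]] for j in range(width))
            if best is None or score > best:
                best = score
    return best


def shuffle_columns(matrix, seed=0):
    """Control matrix: same columns in a random order (assumed scheme)."""
    rng = random.Random(seed)
    shuffled = list(matrix)
    rng.shuffle(shuffled)
    return shuffled


# Toy counts for a 6-bp motif with consensus TAATCC (illustrative values only).
toy_counts = [
    {b: (10 if b == consensus else 0) for b in BASES}
    for consensus in "TAATCC"
]
toy_matrix = log_odds(toy_counts)
control_matrix = shuffle_columns(toy_matrix, seed=1)

site = "GGTAATCCGG"
print(best_site_score(site, toy_matrix), best_site_score(site, control_matrix))
```

Scoring orthologous sequences with both the real and the control matrix gives two sets of scores that can be passed through the same downstream analysis; if both show the same pattern, the pattern cannot be attributed to the motif itself.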
Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses.
Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5-15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. 
serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations.
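The scaffold NG50 values quoted above depend on the estimated genome size rather than on the total assembly length. As a minimal sketch of the standard definition (the function names and toy scaffold lengths are illustrative and do not come from the paper):

```python
def ng50(scaffold_lengths, genome_size):
    """Return the scaffold NG50, or None if it is undefined.

    NG50 is the length L such that scaffolds of length >= L together
    cover at least half of the *estimated genome size*.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None  # the assembly covers less than half of the estimated genome


def n50(scaffold_lengths):
    """N50 is the same statistic computed against the total assembly length."""
    return ng50(scaffold_lengths, sum(scaffold_lengths))


# Toy scaffold lengths in kb (hypothetical values).
lengths_kb = [400, 300, 200, 100, 50, 50]

print(n50(lengths_kb))         # 300
print(ng50(lengths_kb, 1900))  # 100
```

Because these assemblies are typically shorter than the estimated genome size, NG50 will be at most N50 for the same assembly, which makes it the more conservative figure to compare across species.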
A phylogeny for the Drosophila montium species group: A model clade for comparative analyses
The Drosophila montium species group is a clade of 94 named species, closely related to the model species D. melanogaster. The montium species group is distributed over a broad geographic range throughout Asia, Africa, and Australasia. Species of this group possess a wide range of morphologies, mating behaviors, and endosymbiont associations, making this clade useful for comparative analyses. We use genomic data from 42 available species to estimate the phylogeny and relative divergence times within the montium species group, and its relative divergence time from D. melanogaster. To assess the robustness of our phylogenetic inferences, we use three non-overlapping sets of 20 single-copy coding sequences and analyze all 60 genes with both Bayesian and maximum likelihood methods. Our analyses support monophyly of the group. Apart from the uncertain placement of a single species, D. baimaii, our analyses also support the monophyly of all seven subgroups proposed within the montium group. Our phylograms and relative chronograms provide a highly resolved species tree, with discordance restricted to estimates of relatively short branches deep in the tree. In contrast, age estimates for the montium crown group, relative to its divergence from D. melanogaster, depend critically on prior assumptions concerning variation in rates of molecular evolution across branches, and hence have not been reliably determined. We discuss methodological issues that limit phylogenetic resolution, even when complete genome sequences are available, as well as the utility of the current phylogeny for understanding the evolutionary and biogeographic history of this clade.
Data from: Rapid global spread of wRi-like Wolbachia across multiple Drosophila
Maternally transmitted Wolbachia, Spiroplasma and Cardinium bacteria are common in insects, but their interspecific spread is poorly understood. Endosymbionts can spread rapidly within host species by manipulating host reproduction, as typified by the global spread of wRi Wolbachia observed in Drosophila simulans. However, because Wolbachia cannot survive outside host cells, spread between distantly related host species requires horizontal transfers that are presumably rare. Here we document spread of wRi-like Wolbachia among eight highly diverged Drosophila hosts (10–50 million years) over only about 14,000 years (5,000–27,000). Comparing 110 wRi-like genomes, we find ≤0.02% divergence from the wRi variant that spread rapidly through California populations of D. simulans. The hosts include both globally invasive species, D. simulans, D. suzukii and D. ananassae, and narrowly distributed Australian endemics, D. anomalata and D. pandora. Phylogenetic analyses that include mtDNA genomes indicate introgressive transfer of wRi-like Wolbachia between closely related species D. ananassae, D. anomalata and D. pandora, but no horizontal transmission within species. Our analyses suggest D. ananassae as the Wolbachia source for the recent wRi invasion of D. simulans, and D. suzukii as the source of Wolbachia in its sister species D. subpulchrella. Although six of these wRi-like variants cause strong cytoplasmic incompatibility, two cause no detectable reproductive effects, indicating that pervasive mutualistic effects complement the reproductive manipulations for which Wolbachia are best known. “Super spreader” variants like wRi may be particularly useful for controlling insect pests and vector-borne diseases with Wolbachia transinfections.