761 research outputs found

    Neutral genomic microevolution of a recently emerged pathogen, salmonella enterica serovar agona

    Get PDF
    Salmonella enterica serovar Agona has caused multiple food-borne outbreaks of gastroenteritis since it was first isolated in 1952. We analyzed the genomes of 73 isolates from global sources, comparing five distinct outbreaks with sporadic infections as well as food contamination and the environment. Agona consists of three lineages with minimal mutational diversity: only 846 single nucleotide polymorphisms (SNPs) have accumulated in the non-repetitive, core genome since Agona evolved in 1932 and subsequently underwent a major population expansion in the 1960s. Homologous recombination with other serovars of S. enterica imported 42 recombinational tracts (360 kb) in 5/143 nodes within the genealogy, which resulted in 3,164 additional SNPs. In contrast to this paucity of genetic diversity, Agona is highly diverse according to pulsed-field gel electrophoresis (PFGE), which is used to assign isolates to outbreaks. PFGE diversity reflects a highly dynamic accessory genome associated with the gain or loss (indels) of 51 bacteriophages, 10 plasmids, and 6 integrative conjugational elements (ICE/IMEs), but did not correlate uniquely with outbreaks. Unlike the core genome, indels occurred repeatedly in independent nodes (homoplasies), resulting in inaccurate PFGE genealogies. The accessory genome contained only few cargo genes relevant to infection, other than antibiotic resistance. Thus, most of the genetic diversity within this recently emerged pathogen reflects changes in the accessory genome, or is due to recombination, but these changes seemed to reflect neutral processes rather than Darwinian selection. Each outbreak was caused by an independent clade, without universal, outbreak-associated genomic features, and none of the variable genes in the pan-genome seemed to be associated with an ability to cause outbreaks

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    Get PDF
    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics

    An enigmatic fourth runt domain gene in the fugu genome: ancestral gene loss versus accelerated evolution

    Get PDF
    BACKGROUND: The runt domain transcription factors are key regulators of developmental processes in bilaterians, involved both in cell proliferation and differentiation, and their disruption usually leads to disease. Three runt domain genes have been described in each vertebrate genome (the RUNX gene family), but only one in other chordates. Therefore, the common ancestor of vertebrates has been thought to have had a single runt domain gene. RESULTS: Analysis of the genome draft of the fugu pufferfish (Takifugu rubripes) reveals the existence of a fourth runt domain gene, FrRUNT, in addition to the orthologs of human RUNX1, RUNX2 and RUNX3. The tiny FrRUNT packs six exons and two putative promoters in just 3 kb of genomic sequence. The first exon is located within an intron of FrSUPT3H, the ortholog of human SUPT3H, and the first exon of FrSUPT3H resides within the first intron of FrRUNT. The two gene structures are therefore "interlocked". In the human genome, SUPT3H is instead interlocked with RUNX2. FrRUNT has no detectable ortholog in the genomes of mammals, birds or amphibians. We consider alternative explanations for an apparent contradiction between the phylogenetic data and the comparison of the genomic neighborhoods of human and fugu runt domain genes. We hypothesize that an ancient RUNT locus was lost in the tetrapod lineage, together with FrFSTL6, a member of a novel family of follistatin-like genes. CONCLUSIONS: Our results suggest that the runt domain family may have started expanding in chordates much earlier than previously thought, and exemplify the importance of detailed analysis of whole-genome draft sequence to provide new insights into gene evolution

    Transposon-Mediated Horizontal Transfer of the Host-Specific Virulence Protein ToxA between Three Fungal Wheat Pathogens

    Get PDF
    Most known examples of horizontal gene transfer (HGT) between eukaryotes are ancient. These events are identified primarily using phylogenetic methods on coding regions alone. Only rarely are there examples of HGT where noncoding DNA is also reported. The gene encoding the wheat virulence protein ToxA and the surrounding 14 kb is one of these rare examples. ToxA has been horizontally transferred between three fungal wheat pathogens (Parastagonospora nodorum, Pyrenophora tritici-repentis, and Bipolaris sorokiniana) as part of a conserved ∼14 kb element which contains coding and noncoding regions. Here we used long-read sequencing to define the extent of HGT between these three fungal species. Construction of near-chromosomal-level assemblies enabled identification of terminal inverted repeats on either end of the 14 kb region, typical of a type II DNA transposon. This is the first description of ToxA with complete transposon features, which we call ToxhAT. In all three species, ToxhAT resides in a large (140-to-250 kb) transposon-rich genomic island which is absent in isolates that do not carry the gene (annotated here as toxa−). We demonstrate that the horizontal transfer of ToxhAT between P. tritici-repentis and P. nodorum occurred as part of a large (∼80 kb) HGT which is now undergoing extensive decay. In B. sorokiniana, in contrast, ToxhAT and its resident genomic island are mobile within the genome. Together, these data provide insight into the noncoding regions that facilitate HGT between eukaryotes and into the genomic processes which mask the extent of HGT between these species.M.C.M. acknowledges The Sun Foundation’s Peer Prize for Women in Science for support to sequence additional ToxA isolates. E.H. acknowledges The Grains and Research Development Corporation (project UHS11002). M.C.M., A.M., S.S., and P.S.S. also acknowledge The Grains and Research Development Corporation for the collection of isolates (projects DAN00203 and DAN00177)

    The barley pan-genome reveals the hidden legacy of mutation breeding

    Get PDF
    Genetic diversity is key to crop improvement. Owing to pervasive genomic structural variation, a single reference genome assembly cannot capture the full complement of sequence diversity of a crop species (known as the ‘pan-genome’1). Multiple high-quality sequence assemblies are an indispensable component of a pan-genome infrastructure. Barley (Hordeum vulgare L.) is an important cereal crop with a long history of cultivation that is adapted to a wide range of agro-climatic conditions2. Here we report the construction of chromosome-scale sequence assemblies for the genotypes of 20 varieties of barley—comprising landraces, cultivars and a wild barley—that were selected as representatives of global barley diversity. We catalogued genomic presence/absence variants and explored the use of structural variants for quantitative genetic analysis through whole-genome shotgun sequencing of 300 gene bank accessions. We discovered abundant large inversion polymorphisms and analysed in detail two inversions that are frequently found in current elite barley germplasm; one is probably the product of mutation breeding and the other is tightly linked to a locus that is involved in the expansion of geographical range. This first-generation barley pan-genome makes previously hidden genetic variation accessible to genetic studies and breeding

    Chromosomal-level assembly of the Asian seabass genome using long sequence reads and multi-layered scaffolding

    Get PDF
    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.Web of Scienc

    The Cyclically Seasonal Drosophila subobscura Inversion O Originated From Fragile Genomic Sites and Relocated Immunity and Metabolic Genes

    Get PDF
    Chromosome inversions are important contributors to standing genetic variation in Drosophila subobscura. Presently, the species is experiencing a rapid replacement of high-latitude by low-latitude inversions associated with global warming. Yet not all low-latitude inversions are correlated with the ongoing warming trend. This is particularly unexpected in the case of O because it shows a regular seasonal cycle that peaks in summer and rose with a heatwave. The inconsistent behavior of O across components of the ambient temperature suggests that is causally more complex than simply due to temperature alone. In order to understand the dynamics of O, high-quality genomic data are needed to determine both the breakpoints and the genetic content. To fill this gap, here we generated a PacBio long read-based chromosome-scale genome assembly, from a highly homozygous line made isogenic for an O chromosome. Then we isolated the complete continuous sequence of O by conserved synteny analysis with the available reference genome. Main findings include the following: (i) the assembled O inversion stretches 9.936 Mb, containing > 1,000 annotated genes; (ii) O had a complex origin, involving multiple breaks associated with non-B DNA-forming motifs, formation of a microinversion, and ectopic repair in trans with the two homologous chromosomes; (iii) the O breakpoints carry a pre-inversion record of fragility, including a sequence insertion, and transposition with later inverted duplication of an Attacin immunity gene; and (iv) the O inversion relocated the major insulin signaling forkhead box subgroup O (foxo) gene in tight linkage with its antagonistic regulatory partner serine/threonine-protein kinase B (Akt1) and disrupted concerted evolution of the two inverted Attacin duplicates, reattaching them to dFOXO metabolic enhancers. Our findings suggest that O exerts antagonistic pleiotropic effects on reproduction and immunity, setting a framework to understand its relationship with climate change. Furthermore, they are relevant for fragility in genome rearrangement evolution and for current views on the contribution of breakage versus repair in shaping inversion-breakpoint junctions

    Semi-automated assembly of high-quality diploid human reference genomes

    Get PDF
    The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements

    De novo Assembly of a 40 Mb Eukaryotic Genome from Short Sequence Reads: Sordaria macrospora, a Model Organism for Fungal Morphogenesis

    Get PDF
    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology
    • …
    corecore