
    Haplogenome assembly reveals structural variation in Eucalyptus interspecific hybrids

    DATA AVAILABILITY: Illumina DNA sequencing data were uploaded to the NCBI SRA under BioProject PRJNA885070. High-density genetic linkage maps are available on GitLab [78]. The haplogenome assemblies were uploaded to the NCBI database and can be accessed with accession nos. JAOPUP000000000 and JAOPUO000000000. All supporting data, such as repeat element libraries, genome annotation files, and synteny analysis output files, are available in GigaDB [79].

    ADDITIONAL FILES:
    SUPPLEMENTARY FIGURE S1: Genome size estimates for the (A) E. urophylla, (B) E. grandis and (C) E. urophylla x E. grandis F1 hybrid genomes.
    SUPPLEMENTARY FIGURE S2: Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness scores for both haplogenome assemblies as well as the currently available E. grandis v2.0 reference genome.
    SUPPLEMENTARY FIGURE S3: Alignment of placed haplogenome scaffolds to the E. grandis v2.0 reference genome.
    SUPPLEMENTARY FIGURE S4: Alignment between the E. grandis and E. urophylla scaffolded haplogenome assemblies.
    SUPPLEMENTARY FIGURE S5: Pseudochromosomes of the E. urophylla haplogenome, reconstructed from two genetic linkage input maps (uro.allmap and gra.allmap) with unequal weights (2 and 1, respectively).
    SUPPLEMENTARY FIGURE S6: Pseudochromosomes of the E. grandis haplogenome, reconstructed from two genetic linkage input maps (gra.allmap and uro.allmap) with unequal weights (2 and 1, respectively).
    SUPPLEMENTARY FIGURE S7: Scaffolded chromosome sizes of the E. grandis v2.0 and the scaffolded E. grandis and E. urophylla haplogenome assemblies.
    SUPPLEMENTARY FIGURE S8: Alignment of unplaced E. grandis and E. urophylla haplogenome scaffolds to the E. grandis v2.0 reference genome.
    SUPPLEMENTARY FIGURE S9: Syntenic and rearranged regions between the E. grandis v2.0, E. grandis and E. urophylla haplogenomes for all eleven chromosomes.
    SUPPLEMENTARY FIGURE S10: Enriched gene ontology (GO) terms for inverted and translocated gene alignment blocks of the E. grandis haplogenome.
    SUPPLEMENTARY FIGURE S11: Enriched gene ontology (GO) terms for inverted and translocated gene alignment blocks of the E. urophylla haplogenome.
    SUPPLEMENTARY FIGURE S12: Enriched gene ontology (GO) terms for genes that did not have a pairwise alignment between the E. grandis and E. urophylla haplogenomes.
    SUPPLEMENTARY FIGURE S13: Hap-mer blob plot of the E. grandis and E. urophylla haplogenome assemblies.
    SUPPLEMENTARY FIGURE S14: Evaluation of haplotype phase blocks. All hap-mer information was generated with Merqury v1.1 [72].
    SUPPLEMENTARY FIGURE S15: Genome coverage of the E. grandis v2.0 nuclear reference and plastid genomes.
    SUPPLEMENTARY FIGURE S16: Summary of the total size and type of elements found in high genome coverage bins. Organellar introgression was identified through BLAST analysis to the E. grandis plastid genomes [77], while repeat elements were identified with RepeatMasker.
    SUPPLEMENTARY NOTE 1: Hap-mer-based phasing completeness assessment.
    SUPPLEMENTARY NOTE 2: Read and assembly alignment and validation of high peak content.
    SUPPLEMENTARY TABLE S1: Illumina sequencing results.
    SUPPLEMENTARY TABLE S2: Nanopore sequencing results for the F1 hybrid individual.
    SUPPLEMENTARY TABLE S3: Summary statistics for long-read binning using the parental short reads.
    SUPPLEMENTARY TABLE S4: Summary statistics of placed and unplaced contigs after scaffolding with ALLMAPS for the E. urophylla and E. grandis haplogenomes, respectively.
    SUPPLEMENTARY TABLE S5: Repeat element content of assembled haplogenomes.
    SUPPLEMENTARY TABLE S6: Haplogenome annotation statistics.
    SUPPLEMENTARY TABLE S7: Number and total length of syntenic and rearranged regions in the E. grandis and E. urophylla haplogenomes.
    SUPPLEMENTARY TABLE S8: Number and total length of local sequence variation in syntenic and rearranged regions in the E. grandis and E. urophylla haplogenomes.
    SUPPLEMENTARY TABLE S9: Inversions larger than 50 kb between the E. grandis and E. urophylla haplogenomes.
    SUPPLEMENTARY TABLE S10: Translocations larger than 50 kb between the E. grandis and E. urophylla haplogenomes.
    SUPPLEMENTARY TABLE S11: KEGG pathway enrichment analyses for genes within inverted and translocated gene alignment blocks between the E. grandis and E. urophylla haplogenome assemblies.
    SUPPLEMENTARY TABLE S12: KEGG pathway enrichment analyses for genes that do not have a pairwise alignment between the E. grandis (reference) and E. urophylla (test) haplogenome assemblies.
    SUPPLEMENTARY TABLE S13: Altered position and length of genes with an in-frame stop codon.
    SUPPLEMENTARY TABLE S14: Phase block statistics of the E. grandis and E. urophylla haplogenome assemblies.
    SUPPLEMENTARY TABLE S15: E. grandis and E. urophylla high coverage bin content.

    BACKGROUND: De novo phased (haplo)genome assembly using long-read DNA sequencing data has improved the detection and characterization of structural variants (SVs) in plant and animal genomes. Able to span across haplotypes, long reads allow phased haplogenome assembly in highly outbred organisms such as forest trees. Eucalyptus tree species and interspecific hybrids are the most widely planted hardwood trees, with F1 hybrids of Eucalyptus grandis and E. urophylla forming the bulk of fast-growing pulpwood plantations in subtropical regions. The extent of structural variation and its effect on interspecific hybridization is unknown in these trees. As a first step towards elucidating the extent of structural variation between the genomes of E. grandis and E. urophylla, we sequenced and assembled the haplogenomes contained in an F1 hybrid of the two species.

    FINDINGS: Using Nanopore sequencing and a trio-binning approach, we assembled the separate haplogenomes (566.7 Mb and 544.5 Mb) to 98.0% BUSCO completion. High-density SNP genetic linkage maps of both parents allowed scaffolding of 88.0% of the haplogenome contigs into 11 pseudo-chromosomes (scaffold N50 of 43.8 Mb and 42.5 Mb for the E. grandis and E. urophylla haplogenomes, respectively). We identify 48,729 SVs between the two haplogenomes, providing the first detailed insight into genome structural rearrangement in these species. The two haplogenomes have similar gene content, 35,572 and 33,915 functionally annotated genes, of which 34.7% are contained in genome rearrangements.

    CONCLUSIONS: Knowledge of SV and haplotype diversity in the two species will form the basis for understanding the genetic basis of hybrid superiority in these trees.

    The Department of Science and Innovation (DSI) and Technology Innovation Agency (TIA) of South Africa, Sappi Southern Africa through the Forest Molecular Genetics (FMG) Industry Consortium at the University of Pretoria (UP), the National Research Foundation (NRF) of South Africa and funding from the UP Postgraduate Studies Abroad Programme.

    https://academic.oup.com/gigascience
    am2024
    Biochemistry; Forestry and Agricultural Biotechnology Institute (FABI); Genetics; Microbiology and Plant Pathology; SDG-15:Life on lan

    The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms

    In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.