17 research outputs found

    Widespread gene duplication and adaptive evolution in the RNA interference pathways of the Drosophila obscura group

    Background: RNA interference (RNAi) related pathways provide defense against viruses and transposable elements, and have been implicated in the suppression of meiotic drive elements. Genes in these pathways often exhibit high levels of adaptive substitution, and over longer timescales show gene duplication and loss, most likely as a consequence of their role in mediating conflict with these parasites. This is particularly striking for Argonaute 2 (Ago2), which is ancestrally the key effector of antiviral RNAi in insects, but has repeatedly formed new testis-specific duplicates in the recent history of the obscura species-group of Drosophila. Results: Here we take advantage of publicly available genomic and transcriptomic data to identify six further RNAi-pathway genes that have duplicated in this clade of Drosophila, and examine their evolutionary history. As seen for Ago2, we observe high levels of adaptive amino-acid substitution and changes in sex-biased expression in many of the paralogs. However, our phylogenetic analysis suggests that co-duplications of the RNAi machinery were not synchronous, and our expression analysis fails to identify consistent male-specific expression. Conclusions: These results confirm that RNAi genes, including genes of the antiviral and piRNA pathways, have undergone multiple independent duplications and that their history has been particularly labile within the obscura group. However, they also suggest that the selective pressures driving these changes have not been consistent, implying that more than one selective agent may be responsible.

    Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies

    Advantages of pangenomes over linear reference assemblies for genome research have recently been established. However, potential effects of sequence platform and assembly approach, or of combining assemblies created by different approaches, on pangenome construction have not been investigated. Here we generate haplotype-resolved assemblies from the offspring of three bovine trios representing increasing levels of heterozygosity that each demonstrate a substantial improvement in contiguity, completeness, and accuracy over the current Bos taurus reference genome. Diploid coverage as low as 20x for HiFi or 60x for ONT is sufficient to produce two haplotype-resolved assemblies meeting standards set by the Vertebrate Genomes Project. Structural variant-based pangenomes created from the haplotype-resolved assemblies demonstrate significant consensus regardless of sequence platform, assembler algorithm, or coverage. Inspecting pangenome topologies identifies 90 thousand structural variants including 931 overlapping with coding sequences; this approach reveals variants affecting QRICH2, PRDM9, HSPA1A, TAS2R46, and GC that have potential to affect phenotype.

    Structural variation and introgression from wild populations in East Asian cattle genomes confer adaptation to local environment

    BACKGROUND: Structural variations (SVs) in individual genomes are major determinants of complex traits, including adaptability to environmental variables. The Mongolian and Hainan cattle breeds in East Asia are of taurine and indicine origins that have evolved to adapt to cold and hot environments, respectively. However, few studies have investigated SVs in East Asian cattle genomes and their roles in environmental adaptation, and little is known about adaptively introgressed SVs in East Asian cattle. RESULTS: In this study, we examine the roles of SVs in the climate adaptation of these two cattle lineages by generating highly contiguous chromosome-scale genome assemblies. Comparison of the two assemblies along with 18 Mongolian and Hainan cattle genomes obtained by long-read sequencing data provides a catalog of 123,898 nonredundant SVs. Several SVs detected from long reads are in exons of genes associated with epidermal differentiation, skin barrier, and bovine tuberculosis resistance. Functional investigations show that a 108-bp exonic insertion in SPN may affect the uptake of Mycobacterium tuberculosis by macrophages, which might contribute to the low susceptibility of Hainan cattle to bovine tuberculosis. Genotyping of 373 whole genomes from 39 breeds identifies 2610 SVs that are differentiated along a "north-south" gradient in China and overlap with 862 related genes that are enriched in pathways related to environmental adaptation. We identify 1457 Chinese indicine-stratified SVs that possibly originate from banteng and are frequent in Chinese indicine cattle. CONCLUSIONS: Our findings highlight the unique contribution of SVs in East Asian cattle to environmental adaptation and disease resistance.

    Establishing Bovine Pangenome Graphs

    The assembly of the draft Bos taurus reference genome was a milestone for genetics- and genomics-oriented research in cattle. The reference genome of domestic cattle was built from a single animal from the Hereford breed. However, the linear reference sequence does not represent the genetic diversity of global cattle breeds. The lack of diversity causes problems, particularly when DNA sequences from genetically distant animals are aligned and compared to the reference sequence. This issue is widely known as reference bias. Pangenomes are an intriguing novel reference structure to consider the full-spectrum of genetic diversity within a species. A rich, graph-based pangenome reference can integrate multiple genome assemblies and their sites of variations in a coherent and non-redundant data structure. This thesis investigated for the first time the utility of graph-based references for genomic analysis in a livestock population. Chapter 2 assessed the feasibility of graph-based genomic analysis in cattle. Specifically, a graph-based sequence variant genotyping approach was implemented using the Graphtyper software and compared to two widely-used methods (SAMtools and GATK) that rely on a strictly linear representation of the reference using whole-genome sequencing data of 49 Original Braunvieh cattle. A comparison between sequence variant and array-derived genotypes indicated that the graph-based approach outperformed both SAMtools and GATK with regard to genotype concordance, non-reference sensitivity, non-reference discrepancy, and Mendelian consistency of genotypes observed in parent-offspring pairs. These findings demonstrated that graph-based genotyping using Graphtyper is accurate, sensitive, and computationally feasible in the cattle genome. Chapter 3 reports on the construction of breed-specific and multi-breed genome graphs for four European cattle breeds (Original Braunvieh, Brown Swiss, Fleckvieh, and Holstein). 
The vg toolkit was used to augment the linear Hereford-based reference sequence with variants that were prioritized based on allele frequency in different breeds. Based on both real and simulated short-read sequencing data, this study showed that variant prioritization is crucial to build informative genome graphs. Intriguingly, adding many low frequency and rare variants to the genome graphs compromised mapping accuracy. Moreover, this chapter demonstrated that multi-breed graphs and breed-specific graphs enable almost identical mapping improvements over a linear reference genome. Finally, the first whole-genome graph was constructed for the Brown Swiss cattle breed using 14 million variants. The application of this whole-genome graph facilitated accurate short-read mapping and unbiased sequence variant discovery. Chapter 4 reports on integrating six reference-quality bovine genome assemblies into a unified multi-assembly graph using the minigraph software. The pangenome contains 70 megabases that are not present in the current ARS-UCD1.2 Bos taurus reference genome. Using complementary bioinformatics approaches, this chapter provides compelling evidence that these non-reference sequences contain functionally active and biologically-relevant elements. Specifically, the analysis of transcriptome data revealed putatively novel genes, including some that are differentially expressed between individual animals. Moreover, variant discovery in the non-reference sequences revealed thousands of yet undetected polymorphic sites capturing genetic differentiation across cattle breeds. This chapter demonstrated that multi-assembly graphs make so far neglected genetic variations amenable to genetic investigations. Overall, this thesis presents a novel analysis paradigm in livestock genomics by leveraging variation-aware reference structures. 
The analyses presented in this thesis provide a first step towards the transition from linear to graph-based reference structures in order to mitigate inherent biases of the linear reference genome. Importantly, this thesis establishes a computational framework to integrate multiple genome assemblies and their sites of variations into a more diverse reference structure broadly applicable across species.

    Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery

    Background: The current bovine genomic reference sequence was assembled from a Hereford cow. The resulting linear assembly lacks diversity because it does not contain allelic variation, a drawback of linear references that causes reference allele bias. High nucleotide diversity and the separation of individuals by hundreds of breeds make cattle ideally suited to investigate the optimal composition of variation-aware references. Results: We augment the bovine linear reference sequence (ARS-UCD1.2) with variants filtered for allele frequency in dairy (Brown Swiss, Holstein) and dual-purpose (Fleckvieh, Original Braunvieh) cattle breeds to construct either breed-specific or pan-genome reference graphs using the vg toolkit. We find that read mapping is more accurate to variation-aware than linear references if pre-selected variants are used to construct the genome graphs. Graphs that contain random variants do not improve read mapping over the linear reference sequence. Breed-specific augmented and pan-genome graphs enable almost similar mapping accuracy improvements over the linear reference. We construct a whole-genome graph that contains the Hereford-based reference sequence and 14 million alleles that have alternate allele frequency greater than 0.03 in the Brown Swiss cattle breed. Our novel variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping for SNPs and Indels. Conclusions: We develop the first variation-aware reference graph for an agricultural animal (https://doi.org/10.5281/zenodo.3759712). Our novel reference structure improves sequence read mapping and variant genotyping over the linear reference. Our work is a first step towards the transition from linear to variation-aware reference structures in species with high genetic diversity and many sub-populations.

    Accurate sequence variant genotyping in cattle using variation-aware genome graphs

    Background: Genotyping of sequence variants typically involves, as a first step, the alignment of sequencing reads to a linear reference genome. Because a linear reference genome represents only a small fraction of all the DNA sequence variation within a species, reference allele bias may occur at highly polymorphic or divergent regions of the genome. Graph-based methods facilitate the comparison of sequencing reads to a variation-aware genome graph, which incorporates a collection of non-redundant DNA sequences that segregate within a species. We compared the accuracy and sensitivity of graph-based sequence variant genotyping using the Graphtyper software to two widely-used methods, i.e., GATK and SAMtools, which rely on linear reference genomes using whole-genome sequencing data from 49 Original Braunvieh cattle. Results: We discovered 21,140,196, 20,262,913, and 20,668,459 polymorphic sites using GATK, Graphtyper, and SAMtools, respectively. Comparisons between sequence variant genotypes and microarray-derived genotypes showed that Graphtyper outperformed both GATK and SAMtools in terms of genotype concordance, non-reference sensitivity, and non-reference discrepancy. The sequence variant genotypes that were obtained using Graphtyper had the smallest number of Mendelian inconsistencies between sequence-derived single nucleotide polymorphisms and indels in nine sire-son pairs. Genotype phasing and imputation using the Beagle software improved the quality of the sequence variant genotypes for all the tools evaluated, particularly for animals that were sequenced at low coverage. Following imputation, the concordance between sequence- and microarray-derived genotypes was almost identical for the three methods evaluated, i.e., 99.32, 99.46, and 99.24% for GATK, Graphtyper, and SAMtools, respectively. Variant filtration based on commonly used criteria improved genotype concordance slightly but it also decreased sensitivity. 
Graphtyper required considerably more computing resources than SAMtools but less than GATK. Conclusions: Sequence variant genotyping using Graphtyper is accurate, sensitive and computationally feasible in cattle. Graph-based methods enable sequence variant genotyping from variation-aware reference genomes that may incorporate cohort-specific sequence variants, which is not possible with the current implementation of state-of-the-art methods that rely on linear reference genomes.

    Comparison of methods for building pangenome graphs

    Graph-based pangenomes could mitigate the bias from the use of a single reference genome. However, the impact of different pangenome integration approaches on the properties of the resulting graph remains unknown. In this study, we compared three methods for building pangenome graphs from multiple genome assemblies. Minigraph performs an approximate mapping of assemblies to a backbone genome to construct the pangenome. In contrast, Cactus and pggb apply reference-free base-level alignment to build pangenome graphs. Our results show that pangenome graphs constructed via base-level alignment contain 40% more small variations than the minigraph pangenome. Cactus, pggb, and minigraph uncover an almost identical set of structural variations. However, the breakpoints and allelic paths inferred from minigraph’s approximate mapping tend to be less precise than graphs constructed via base-level alignment. Taken together, our study informs on the optimal strategy for building informative pangenome graphs that are now being conducted for many species.

    Assessing genomic diversity and signatures of selection in Original Braunvieh cattle using whole-genome sequencing data

    Background: Autochthonous cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to local environment and food conditions. Original Braunvieh (OB) is a local cattle breed of Switzerland used for beef and milk production in alpine areas. Using whole-genome sequencing (WGS) data of 49 key ancestors, we characterize genomic diversity, genomic inbreeding, and signatures of selection in Swiss OB cattle at nucleotide resolution. Results: We annotated 15,722,811 SNPs and 1,580,878 Indels including 10,738 and 2763 missense deleterious and high impact variants, respectively, that were discovered in 49 OB key ancestors. Six Mendelian trait-associated variants that were previously detected in breeds other than OB, segregated in the sequenced key ancestors including variants causal for recessive xanthinuria and albinism. The average nucleotide diversity (1.6 × 10⁻³) was higher in OB than many mainstream European cattle breeds. Accordingly, the average genomic inbreeding derived from runs of homozygosity (ROH) was relatively low (F_ROH = 0.14) in the 49 OB key ancestor animals. However, genomic inbreeding was higher in OB cattle of more recent generations (F_ROH = 0.16) due to a higher number of long (> 1 Mb) runs of homozygosity. Using two complementary approaches, composite likelihood ratio test and integrated haplotype score, we identified 95 and 162 genomic regions encompassing 136 and 157 protein-coding genes, respectively, that showed evidence (P < 0.005) of past and ongoing selection. These selection signals were enriched for quantitative trait loci related to beef traits including meat quality, feed efficiency and body weight and pathways related to blood coagulation, nervous and sensory stimulus. Conclusions: We provide a comprehensive overview of sequence variation in Swiss OB cattle genomes. 
With WGS data, we observe higher genomic diversity and less inbreeding in OB than many European mainstream cattle breeds. Footprints of selection were detected in genomic regions that are possibly relevant for meat quality and adaptation to local environmental conditions. Considering that the population size is low and genomic inbreeding increased in the past generations, the implementation of optimal mating strategies seems warranted to maintain genetic diversity in the Swiss OB cattle population.

    Novel functional sequences uncovered through a bovine multiassembly graph

    Many genomic analyses start by aligning sequencing reads to a linear reference genome. However, linear reference genomes are imperfect, lacking millions of bases of unknown relevance and are unable to reflect the genetic diversity of populations. This makes reference-guided methods susceptible to reference-allele bias. To overcome such limitations, we build a pangenome from six reference-quality assemblies from taurine and indicine cattle as well as yak. The pangenome contains an additional 70,329,827 bases compared to the Bos taurus reference genome. Our multiassembly approach reveals 30 and 10.1 million bases private to yak and indicine cattle, respectively, and between 3.3 and 4.4 million bases unique to each taurine assembly. Utilizing transcriptomes from 56 cattle, we show that these nonreference sequences encode transcripts that hitherto remained undetected from the B. taurus reference genome. We uncover genes, primarily encoding proteins contributing to immune response and pathogen-mediated immunomodulation, differentially expressed between Mycobacterium bovis-infected and noninfected cattle that are also undetectable in the B. taurus reference genome. Using whole-genome sequencing data of cattle from five breeds, we show that reads which were previously misaligned against the Bos taurus reference genome now align accurately to the pangenome sequences. This enables us to discover 83,250 polymorphic sites that segregate within and between breeds of cattle and capture genetic differentiation across breeds. Our work makes a so-far unused source of variation amenable to genetic investigations and provides methods and a framework for establishing and exploiting a more diverse reference genome. ISSN: 0027-8424