8 research outputs found

    Super-Pangenome by Integrating the Wild Side of a Species for Accelerated Crop Improvement

    The pangenome provides genomic variations in the cultivated gene pool for a given species. However, as the crop’s gene pool comprises many species, especially wild relatives with diverse genetic stock, here we suggest using accessions from all available species of a given genus for the development of a more comprehensive and complete pangenome, which we refer to as a super-pangenome. The super-pangenome provides a complete genomic variation repertoire of a genus and offers unprecedented opportunities for crop improvement. This opinion article focuses on recent developments in crop pangenomics, the need for a super-pangenome that should include wild species, and its application for crop improvement

    Pangenomics in microbial and crop research: Progress, applications, and perspectives

    Advances in sequencing technologies and bioinformatics tools have fueled a renewed interest in whole genome sequencing efforts in many organisms. The growing availability of multiple genome sequences has advanced our understanding of within-species diversity, in the form of a pangenome. Pangenomics has opened new avenues for future research, such as allowing dissection of complex molecular mechanisms and increased confidence in genome mapping. To comprehensively capture the genetic diversity for improving plant performance, the pangenome concept is further extended from species to genus level by the inclusion of wild species, constituting a super-pangenome. Characterization of the pangenome has implications for both basic and applied research. The concept of the pangenome has transformed the way biological questions are addressed. From understanding evolution and adaptation to elucidating host–pathogen interactions, and from finding novel genes or breeding targets that aid crop improvement to designing effective vaccines for human prophylaxis, the increasing availability of pangenomes has revolutionized several aspects of biological research. The future availability of high-resolution pangenomes based on reference-level near-complete genome assemblies would greatly improve our ability to address complex biological problems

    KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences

    The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data

    Genomic interrogation of Candida albicans with relation to reproductive health and fertility

    Candida albicans is a commensal yeast that can colonize a variety of host-associated niches, including the human urogenital tract. It is the most common cause of fungal infections, both superficial and systemic. Fungal infections, including vulvovaginal candidiasis, have been heavily implicated as a multifaceted cause in human infertility, with host immune effects and microbiome alterations being other influencing factors. Previous work investigated the prevalence and diversity of a number of Candida albicans isolates sourced from individuals with differing fertility statuses using MLST-based methods. This current study aimed to use comparative genomic methods to investigate at whole genome level the previously described isolates in combination with database genomes, to identify whether genes or genetic variants display an association with the ability to colonize certain niches. Pangenome construction and enrichment analysis of database C. albicans assemblies showed an enrichment of virulence genes with the core genome. A genome wide association study of the Swansea isolates and a large dataset originating from NCBI’s sequence read archive (SRA) identified 35 variants significantly associated with isolation from the female reproductive tract. These variants presented enrichment for functions related to antifungal resistance and hyphal growth. Together, these variants may influence the ability of a strain to persist within the female reproductive tract and to cause recurring vulvovaginal candidiasis, thus potentially influencing fertility. These results offer ideal targets for further study from a genomic perspective to explore their ecological presence within the organism’s natural environment, and further as targets for phenotypic investigations. The outcomes can be used to better our understanding of how C. albicans can influence reproductive health and wellbeing

    Collaborative Cross Graphical Genome

    Reference genomes are the foundation of most bioinformatic pipelines. They are conventionally represented as a set of single-sequence assembled contigs, referred to as linear genomes. The rapid growth of sequencing technologies has driven the advent of pangenomes that integrate multiple genome assemblies in a single representation. Graphs are commonly used in pangenome models. However, there are challenges for graph-based pangenome representations and operations. This dissertation introduces methods for reference pangenome construction, genomic feature annotation, and tools for analyzing population-scale sequence data based on a graphical pangenome model. We first develop a genome registration tool for constructing a reference pangenome model by merging multiple linear genome assemblies and annotations into a graphical genome. Secondly, we develop a graph-based coordinate framework and discuss the strategies for referring to, annotating, and comparing genomic features in a graphical pangenome model. We demonstrate that the graph coordinate system simplifies assembly and annotation updates, identifying and segmenting updated sequences in a specific genomic region. Thirdly, we develop an alignment-free method to analyze population-scale sequence data based on a pangenome model. We demonstrate the application of our methods by constructing pangenome models for a mouse genetic reference population, Collaborative Cross. The pangenome framework proposed in this dissertation simplified the maintenance and management of massive genomic data and established a novel data structure for analyzing, visualizing, and comparing genomic features in an intra-specific population.

    On the evolution of effector gene families in potato cyst nematodes

    Potato cyst nematodes (PCN) are economically relevant plant parasites that infect potato crops. The genomes of three PCN species are available and genome data have been generated for several populations of PCN, to address questions related to the molecular basis of plant parasitism. In this thesis, I employ approaches of comparative genomics to highlight differences and similarities between PCNs and other nematode species. I present two new software solutions to address challenges associated with the field of comparative genomics: BlobTools, a taxonomic interrogation toolkit for quality control of genome assemblies, and KinFin, a solution for the analysis of protein orthology data. I apply both software solutions to genomic datasets of nematodes, platyhelminths, and tardigrades. Based on KinFin analysis of plant parasitic nematodes, I identify protein families in PCNs likely to be involved in host-parasitic interaction, termed effectors, and discuss their functions. I highlight examples of horizontal gene transfer from bacteria to plant parasitic nematodes. Through genomic data of European and South American populations of PCNs, I address variation in populations, infer phylogenetic relationships, and try to estimate the effect of selection on effector genes identified through KinFin. Furthermore, I estimate the rate of variation across the reference genomes of two PCNs

    A tale of two clades: genome evolution of oomycetes and fungi.

    Some of the most ecologically significant pathogens of plants, animals and marine life come from two groups of filamentous eukaryotes: the oomycetes and the fungi. Although similar in morphology and ecological niche, the two groups are only very distantly related in terms of evolutionary history. The oomycetes are underresearched in evolutionary science, despite their historical and contemporary impact on food and environmental security. In contrast, fungi themselves are probably the most densely studied and sequenced group of organisms in evolutionary science outside of bacteria. This thesis is a collection of five published computational studies of the evolutionary biology of oomycetes and fungi. The first study is a systematic investigation of bacterial horizontal gene transfer into plant pathogenic oomycete species, which identifies 5 potential HGT events from prokaryotes into multiple oomycetes. The second study is a reconstruction of the evolutionary history of the oomycetes using whole-genome data from 37 species, which supports the larger groups within the oomycetes class but suggests that some exemplar oomycete genera are paraphyletic. Taking advantage of the abundance of genomics data available for all major fungal phyla, the third study reconstructs the evolutionary history of 84 fungal species using seven different phylogenomic techniques and critically evaluates each technique for accuracy, speed and other criteria. The fourth study looks at the pangenomes of four model fungal species, and compares the evolution of genomic variation, virulence and environmental adaptation within each species. The final study presents a refined iteration of the methodology used in the previous pangenome study as a self-contained software package and demonstrates the software’s capabilities through pangenome analysis and re-analysis of both model and non-model fungal species. Together, these studies cover a breadth of molecular evolution, comparative genomics, phylogenomics and pangenomics research for two similar, but evolutionarily distinct groups of important microscopic eukaryotes