18 research outputs found

    Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

    Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between _Oryza sativa_ and _Arabidopsis thaliana_. First, we established an efficient method to cluster _A. thaliana_ and _O. sativa_ full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods.

    Application of the GenFam system to plant stress response: integrating the identification of specific cis elements

    UMR AGAP - ID team (Data Integration). GenFam is an integrative system for analyzing gene families. It allows users (i) to create gene families from complete genomes, (ii) to run a phylogenetic analysis of a family through the Galaxy workflow manager in order to define homology relationships, (iii) to study evolutionary events from synteny blocks precomputed with the SynMap workflow of the comparative genomics platform (CoGe), and (iv) to integrate these results into a summary visualization interface. The first application of GenFam is to identify candidate genes for tolerance to environmental stresses. This requires detecting cis-regulatory sequences specific to the stress response (such as ABRE and DRE). In this context, we need to integrate new tools to discover and search for transcription factor binding sites (TFBS) in the promoter sequences of the genes belonging to the family under study. This Galaxy workflow first selects the 5' or 3' flanking regions of the genes of interest, according to the user's choice. The flanking regions are then analyzed to discover and search for stress-response-specific cis-regulatory sequence motifs using complementary methods such as MEME, STIF, and PHYME. These results, together with the functional annotation of genes tagged as involved in the stress response, will be integrated into the visualization interface.
This work should prompt reflection on the notion of functional orthology and enable translational research from model species to species of agronomic interest (i.e., identifying candidate genes for the stress response of coffee from functional information known in Arabidopsis)

    Deciphering the genome structure and paleohistory of _Theobroma cacao_

    We sequenced and assembled the genome of _Theobroma cacao_, an economically important tropical fruit tree crop that is the source of chocolate. The assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of them anchored on the 10 _T. cacao_ chromosomes. Analysis of this sequence information highlighted specific expansion of some gene families during evolution, for example flavonoid-related genes. It also provides a major source of candidate genes for _T. cacao_ disease resistance and quality improvement. Based on the inferred paleohistory of the _T. cacao_ genome, we propose an evolutionary scenario whereby the ten _T. cacao_ chromosomes were shaped from an ancestor through eleven chromosome fusions. The _T. cacao_ genome can be considered as a simple living relic of higher plant evolution

    Genome assembly of _Musa beccarii_ shows extensive chromosomal rearrangements and genome expansion during evolution of Musaceae genomes

    Background: Musa beccarii (Musaceae) is a banana species native to Borneo, sometimes grown as an ornamental plant. The basic chromosome number of Musa species is x = 7, 10, or 11; however, M. beccarii has a basic chromosome number of x = 9 (2n = 2x = 18), which is the same basic chromosome number of species in the sister genera Ensete and Musella. Musa beccarii is in the section Callimusa, which is sister to the section Musa. We generated a high-quality chromosome-scale genome assembly of M. beccarii to better understand the evolution and diversity of genomes within the family Musaceae. Findings: The M. beccarii genome was assembled by long-read and Hi-C sequencing, and genes were annotated using both long Isoseq and short RNA-seq reads. The size of M. beccarii was the largest among all known Musaceae assemblies (~570 Mbp) due to the expansion of transposable elements and increased 45S ribosomal DNA sites. By synteny analysis, we detected extensive genome-wide chromosome fusions and fissions between M. beccarii and the other Musa and Ensete species, far beyond those expected from differences in chromosome number. Within Musaceae, M. beccarii showed a reduced number of terpenoid synthase genes, which are related to chemical defense, and enrichment in lipid metabolism genes linked to the physical defense of the cell wall. Furthermore, type III polyketide synthase was the most abundant biosynthetic gene cluster (BGC) in M. beccarii. BGCs were not conserved in Musaceae genomes. Conclusions: The genome assembly of M. beccarii is the first chromosome-scale genome assembly in the Callimusa section in Musa, which provides an important genetic resource that aids our understanding of the evolution of Musaceae genomes and enhances our knowledge of the pangenome

    RedOak: a reference-free and alignment-free structure for indexing a collection of similar genomes

    Background: As the cost of DNA sequencing decreases, high-throughput sequencing technologies become increasingly accessible to many laboratories. Consequently, new issues emerge that require new algorithms, including tools for indexing and compressing hundreds to thousands of complete genomes. Results: This paper presents RedOak, a reference-free and alignment-free software package that allows for the indexing of a large collection of similar genomes. RedOak can also be applied to reads from unassembled genomes, and it provides a nucleotide sequence query function. This software is based on a k-mer approach and has been developed to be heavily parallelized and distributed on several nodes of a cluster. The source code of our RedOak algorithm is available at https://gitlab.info-ufr.univ-montp2.fr/DoccY/RedOak. Conclusions: RedOak may be really useful for biologists and bioinformaticians expecting to extract information from large sequence datasets

    The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies

    Background: Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, some improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes. Methods: We used an NGS-based approach to significantly improve the assembly of the Belizean Criollo B97-61/B2 genome. We combined four Illumina large insert size mate-paired libraries with 52x of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. We then used genotyping by sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes

    AgroLD: A Knowledge Graph Database for plant functional genomics

    The Explore Relationships tool aids in exploring relationships between existing entities. Quick search is based on keyword search and aids in understanding the underlying knowledge

    A chromosome-level reference genome of Ensete glaucum gives insight into diversity and chromosomal and repetitive sequence evolution in the Musaceae

    Background: Ensete glaucum (2n = 2x = 18) is a giant herbaceous monocotyledonous plant in the small Musaceae family along with banana (Musa). A high-quality reference genome sequence assembly of E. glaucum is a resource for functional and evolutionary studies of Ensete, Musaceae, and the Zingiberales. Findings: Using Oxford Nanopore Technologies, chromosome conformation capture (Hi-C), Illumina and RNA survey sequence, supported by molecular cytogenetics, we report a high-quality 481.5 Mb genome assembly with 9 pseudo-chromosomes and 36,836 genes. A total of 55% of the genome is composed of repetitive sequences with predominantly LTR-retroelements (37%) and DNA transposons (7%). The single 5S ribosomal DNA locus had an exceptionally long monomer length of 1,056 bp, more than twice that of the monomers at multiple loci in Musa. A tandemly repeated satellite (1.1% of the genome, with no similar sequence in Musa) was present around all centromeres, together with a few copies of a long interspersed nuclear element (LINE) retroelement. The assembly enabled us to characterize in detail the chromosomal rearrangements occurring between E. glaucum and the x = 11 species of Musa. One E. glaucum chromosome has the same gene content as Musa acuminata, while others show multiple, complex, but clearly defined evolutionary rearrangements in the change between x = 9 and 11. Conclusions: The advance towards a Musaceae pangenome including E. glaucum, tolerant of extreme environments, makes a complete set of gene alleles, copy number variation, and a reference for structural variation available for crop breeding and understanding environmental responses. The chromosome-scale genome assembly shows the nature of chromosomal fusion and translocation events during speciation, and features of rapid repetitive DNA change in terms of copy number, sequence, and genomic location, critical to understanding its role in diversity and evolution