
34 research outputs found

Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp

Author: Alberti Adriana
Alves-de-Souza Catharina
Aury Jean-Marc
Barbeyron Tristan
Bigeard Estelle
Cai Ruibo
Corre Erwan
Da Silva Corinne
Farhat Sarah
Florent Isabelle
Guillou Laure
Istace Benjamin
Kayal Ehsan
Labadie Karine
Le Phuong
Marie Dominique
Maumus Florian
Mercier Jonathan
Noel Benjamin
Porcel Betina M.
Rombauts Stephane
Rouzé Pierre
Rukwavu Tsinda
Szymczak Jeremy
Tonon Thierry
Van de Peer Yves
Wincker Patrick
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021
Field of study

Background: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. Results: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, along with short introns and intergenic regions and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes encoded by unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid and potential loss of a mitochondrial genome and functions. Conclusion: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.

HAL Evry

Ghent University Academic Bibliography

HAL Descartes

HAL-INSU

HAL-CEA

White Rose Research Online

Hal-Diderot

UPSpace at the University of Pretoria

Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp

Author: Alberti Adriana
Alves-de-Souza Catharina
Aury Jean-Marc
Barbeyron Tristan
Bigeard Estelle
Cai Ruibo
Corre Erwan
Da Silva Corinne
Farhat Sarah
Florent Isabelle
Guillou Laure
Istace Benjamin
Kayal Ehsan
Labadie Karine
Le Phuong
Marie Dominique
Maumus Florian
Mercier Jonathan
Noel Benjamin
Porcel Betina M.
Rombauts Stephane
Rouzé Pierre
Rukwavu Tsinda
Szymczak Jeremy
Tonon Thierry
Van de Peer Yves
Wincker Patrick
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/01/2021
Field of study

BACKGROUND: Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. RESULTS: We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, along with short introns and intergenic regions and a limited number of gene families, together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes encoded by unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid and potential loss of a mitochondrial genome and functions. CONCLUSION: These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.

ADDITIONAL FILE 1: FIGURE S1. Phylogeny of Alveolata. Proteomes from 89 alveolate genomes and transcriptome assemblies from the MMETSP project (https://zenodo.org/record/257026/files/) were used to create orthologous groups using orthofinder v2.2 with the diamond BLAST similarity search. Single ortholog alignments were pruned using PhyloTreePruner v.1.0 (minimum taxa to keep 44 and support value 0.9) and realigned using mafft v7 and filtered with Gblocks v.0.91b (-b5=a -p=n). Filtered alignments were concatenated using seqCat.pl and a phylogenetic tree was produced under a Maximum Likelihood framework using RAxML v8.2.9 with the PROTGAMMALGF model of sequence evolution and 101 bootstraps. Asterisks represent support values of 95 and above. A detailed method can be found in Kayal et al. 2018 BMC Evol. Biol. (https://doi.org/10.1186/s12862-018-1142-0). The full tree can be found at http://mmo.sb-roscoff.fr/jbrowseAmoebophrya/. FIGURE S2. SSU rDNA sequence identity (in percentage, relative to A25 and A120 compared to other species). FIGURE S3. Distribution of k-mer in A25 and A120 genomes. FIGURE S4. Classification of repeated elements in 3 Amoebophrya genomes (AT5, A25, and A120) using REPET. The x-axis represents the cumulated number of bases of repeated elements in the genome. FIGURE S5. Conserved motif of the putative splice leader (SL) in A25 and A120. FIGURE S6. Alignments of gene encoding the putative spliced leader (SL) gene in A25 and A120. FIGURE S7. Gene orientation change rate in 3 Amoebophrya genomes. FIGURE S8. Number of orthologous genes shared by selected taxa. FIGURE S9. Boxplot of the dN/dS ratios of orthologous genes between A25 and A120, calculated using the model average method (MA). FIGURE S10. Synteny dot-plot obtained by comparison between Amoebophrya A25 and AT5 genomes. FIGURE S11. Synteny dot-plot obtained by comparison between Amoebophrya A120 and AT5 genomes. FIGURE S12. Intron length distribution. FIGURE S13. GC content distribution. FIGURE S14. Multiple alignments of U2 snRNAs. FIGURE S15. Multiple alignments of U4 snRNAs. FIGURE S16. Multiple alignments of U5 snRNAs. FIGURE S17. Multiple alignments of U6 snRNAs. FIGURE S18. Secondary structure of Amoebophrya snRNA. FIGURE S19. Example of introner elements (IEs) in Amoebophrya. FIGURE S20. Distribution of the direct repeats with size ranging between 3 and 8 nucleotides in A25. FIGURE S21. Distribution of the direct repeats with size ranging between 3 and 8 nucleotides in A120. FIGURE S22. Composition of direct repeats in introner elements. The diversity in composition of the three (a, b, c) most abundant direct repeats in introner elements in A25 (up) and A120 (down). FIGURE S23. Terminal inverted repeat locations around the splicing sites in A25 and A120. The position of inverted repeats according to the location of the splice sites in A25 and A120. Left, the inverted repeats of A120 are located at the 1–5 nucleotides upstream and downstream of the splice sites. Right, the inverted repeats of A25 are located at the 1–6 nucleotides upstream and downstream of the splice sites. FIGURE S24. The flowchart for the in silico search of introner elements. FIGURE S25. Hierarchical clustering analysis (pairwise similarity and OrthoMCL) of all intron families and of the inverted repeats in A25 and A120. FIGURE S26. Percentage of genes with assigned functions in relation with introns composition. FIGURE S27. Difference in the proportion of IEs-containing-genes compared to their KEGG assignment in A25 and A120. FIGURE S28. Distribution of conserved introns. TABLE S1. RCC number, date and site of isolation of strains considered in this study. TABLE S2. Metrics of Nanopore runs for the two Amoebophrya strains. TABLE S3. Search for pathways involved in plastidial functions that are entirely independent of plastid-encoded gene content. TABLE S4. Number of the different types of introns identified in A25 and A120 genomes. TABLE S5. Search for RNA editing in A25 and A120 introns. TABLE S6. Putative Amoebophrya A25 and A120 snRNP homologs. TABLE S7. Classification into families of non-canonical introns in A25 and A120. TABLE S8. RNAseq read assembly statistics of Amoebophrya A25 and A120 samples corresponding to the different times of infection and to the free-living stage (dinospore only). TABLE S9. Total number of contigs belonging to samples from different stages of infection and the proportion of them that were aligned against the genomes of both Amoebophrya A25 and A120. ND corresponds to “not determined” when no measurement was done. TABLE S10. Metabolic pathways screened in A25 and A120 proteomes.

This research was funded by the ANR (Agence Nationale de la Recherche) Grant ANR-14-CE02-0007 HAPAR, the CEA and the Région Bretagne (RC doctoral grant ARED PARASITE 9450 and EK postdoctoral grant SAD HAPAR 9229), and the CNRS (X-life SEAgOInG).

UPSpace at the University of Pretoria

The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics.

Author: Alegria C.
Alioto Tyler
Alves Paulo C
Amorim Isabel R.
AURY Jean-Marc
Backstrom Niclas
Baldrian Petr
Baltrunaite Laima
Barta Endre
Bed'Hom Bertrand
Belser Caroline
Bergsten Johannes
Bertrand Laurie
Bilandzija Helena
Binzer-Panchal Mahesh
Bista Iliana
Blaxter Mark
Borges Dias Guilherme
Borges Paulo A. V.
Bosse Mirte
Brown Tom
Bruggmann Rémy
Buena-Atienza Elena
Burgin Josephine
Buzan Elena
Böhne Astrid
C. Cruaud
Campo Javier Del
Casadei Nicolas Lougi Pascal
CHIARA MATTEO
Chozas Sergio
CIOFI CLAUDIO
Cock Mark J.
Crottini Angelica
Cruz Fernando
Dalén Love
Davey Robert P
DE BIASE Alessio
De Panis Diego
De Pooter Tim
Delić Teo
Dennis Alice B
Derks Martijn FL
Diedericks Genevieve
Diroma Maria Angela
Djan Mihajla
Duprat Simone
Eleftheriadi Klara
Escudero Nuria
Fernandes Carlos
Fernández José M
Fernández Rosa
Feulner Philine GD
Flot Jean-François
Formenti Giulio
Forni Giobbe
Fosso Bruno
Fournier Pascal
FOURNIER-CHAMBRILON Christine
Gabaldón Toni
Garg Shilpa
Gissi Carmela
Giupponi Luca
Gonzalez Josefa
Grilo Miguel
Gruening Bjoern
Guiglielmoni Nadège
Gut Marta
Guérin Thomas
Gómez-Garrido Jèssica
Haesler Marcel P
Hahn Christoph
Halpern Balint
Harrison Peter
Heintz Julia
Herron Katie E.
Hindrikson Maris
Howe Kerstin
Hughes Graham
Höglund Jacob
Iannucci Alessio
Istace Benjamin
Jancekovic Franc
Joris Geert
Joye-Dind Sagane
Jónsson Zophonías O
KIRANGWA JOSEPH
Koskimaki Janne J.
Krystufek Boris
Kubacka Justyna
Kuhl Heiner
Kusza Szilvia
Labadie Karine
Lahteenaro Meri
Lantz Henrik
Lavrinienko Anton
Leclere Lucas
Leitão Henrique
Lopes Ricardo Jorge
Madsen Ole
Magdelenat Ghislaine
MAGOGA GIULIA
Manousaki Tereza
Mappes Tapio
Marins Luísa S
Marques João Pedro
Martinez Redondo Gemma I
Maumus Florian
Mazzoni Camila J.
Mc Cartney Ann M.
Megens Hendrik-Jan
Melo-Ferreira José
Mendes Sofia L
Minotto Alice
Montagna Matteo
Moreno João
Morselli Marco
Mosbech Mai-Britt
Moura Monica
Mouton Alice
Musilova Zuzana
Myers Eugene
Nash Will J.
Natali Chiara
Nater Alexander
Nicholson Pamela
Niell Manuel
Nijland Reindert
Noel Benjamin
Norén Karin
Oliveira Pedro H
Olsen Remi-André
Ometto Lino
Ossowski Stephan
Palinauskas Vaidas
Panibe Jerome P
Paupério Joana
Pavlek Martina
Pawłowska Julia
PAYEN Emilie
Pellicer Jaume
Pesole Graziano
Pimenta João
Pippel Martin
Pirttilä Anna Maria
Poulakakis Nikos
Pálsson Snæbjörn
Rajan Jeena
Rego Ruben MC
Resendes Roberto
Resl Philipp
Riesgo Ana
Romeiras Maria M.
Roxo Guilherme
Ruiz-López María José
Rödin-Mörch Patrik
Rüber Lukas
Saarma Urmas
Salces-Ortiz J
Seehausen Ole
Shaw Felix
Silva Luís
Sim-Sim Manuela
Soares André ER
Soler Lucile
Sousa Vitor C
Sousa-Santos C.
Spada Alberto
Stefanovic Milomir
Steger Viktor
Stiller Josefin
Strazisar Mojca
Struck Torsten Hugo H
Stöck Matthias
Sudasinghe Hiranya
Svardal Hannes
Tapanainen Riikka
Tellgren-Roth Christian
Trindade Helena
Tukalenko Yevhen
Urso Ilenia
Vacherie Benoit
Van Belleghem Steven M
Van Oers Kees
Vargas-Chavez Carlos
Velickovic Nevena
Vella Adriana
Vella Noel
Vernesi Cristiano
Vicente Sara
Villa Sara
Vinnere Pettersson Olga
Volckaert Filip AM
Vörös Judit
Waterhouse Robert M
Watts Phillip
Wincker Patrick
Winkler Sylke
Wood Jo
Čiampor Fedor
Publication venue
Publication date: 01/01/2023
Field of study

ABSTRACT: A global genome database of all of Earth’s species diversity could be a treasure trove of scientific discoveries. However, regardless of the major advances in genome sequencing technologies, only a tiny fraction of species have genomic information available. To contribute to a more complete planetary genomic database, scientists and institutions across the world have united under the Earth BioGenome Project (EBP), which plans to sequence and assemble high-quality reference genomes for all ∼1.5 million recognized eukaryotic species through a stepwise phased approach. As the initiative transitions into Phase II, where 150,000 species are to be sequenced in just four years, worldwide participation in the project will be fundamental to success. As the European node of the EBP, the European Reference Genome Atlas (ERGA) seeks to implement a new decentralised, accessible, equitable and inclusive model for producing high-quality reference genomes, which will inform EBP as it scales. To embark on this mission, ERGA launched a Pilot Project to establish a network across Europe to develop and test the first infrastructure of its kind for the coordinated and distributed reference genome production on 98 European eukaryotic species from sample providers across 33 European countries. Here we outline the process and challenges faced during the development of a pilot infrastructure for the production of reference genome resources, and explore the effectiveness of this approach in terms of high-quality reference genome production, considering also equity and inclusion. The outcomes and lessons learned during this pilot provide a solid foundation for ERGA while offering key learnings to other transnational and national genomic resource projects.

Repositório da Universidade dos Açores

Open Repository and Bibliography - Liège

Sequencing and Chromosome-Scale Assembly of Plant Genomes, Brassica rapa as a Use Case

Author: Benjamin Istace
Publication venue: 'MDPI AG'
Publication date: 30/07/2021
Field of study

With the rise of long-read sequencers and long-range technologies, delivering high-quality plant genome assemblies is no longer reserved for large consortia. Not only sequencing techniques, but also computer algorithms have reached a point where the reconstruction of assemblies at the chromosome scale is now feasible at the laboratory scale. Current technologies, in particular long-range technologies, are numerous, and selecting the most promising one for the genome of interest is crucial to obtain optimal results. In this study, we resequenced the genome of the yellow sarson, Brassica rapa cv. Z1, using the Oxford Nanopore PromethION sequencer and assembled the sequenced data using current assemblers. To reconstruct complete chromosomes, we used and compared three long-range scaffolding techniques, optical mapping, Omni-C, and Pore-C sequencing libraries, commercialized by Bionano Genomics, Dovetail Genomics, and Oxford Nanopore Technologies, respectively, or a combination of the three, in order to evaluate the capability of each technology.

Multidisciplinary Digital Publishing Institute

Chromosome-scale assemblies using Nanopore long reads and Bionano optical maps

Author: Istace Benjamin
Publication venue: HAL CCSD
Publication date: 01/06/2020
Field of study


HAL Evry

Chromosome-scale assemblies using Nanopore long reads and Bionano optical maps

Author: Istace Benjamin
Publication venue: HAL CCSD
Publication date: 01/06/2020
Field of study


HAL-CEA

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

Author: Aury Jean-Marc
Istace Benjamin
Publication venue: Oxford University Press
Publication date: 01/01/2021
Field of study

Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.

HAL Evry

HAL-CEA
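
The Hapo-G abstract above notes that sequencing errors in coding regions can disrupt the reading frame and lead to premature stop codons. The following is a minimal Python sketch of that effect only; it is not part of Hapo-G, and the function name `translate` and both example sequences are hypothetical, chosen purely for illustration. It assumes the standard genetic code.

```python
# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate reading frame 0 and stop at the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


# Hypothetical 18-base coding sequence: ATG GCA TTG AAA CGT TAA
reference = "ATGGCATTGAAACGTTAA"

# Simulated sequencing error: the base at index 5 is missing from the
# consensus, which shifts every downstream codon by one position.
erroneous = reference[:5] + reference[6:]

print(translate(reference))  # full-length product
print(translate(erroneous))  # truncated by a premature stop codon
```

In this toy case a single missing base turns the third codon into TGA, so the translated product ends after two residues instead of five.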

BiSCoT: improving large eukaryotic genome assemblies with optical maps

Author: Aury Jean-Marc
Belser Caroline
Istace Benjamin
Publication venue: PeerJ
Publication date: 01/01/2020
Field of study

Motivation: Long read sequencing and Bionano Genomics optical maps are two techniques that, when used together, make it possible to reconstruct the structure of entire chromosomes or chromosome arms. However, the existing tools are often too conservative and the organization of contigs into scaffolds is not always optimal. Results: We developed BiSCoT (Bionano SCaffolding COrrection Tool), a tool that post-processes files generated during a Bionano scaffolding in order to produce an assembly of greater contiguity and quality. BiSCoT was tested on a human genome and four publicly available plant genomes sequenced with Nanopore long reads and significantly improved the contiguity and quality of the assemblies. BiSCoT generates a fasta file of the assembly as well as an AGP file which describes the new organization of the input assembly. Availability: BiSCoT and improved assemblies are freely available on GitHub at http://www.genoscope.cns.fr/biscot and Pypi at https://pypi.org/project/biscot/

HAL Evry

Directory of Open Access Journals

HAL-CEA
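
The BiSCoT abstract above mentions an AGP file describing how input contigs are organized into scaffolds. As a rough illustration of what such a file encodes, here is a minimal Python sketch that reads AGP text, assuming the standard tab-separated AGP 2.0 column layout (object, start, end, part number, component type, then component or gap fields). The scaffold and contig names are hypothetical, and nothing here is taken from BiSCoT's own code or output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgpRecord:
    scaffold: str
    start: int
    end: int
    part_number: int
    component_type: str
    component_id: Optional[str] = None  # set for sequence components
    orientation: Optional[str] = None   # set for sequence components
    gap_length: Optional[int] = None    # set for gap lines (types N and U)


def parse_agp(text):
    """Parse AGP text into records, skipping comments and blank lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        record = AgpRecord(cols[0], int(cols[1]), int(cols[2]),
                           int(cols[3]), cols[4])
        if record.component_type in ("N", "U"):
            record.gap_length = int(cols[5])
        else:
            record.component_id = cols[5]
            record.orientation = cols[8]
        records.append(record)
    return records


# Hypothetical example: two contigs joined by a 100 bp gap.
example = (
    "# illustrative AGP content, not real data\n"
    "scaffold_1\t1\t1000\t1\tW\tcontig_A\t1\t1000\t+\n"
    "scaffold_1\t1001\t1100\t2\tN\t100\tscaffold\tyes\tmap\n"
    "scaffold_1\t1101\t1600\t3\tW\tcontig_B\t1\t500\t-\n"
)

records = parse_agp(example)
layout = [(r.component_id, r.orientation) for r in records if r.component_id]
print(layout)
```

Reading the contig order and orientation this way is enough to see how a scaffolder has arranged the input assembly; a production parser would also validate coordinates and gap types.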

Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data

Author: Aury Jean-Marc
Caboche Ségolène
Chikhi Rayan
da Silva Corinne
Istace Benjamin
Lima Leandro
Marchet Camille
Touzet Hélène
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/06/2019
Field of study

Motivation: Nanopore long-read sequencing technology offers promising alternatives to high-throughput short read sequencing, especially in the context of RNA-sequencing. However, this technology is currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames, and the creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error-correction of Nanopore RNA-sequencing long reads remain limited. Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that not only reports classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long read error-correction tools that were originally developed for DNA are also suitable for the correction of Nanopore RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type. Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

INRIA a CCSD electronic archive server

Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data

Author: Lima Leandro
Marchet Camille
Caboche Ségolène
Da Silva Corinne
Istace Benjamin
Aury Jean-Marc
Touzet Hélène
Chikhi Rayan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/06/2019
Field of study

INRIA a CCSD electronic archive server

HAL-Inserm

HAL-CEA

Servicio de Difusión de la Creación Intelectual